The core methods of StatsForecast are:


StatsForecast

 StatsForecast (models:List[Any], freq:Union[str,int], n_jobs:int=1,
                df:Union[pandas.core.frame.DataFrame,
                         polars.dataframe.frame.DataFrame,NoneType]=None,
                sort_df:bool=True, fallback_model:Optional[Any]=None,
                verbose:bool=False)

The StatsForecast class allows you to efficiently fit multiple StatsForecast models to large sets of time series. It operates on a DataFrame df with at least three columns: ids, times and targets.
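
As a rough illustration of the expected layout, the sketch below builds a tiny long-format panel with pandas. The column names unique_id, ds and y are the defaults of the id_col, time_col and target_col arguments documented below; the series names and values are made up for the example.

```python
import pandas as pd

# Two short daily series stacked in long format: one row per (id, timestamp)
panel = pd.DataFrame({
    'unique_id': ['store_a'] * 3 + ['store_b'] * 3,  # ids (made-up names)
    'ds': list(pd.date_range('2024-01-01', periods=3, freq='D')) * 2,  # times
    'y': [10.0, 12.0, 11.0, 3.0, 4.0, 5.0],  # targets (made-up values)
})

# Count the observations available for each series
print(panel.groupby('unique_id')['y'].count())
```

Any further columns would be treated as exogenous variables, as described in the External regressors section.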

The class has a memory-efficient StatsForecast.forecast method that avoids storing partial model outputs, while the StatsForecast.fit and StatsForecast.predict methods, which follow the Scikit-learn interface, store the fitted models.

The StatsForecast class offers parallelization utilities with Dask, Spark and Ray back-ends. See distributed computing example here.

# StatsForecast's class usage example

import os

from statsforecast import StatsForecast
from statsforecast.utils import generate_series
from statsforecast.models import (
    ADIDA,
    AutoARIMA,
    CrostonClassic,
    CrostonOptimized,
    CrostonSBA,
    HistoricAverage,
    IMAPA,
    Naive,
    RandomWalkWithDrift,
    SeasonalExponentialSmoothing,
    SeasonalNaive,
    SeasonalWindowAverage,
    SimpleExponentialSmoothing,
    TSB,
    WindowAverage,
    DynamicOptimizedTheta,
    AutoETS,
    AutoCES
)
# Generate synthetic panel DataFrame for example
panel_df = generate_series(n_series=9, equal_ends=False, engine='pandas')
panel_df.groupby('unique_id').tail(4)
# Return unique_id as a regular column instead of as the index
os.environ['NIXTLA_ID_AS_COL'] = '1'
# Declare list of instantiated StatsForecast estimators to be fitted
# You can try other estimator's hyperparameters
# You can try other methods from the `models.StatsForecast` collection
# Check them here: https://nixtla.github.io/statsforecast/models.html
models=[AutoARIMA(), Naive(), 
        AutoETS(), AutoARIMA(allowmean=True, alias='MeanAutoARIMA')] 

# Instantiate StatsForecast class
fcst = StatsForecast(models=models,
                     freq='D',
                     n_jobs=1,
                     verbose=True)

# Efficiently predict
fcsts_df = fcst.forecast(df=panel_df, h=4, fitted=True)
fcsts_df.groupby('unique_id').tail(4)

StatsForecast.fit

 StatsForecast.fit (df:Union[pandas.core.frame.DataFrame,
                             polars.dataframe.frame.DataFrame,NoneType]=None,
                    sort_df:bool=True,
                    prediction_intervals:Optional[statsforecast.utils.ConformalIntervals]=None,
                    id_col:str='unique_id', time_col:str='ds',
                    target_col:str='y')

Fit statistical models.

Fit models to a large set of time series from DataFrame df and store fitted models for later inspection.

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | Union | None | DataFrame with ids, times, targets and exogenous. If None, the StatsForecast class should have been instantiated using df. |
| sort_df | bool | True | Sort df by ids and times. |
| prediction_intervals | Optional | None | Configuration to calibrate prediction intervals (Conformal Prediction). |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| Returns | StatsForecast | | Returns the object with the fitted StatsForecast models stored. |

StatsForecast.predict

 StatsForecast.predict (h:int,
                        X_df:Union[pandas.core.frame.DataFrame,
                                   polars.dataframe.frame.DataFrame,NoneType]=None,
                        level:Optional[List[int]]=None)

Predict statistical models.

Use the stored fitted models to predict a large set of time series from the DataFrame df used for fitting.

| Parameter | Type | Default | Details |
|---|---|---|---|
| h | int | | Forecast horizon. |
| X_df | Union | None | DataFrame with ids, times and future exogenous. |
| level | Optional | None | Confidence levels between 0 and 100 for prediction intervals. |
| Returns | pandas or polars DataFrame | | DataFrame with models columns for point predictions and probabilistic predictions for all fitted models. |

StatsForecast.fit_predict

 StatsForecast.fit_predict (h:int,
                            df:Union[pandas.core.frame.DataFrame,
                                     polars.dataframe.frame.DataFrame,NoneType]=None,
                            X_df:Union[pandas.core.frame.DataFrame,
                                       polars.dataframe.frame.DataFrame,NoneType]=None,
                            level:Optional[List[int]]=None,
                            sort_df:bool=True,
                            prediction_intervals:Optional[statsforecast.utils.ConformalIntervals]=None,
                            id_col:str='unique_id', time_col:str='ds',
                            target_col:str='y')

Fit and Predict with statistical models.

This method fits the models and produces forecasts in a single call, analogous to Scikit-Learn fit_predict. It requires the forecast horizon h in advance.

In contrast to StatsForecast.forecast, this method stores partial model outputs.

| Parameter | Type | Default | Details |
|---|---|---|---|
| h | int | | Forecast horizon. |
| df | Union | None | DataFrame with ids, times, targets and exogenous. If None, the StatsForecast class should have been instantiated using df. |
| X_df | Union | None | DataFrame with ids, times and future exogenous. |
| level | Optional | None | Confidence levels between 0 and 100 for prediction intervals. |
| sort_df | bool | True | Sort df by ids and times. |
| prediction_intervals | Optional | None | Configuration to calibrate prediction intervals (Conformal Prediction). |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| Returns | Union | | DataFrame with models columns for point predictions and probabilistic predictions for all fitted models. |

StatsForecast.forecast

 StatsForecast.forecast (h:int,
                         df:Union[pandas.core.frame.DataFrame,
                                  polars.dataframe.frame.DataFrame,NoneType]=None,
                         X_df:Union[pandas.core.frame.DataFrame,
                                    polars.dataframe.frame.DataFrame,NoneType]=None,
                         level:Optional[List[int]]=None,
                         fitted:bool=False, sort_df:bool=True,
                         prediction_intervals:Optional[statsforecast.utils.ConformalIntervals]=None,
                         id_col:str='unique_id', time_col:str='ds',
                         target_col:str='y')

Memory-efficient predictions.

This method avoids the memory burden of storing fitted objects. It is analogous to Scikit-Learn fit_predict without storing information. It requires the forecast horizon h in advance.

| Parameter | Type | Default | Details |
|---|---|---|---|
| h | int | | Forecast horizon. |
| df | Union | None | DataFrame with ids, times, targets and exogenous. |
| X_df | Union | None | DataFrame with ids, times and future exogenous. |
| level | Optional | None | Confidence levels between 0 and 100 for prediction intervals. |
| fitted | bool | False | Store in-sample predictions. |
| sort_df | bool | True | Sort df by ids and times. |
| prediction_intervals | Optional | None | Configuration to calibrate prediction intervals (Conformal Prediction). |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| Returns | Union | | DataFrame with models columns for point predictions and probabilistic predictions for all fitted models. |

# StatsForecast.forecast method usage example

from statsforecast import StatsForecast
from statsforecast.utils import AirPassengersDF as panel_df
from statsforecast.models import AutoARIMA, Naive
# Instantiate StatsForecast class
fcst = StatsForecast(models=[AutoARIMA(), Naive()],
                     freq='D', n_jobs=1)

# Efficiently predict without storing the fitted models
fcsts_df = fcst.forecast(df=panel_df, h=4, fitted=True)
fcsts_df.groupby('unique_id').tail(4)

StatsForecast.forecast_fitted_values

 StatsForecast.forecast_fitted_values ()

Access insample predictions.

After executing StatsForecast.forecast, you can access the insample prediction values for each model. To get them, you need to pass fitted=True to the StatsForecast.forecast method and then use the StatsForecast.forecast_fitted_values method.

# StatsForecast.forecast_fitted_values method usage example

from statsforecast import StatsForecast
from statsforecast.utils import AirPassengersDF as panel_df
from statsforecast.models import AutoARIMA
# Instantiate StatsForecast class
fcst = StatsForecast(models=[AutoARIMA()], freq='D', n_jobs=1)

# Access insample predictions
fcsts_df = fcst.forecast(df=panel_df, h=12, fitted=True, level=(90, 10))
insample_fcsts_df = fcst.forecast_fitted_values()
insample_fcsts_df.tail(4)

StatsForecast.cross_validation

 StatsForecast.cross_validation (h:int,
                                 df:Union[pandas.core.frame.DataFrame,
                                          polars.dataframe.frame.DataFrame,NoneType]=None,
                                 n_windows:int=1, step_size:int=1,
                                 test_size:Optional[int]=None,
                                 input_size:Optional[int]=None,
                                 level:Optional[List[int]]=None,
                                 fitted:bool=False,
                                 refit:Union[bool,int]=True,
                                 sort_df:bool=True,
                                 prediction_intervals:Optional[statsforecast.utils.ConformalIntervals]=None,
                                 id_col:str='unique_id', time_col:str='ds',
                                 target_col:str='y')

Temporal Cross-Validation.

Efficiently fits a list of StatsForecast models through multiple training windows, in either chained or rolled manner.

The speed of the StatsForecast models makes it possible to overcome the high computational cost of this evaluation technique. Temporal cross-validation provides better measurements of a model’s generalization by increasing the length and diversity of the test set.
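
To make the window layout concrete, here is a small pure-Python sketch of how the training and test ranges could be derived from h, n_windows, step_size and input_size. The helper cv_windows is hypothetical and only illustrates the idea; it is not the library's code. With input_size=None the training range grows with each window (chained); with an integer it slides with a fixed length (rolled).

```python
def cv_windows(n_obs, h, n_windows=1, step_size=1, input_size=None):
    """Return (train_start, train_end, test_end) index triples.

    Indices are end-exclusive positions into a series of n_obs observations;
    each test range is [train_end, test_end).
    """
    # Observations reserved at the end of the series for all test windows
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n_obs:
        raise ValueError('series is too short for the requested windows')

    windows = []
    for i in range(n_windows):
        train_end = n_obs - test_size + i * step_size
        if input_size is None:
            train_start = 0  # chained: always train from the beginning
        else:
            train_start = max(0, train_end - input_size)  # rolled: fixed length
        windows.append((train_start, train_end, train_end + h))
    return windows


chained = cv_windows(n_obs=144, h=14, n_windows=2, step_size=1)
rolled = cv_windows(n_obs=144, h=14, n_windows=2, step_size=1, input_size=24)
print(chained)
print(rolled)
```

In this sketch the last window always ends at the final observation, so increasing n_windows or step_size moves the first cutoff further back in time.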

| Parameter | Type | Default | Details |
|---|---|---|---|
| h | int | | Forecast horizon. |
| df | Union | None | DataFrame with ids, times, targets and exogenous. If None, the StatsForecast class should have been instantiated using df. |
| n_windows | int | 1 | Number of windows used for cross validation. |
| step_size | int | 1 | Step size between each window. |
| test_size | Optional | None | Length of the test set. If passed, set n_windows=None. |
| input_size | Optional | None | Input size for each window; if not None, rolled windows are used. |
| level | Optional | None | Confidence levels between 0 and 100 for prediction intervals. |
| fitted | bool | False | Store in-sample predictions. |
| refit | Union | True | Whether or not to refit the model for each window. If int, train the models every refit windows. |
| sort_df | bool | True | Sort df by ids and times. |
| prediction_intervals | Optional | None | Configuration to calibrate prediction intervals (Conformal Prediction). |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| Returns | Union | | DataFrame with insample models columns for point predictions and probabilistic predictions for all fitted models. |

# StatsForecast.cross_validation method usage example

from statsforecast import StatsForecast
from statsforecast.utils import AirPassengersDF as panel_df
from statsforecast.models import Naive
# Instantiate StatsForecast class
fcst = StatsForecast(models=[Naive()],
                     freq='D', n_jobs=1, verbose=True)

# Compute cross-validated forecasts over two windows
rolled_fcsts_df = fcst.cross_validation(df=panel_df, h=14, n_windows=2)
rolled_fcsts_df.head(4)

StatsForecast.cross_validation_fitted_values

 StatsForecast.cross_validation_fitted_values ()

Access insample cross validated predictions.

After executing StatsForecast.cross_validation, you can access the insample prediction values for each model and window. To get them, you need to pass fitted=True to the StatsForecast.cross_validation method and then use the StatsForecast.cross_validation_fitted_values method.

# StatsForecast.cross_validation_fitted_values method usage example

from statsforecast import StatsForecast
from statsforecast.utils import AirPassengersDF as panel_df
from statsforecast.models import Naive
# Instantiate StatsForecast class
fcst = StatsForecast(models=[Naive()],
                     freq='D', n_jobs=1)

# Access insample predictions
rolled_fcsts_df = fcst.cross_validation(df=panel_df, h=12, n_windows=2, fitted=True)
insample_rolled_fcsts_df = fcst.cross_validation_fitted_values()
insample_rolled_fcsts_df.tail(4)

StatsForecast.plot

 StatsForecast.plot (df:Union[pandas.core.frame.DataFrame,
                              polars.dataframe.frame.DataFrame],
                     forecasts_df:Union[pandas.core.frame.DataFrame,
                                        polars.dataframe.frame.DataFrame,NoneType]=None,
                     unique_ids:Union[List[str],NoneType,numpy.ndarray]=None,
                     plot_random:bool=True,
                     models:Optional[List[str]]=None,
                     level:Optional[List[float]]=None,
                     max_insample_length:Optional[int]=None,
                     plot_anomalies:bool=False, engine:str='matplotlib',
                     id_col:str='unique_id', time_col:str='ds',
                     target_col:str='y',
                     resampler_kwargs:Optional[Dict]=None)

Plot forecasts and insample values.

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with ids, times, targets and exogenous. |
| forecasts_df | Union | None | DataFrame with ids, times and models. |
| unique_ids | Union | None | ids to plot. If None, they’re selected randomly. |
| plot_random | bool | True | Select time series to plot randomly. |
| models | Optional | None | List of models to plot. |
| level | Optional | None | List of prediction intervals to plot if passed. |
| max_insample_length | Optional | None | Max number of train/insample observations to be plotted. |
| plot_anomalies | bool | False | Plot anomalies for each prediction interval. |
| engine | str | matplotlib | Library used to plot. ‘plotly’, ‘plotly-resampler’ or ‘matplotlib’. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| resampler_kwargs | Optional | None | Kwargs to be passed to the plotly-resampler constructor. For further customization (“show_dash”), call the method, store the plotting object and add the extra arguments to its show_dash method. |

StatsForecast.save

 StatsForecast.save (path:Union[pathlib.Path,str,NoneType]=None,
                     max_size:Optional[str]=None, trim:bool=False)

Save the StatsForecast object with certain settings to make it reproducible.

| Parameter | Type | Default | Details |
|---|---|---|---|
| path | Union | None | Path of the file to be saved. If None will create one in the current directory using the current UTC timestamp. |
| max_size | Optional | None | StatsForecast object should not exceed this size. Available byte naming: [‘B’, ‘KB’, ‘MB’, ‘GB’] |
| trim | bool | False | Delete any attributes not needed for inference. |

StatsForecast.load

 StatsForecast.load (path:Union[pathlib.Path,str])

Loads a previously saved model into a ready-to-use StatsForecast object.

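| Parameter | Type | Details |
|---|---|---|
| path | Union | Path to saved StatsForecast file. |
| Returns | sf: StatsForecast | Previously saved StatsForecast |

# Synthetic panel for this example (its size is an assumption;
# `series` is not defined earlier on this page)
from statsforecast.utils import generate_series
series = generate_series(n_series=10, equal_ends=True)

fcst = StatsForecast(
    models=[ADIDA(), SimpleExponentialSmoothing(0.1),
            HistoricAverage(), CrostonClassic()],
    freq='D',
    n_jobs=1
)
res = fcst.forecast(df=series, h=14)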

Misc

Integer datestamp

The StatsForecast class can also receive integers as datestamps. The following example shows how to do it.

import numpy as np
import pandas as pd

from statsforecast import StatsForecast
from statsforecast.utils import AirPassengers as ap
from statsforecast.models import HistoricAverage
int_ds_df = pd.DataFrame({'ds': np.arange(1, len(ap) + 1), 'y': ap})
int_ds_df.insert(0, 'unique_id', 'AirPassengers')
int_ds_df.head()
int_ds_df.tail()
int_ds_df
fcst = StatsForecast(models=[HistoricAverage()], freq=1)
horizon = 7
forecast = fcst.forecast(df=int_ds_df, h=horizon)
forecast.head()
last_date = int_ds_df['ds'].max()
# The forecast timestamps should continue the integer index
np.testing.assert_array_equal(forecast['ds'].values, np.arange(last_date + 1, last_date + 1 + horizon))
int_ds_cv = fcst.cross_validation(df=int_ds_df, h=7, test_size=8, n_windows=None)
int_ds_cv

External regressors

Every column after y is considered an external regressor and will be passed to the models that allow them. If you use them, you must supply their future values to the StatsForecast.forecast method through X_df.

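from statsforecast.models import _TS  # base class of the StatsForecast models

class LinearRegression(_TS):
    """Ordinary least squares on the external regressors."""

    def __init__(self):
        pass

    def fit(self, y, X):
        # Store the least-squares coefficients for later use in predict
        self.coefs_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, h, X):
        # Use the coefficients stored by fit
        mean = X @ self.coefs_
        return {'mean': mean}

    def __repr__(self):
        return 'LinearRegression()'

    def forecast(self, y, h, X=None, X_future=None, fitted=False):
        # Fit and predict in one step without storing the coefficients
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return {'mean': X_future @ coefs}

    def new(self):
        # Shallow copy so that each series gets its own model instance
        b = type(self).__new__(type(self))
        b.__dict__.update(self.__dict__)
        return b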
series_xreg = series = generate_series(10_000, equal_ends=True)
series_xreg['intercept'] = 1
series_xreg['dayofweek'] = series_xreg['ds'].dt.dayofweek
series_xreg = pd.get_dummies(series_xreg, columns=['dayofweek'], drop_first=True)
series_xreg
dates = sorted(series_xreg['ds'].unique())
valid_start = dates[-14]
train_mask = series_xreg['ds'] < valid_start
series_train = series_xreg[train_mask]
series_valid = series_xreg[~train_mask]
X_valid = series_valid.drop(columns=['y'])
fcst = StatsForecast(
    models=[LinearRegression()],
    freq='D',
)
xreg_res = fcst.forecast(df=series_train, h=14, X_df=X_valid)
xreg_res['y'] = series_valid['y'].values
xreg_res.drop(columns='unique_id').groupby('ds').mean().plot()
xreg_res_cv = fcst.cross_validation(df=series_train, h=3, test_size=5, n_windows=None)
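
Before forecasting with external regressors it can help to check that the future frame lines up with the training frame. The helper below, check_future_exog, is hypothetical and not part of statsforecast; it assumes that every column other than the id, time and target columns is a regressor and that X_df must hold exactly h future rows per series.

```python
import pandas as pd


def check_future_exog(df, X_df, h, id_col='unique_id', time_col='ds', target_col='y'):
    """Return the regressor names after validating X_df against df."""
    # Assumption: regressors are all columns except id, time and target
    exog_cols = [c for c in df.columns if c not in (id_col, time_col, target_col)]

    missing = [c for c in exog_cols if c not in X_df.columns]
    if missing:
        raise ValueError(f'X_df is missing regressors: {missing}')

    # Each series needs exactly h future rows
    rows_per_id = X_df.groupby(id_col)[time_col].count()
    wrong = [uid for uid in df[id_col].unique() if rows_per_id.get(uid, 0) != h]
    if wrong:
        raise ValueError(f'X_df does not have {h} rows for ids: {wrong}')
    return exog_cols


train = pd.DataFrame({
    'unique_id': ['a'] * 4,
    'ds': [1, 2, 3, 4],
    'y': [1.0, 2.0, 3.0, 4.0],
    'intercept': 1,
    'trend': [1, 2, 3, 4],
})
future = pd.DataFrame({
    'unique_id': ['a'] * 2,
    'ds': [5, 6],
    'intercept': 1,
    'trend': [5, 6],
})
exog = check_future_exog(train, future, h=2)
print(exog)
```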

Prediction intervals

You can pass the argument level to the StatsForecast.forecast method to calculate prediction intervals. Not all models can calculate them at the moment, so intervals are only returned for the models that implement them.

ap_df = pd.DataFrame({'ds': np.arange(ap.size), 'y': ap})
ap_df['unique_id'] = 0
sf = StatsForecast(
    models=[
        SeasonalNaive(season_length=12), 
        AutoARIMA(season_length=12)
    ],
    freq=1,
    n_jobs=1
)
ap_ci = sf.forecast(df=ap_df, h=12, level=(80, 95))
sf.plot(ap_df, ap_ci, level=[80], engine="matplotlib")
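
The interval bounds come back as extra columns next to each model's point forecast. As a sketch, assuming the usual statsforecast naming of <model>-lo-<level> and <model>-hi-<level>, the helper below (hypothetical, not part of the library) lists the columns one would expect for a set of model names and levels. The ordering of the bounds is also an assumption.

```python
def interval_columns(model_names, levels):
    """List the point and interval column names expected for each model."""
    cols = []
    for name in model_names:
        cols.append(name)  # point forecast
        # Lower bounds, widest interval first (assumed ordering)
        cols.extend(f'{name}-lo-{lv}' for lv in sorted(levels, reverse=True))
        # Upper bounds, narrowest interval first (assumed ordering)
        cols.extend(f'{name}-hi-{lv}' for lv in sorted(levels))
    return cols


expected = interval_columns(['SeasonalNaive', 'AutoARIMA'], levels=(80, 95))
print(expected)
```

A list like this is convenient for selecting only the interval columns of one model from the forecast DataFrame.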

Conformal Prediction intervals

You can also add conformal intervals using the following code.

from statsforecast.utils import ConformalIntervals
sf = StatsForecast(
    models=[
        AutoARIMA(season_length=12),
        AutoARIMA(
            season_length=12, 
            prediction_intervals=ConformalIntervals(n_windows=2, h=12),
            alias='ConformalAutoARIMA'
        ),
    ],
    freq=1,
    n_jobs=1
)
ap_ci = sf.forecast(df=ap_df, h=12, level=(80, 95))
sf.plot(ap_df, ap_ci, level=[80], engine="plotly")

You can also compute conformal intervals for all the models that support them, as follows:

sf = StatsForecast(
    models=[
        AutoARIMA(season_length=12),
    ],
    freq=1,
    n_jobs=1
)
ap_ci = sf.forecast(
    df=ap_df, 
    h=12, 
    level=(50, 80, 95), 
    prediction_intervals=ConformalIntervals(h=12),
)
sf.plot(ap_df, ap_ci, level=[80], engine="matplotlib")
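
The idea behind conformal intervals can be sketched with NumPy: absolute errors collected on a few cross-validation windows are added to and subtracted from the point forecast, and quantiles of the resulting values give the bounds. This is a simplified illustration of one common approach, not necessarily what ConformalIntervals does internally; the function name and the numbers are made up.

```python
import numpy as np


def conformal_interval(mean, abs_errors, level):
    """Return (lo, hi) bounds for one level between 0 and 100.

    mean: point forecasts, shape (h,)
    abs_errors: absolute errors from cross-validation, shape (n_windows, h)
    """
    alpha = 1 - level / 100
    # Candidate values around the forecast, shape (2 * n_windows, h)
    scores = np.vstack([mean - abs_errors, mean + abs_errors])
    lo = np.quantile(scores, alpha / 2, axis=0)
    hi = np.quantile(scores, 1 - alpha / 2, axis=0)
    return lo, hi


mean = np.array([100.0, 110.0])         # made-up point forecasts for h=2
abs_errors = np.array([[5.0, 10.0],     # errors from window 1
                       [15.0, 20.0]])   # errors from window 2
lo, hi = conformal_interval(mean, abs_errors, level=80)
print(lo, hi)
```

Because the bounds depend only on observed errors, the same recipe applies to any model that can be cross-validated, which is why the number of windows and the horizon h are part of the configuration.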