StatsForecast provides a comprehensive interface for fitting, predicting, forecasting, and evaluating statistical forecasting models on large sets of time series.
Overview
The main methods include:

- StatsForecast.fit: Fit statistical models
- StatsForecast.predict: Predict using fitted models
- StatsForecast.forecast: Memory-efficient predictions without storing models
- StatsForecast.cross_validation: Temporal cross-validation
- StatsForecast.plot: Visualization of forecasts and historical data
StatsForecast Class
StatsForecast
Bases: _StatsForecast
The StatsForecast class allows you to efficiently fit multiple StatsForecast models
for large sets of time series. It operates on a DataFrame df with at least three columns:
ids, times, and targets.
The class has a memory-efficient StatsForecast.forecast method that avoids storing partial
model outputs, while the StatsForecast.fit and StatsForecast.predict methods with the
Scikit-learn interface store the fitted models.
The StatsForecast class offers parallelization utilities with Dask, Spark, and Ray back-ends.
See distributed computing example here.
StatsForecast.fit
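Fits the statistical models to the time series in df and stores them so they can be used later with the predict method. This follows the scikit-learn fit/predict interface.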
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame containing time series data. Must have columns for series identifiers, timestamps, and target values. Can optionally include exogenous features. | required |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. If provided, the models will be prepared to generate prediction intervals. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. Values can be timestamps (datetime) or integers. | 'ds' |
| target_col | str | Name of the column containing the target variable to forecast. | 'y' |

Returns:

| Name | Type | Description |
|---|---|---|
| StatsForecast | StatsForecast | Returns self with fitted models stored in the fitted_ attribute. This allows for method chaining. |
StatsForecast.predict
Uses the models previously fitted with the fit method to generate predictions for the
specified forecast horizon. This follows the scikit-learn fit/predict interface.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon, the number of time steps ahead to predict. | required |
| X_df | DataFrame | DataFrame containing future exogenous variables. Required if any models use exogenous features. Must have the same structure as training data and include future values for all time series and forecast horizon. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95] for 80% and 95% intervals). If provided with models configured for prediction intervals, the output will include lower and upper bounds. | None |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with forecasts for each model. Contains the series identifiers, future timestamps, and one column per model with point predictions. If level is specified, includes additional columns for prediction interval bounds (e.g., 'model-lo-95', 'model-hi-95'). |
StatsForecast.fit_predict
Combines the fit and predict methods in a single operation. The fitted models
are stored internally in the fitted_ attribute for later use, making this method
suitable when you need both training and immediate predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon, the number of time steps ahead to predict. | required |
| df | DataFrame | Input DataFrame containing time series data. Must have columns for series identifiers, timestamps, and target values. Can optionally include exogenous features. | required |
| X_df | DataFrame | DataFrame containing future exogenous variables. Required if any models use exogenous features. Must include future values for all time series and forecast horizon. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95]). Required if prediction_intervals is specified. | None |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. Values can be timestamps (datetime) or integers. | 'ds' |
| target_col | str | Name of the column containing the target variable to forecast. | 'y' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with forecasts containing series identifiers, future timestamps, and predictions from each model. Includes prediction intervals if level is specified. |
StatsForecast.forecast
A memory-efficient alternative to fit_predict
when you don't need to inspect or reuse the fitted models. Models are trained and
used for forecasting within each time series, then discarded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon, the number of time steps ahead to predict. | required |
| df | DataFrame | Input DataFrame containing time series data. Must have columns for series identifiers, timestamps, and target values. Can optionally include exogenous features for training. | required |
| X_df | DataFrame | DataFrame containing future exogenous variables. Required if any models use exogenous features. Must include future values for all time series and forecast horizon. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95]). | None |
| fitted | bool | If True, stores in-sample (fitted) predictions which can be retrieved using forecast_fitted_values(). | False |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. Values can be timestamps (datetime) or integers. | 'ds' |
| target_col | str | Name of the column containing the target variable to forecast. | 'y' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with forecasts containing series identifiers, future timestamps, and predictions from each model. Includes prediction intervals if level is specified. |
StatsForecast.cross_validation
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon for each validation window. | required |
| df | DataFrame | Input DataFrame containing time series data with columns for series identifiers, timestamps, and target values. | required |
| n_windows | int | Number of validation windows to create. Cannot be specified together with test_size. | 1 |
| step_size | int | Number of time steps between consecutive validation windows. Smaller values create overlapping windows. | 1 |
| test_size | int | Total size of the test period. If provided, n_windows is computed automatically. Overrides n_windows if specified. | None |
| input_size | int | Maximum number of training observations to use for each window. If None, uses expanding windows with all available history. If specified, uses rolling windows of fixed size. | None |
| level | List[float] | Confidence levels between 0 and 100 for prediction intervals (e.g., [80, 95]). | None |
| fitted | bool | If True, stores in-sample predictions for each window, accessible via cross_validation_fitted_values(). | False |
| refit | bool or int | Controls model refitting frequency. If True, refits models for every window. If False, fits once and uses the forward method. If an integer n, refits every n windows. Models must implement the forward method when refit is not True. | True |
| prediction_intervals | ConformalIntervals | Configuration for calibrating prediction intervals using Conformal Prediction. Requires level to be specified. | None |
| id_col | str | Name of the column containing unique identifiers for each time series. | 'unique_id' |
| time_col | str | Name of the column containing timestamps or time indices. | 'ds' |
| target_col | str | Name of the column containing the target variable. | 'y' |

Returns:

| Type | Description |
|---|---|
| DataFrame | DataFrame with cross-validation results including series identifiers, cutoff dates (last training observation), forecast dates, actual values, and predictions from each model for all windows. |
StatsForecast.plot
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame containing historical time series data with columns for series identifiers, timestamps, and target values. | required |
| forecasts_df | DataFrame | DataFrame with forecast results from forecast() or cross_validation(). Should contain series identifiers, timestamps, and model predictions. | None |
| unique_ids | List[str] or ndarray | Specific series identifiers to plot. If None and plot_random is True, series are selected randomly. | None |
| plot_random | bool | Whether to randomly select series to plot when unique_ids is not specified. | True |
| models | List[str] | Names of specific models to include in the plot. If None, plots all models present in forecasts_df. | None |
| level | List[float] | Confidence levels to plot as shaded regions around forecasts (e.g., [80, 95]). Only applicable if prediction intervals are present in forecasts_df. | None |
| max_insample_length | int | Maximum number of historical observations to display. Useful for focusing on recent history when series are long. | None |
| plot_anomalies | bool | If True, highlights observations that fall outside prediction intervals as anomalies. | False |
| engine | str | Plotting library to use. Options are 'matplotlib' (static plots), 'plotly' (interactive plots), or 'plotly-resampler' (interactive with downsampling for large datasets). | 'matplotlib' |
| id_col | str | Name of the column containing series identifiers. | 'unique_id' |
| time_col | str | Name of the column containing timestamps. | 'ds' |
| target_col | str | Name of the column containing the target variable. | 'y' |
| resampler_kwargs | Dict | Additional keyword arguments passed to the plotly-resampler constructor when engine='plotly-resampler'. For further customization (e.g., 'show_dash'), call this method, store the returned object, and add arguments to its show_dash method. | None |

Returns:

| Type | Description |
|---|---|
| matplotlib Figure, plotly Figure, or FigureResampler | Plotting object from the selected engine, which can be further customized or displayed. |
StatsForecast.save
Saves the StatsForecast object to disk so that it can later be read back with the load() method
to restore the exact state for making predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | File path where the object will be saved. If None, creates a filename in the current directory using the format 'StatsForecast_YYYY-MM-DD_HH-MM-SS.pkl' with the current UTC timestamp. | None |
| max_size | str | Maximum allowed size for the serialized object. Should be specified as a number followed by a unit: 'B', 'KB', 'MB', or 'GB' (e.g., '100MB', '1.5GB'). If the object exceeds this size, an OSError is raised. | None |
| trim | bool | If True, removes fitted values from forecast() and cross_validation() before saving to reduce file size. These values are not needed for generating new predictions. | False |
StatsForecast.load
Loads a StatsForecast object that was previously written with the save() method,
restoring all fitted models and configuration. The loaded object is ready to
generate predictions immediately.
Parameters:
Returns:
| Name | Type | Description |
|---|---|---|
| StatsForecast | StatsForecast | The deserialized StatsForecast instance with all fitted models and configuration restored, ready for prediction. |
Usage Examples
Basic Forecasting
Cross-Validation
Prediction Intervals
Conformal Prediction Intervals
Advanced Features
Integer Datestamps
The StatsForecast class can work with integer datestamps instead of datetime objects:
External Regressors
Every column after y is considered an external regressor and will be passed to models that support them:
Distributed Computing
The StatsForecast class offers parallelization utilities with Dask, Spark, and Ray backends for distributed computing. See the distributed computing examples for more information.
