End-to-End Walkthrough
Model training, evaluation and selection for multiple time series
Prerequisites
This guide assumes basic familiarity with StatsForecast. For a minimal example, visit the Quick Start.
Follow this article for a step-by-step guide on building a production-ready forecasting pipeline for multiple time series.
During this guide you will gain familiarity with the core `StatsForecast` class and some relevant methods like `StatsForecast.plot`, `StatsForecast.forecast` and `StatsForecast.cross_validation`.
We will use a classical benchmarking dataset from the M4 competition. The dataset includes time series from different domains like finance, economy and sales. In this example, we will use a subset of the Hourly dataset.
We will model each time series individually. Forecasting at this level is also known as local forecasting. Therefore, you will train a set of models for every unique series and then select the best one. StatsForecast focuses on speed, simplicity, and scalability, which makes it ideal for this task.
Outline:
- Install packages.
- Read the data.
- Explore the data.
- Train multiple models for every unique time series.
- Evaluate the models' performance using cross-validation.
- Select the best model for every unique time series.
Not Covered in this guide
- Forecasting at scale using clusters on the cloud.
  - Forecast the M5 Dataset in 5 minutes using Ray clusters.
  - Forecast the M5 Dataset in 5 minutes using Spark clusters.
  - Learn how to predict 1M series in less than 30 minutes.
- Training models on Multiple Seasonalities.
  - Learn to use multiple seasonality in this Electricity Load forecasting tutorial.
- Using external regressors or exogenous variables.
  - Follow this tutorial to include exogenous variables like weather or holidays, or static variables like category or family.
- Comparing StatsForecast with other popular libraries.
  - You can reproduce our benchmarks here.
Install libraries
We assume you have StatsForecast already installed. Check this guide for instructions on how to install StatsForecast.
Read the data
We will use pandas to read the M4 Hourly data set, stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats like `.csv`.
The input to StatsForecast is always a data frame in long format with three columns: `unique_id`, `ds` and `y`:

- `unique_id` (string, int or category): an identifier for the series.
- `ds` (datestamp or int): either an integer indexing time or a datestamp, ideally like YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- `y` (numeric): the measurement we wish to forecast. The target column needs to be renamed to `y` if it has a different name.
This data set already satisfies the requirements.
Depending on your internet connection, this step should take around 10 seconds.
| | unique_id | ds | y |
|---|---|---|---|
| 0 | H1 | 1 | 605.0 |
| 1 | H1 | 2 | 586.0 |
| 2 | H1 | 3 | 586.0 |
| 3 | H1 | 4 | 559.0 |
| 4 | H1 | 5 | 511.0 |
This dataset contains 414 unique series with 900 observations on average. For this example and reproducibility's sake, we will select only 10 unique IDs and keep only the last week. Depending on your processing infrastructure, feel free to select more or fewer series.
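Subsetting can be done with plain pandas. The sketch below runs on a toy frame so it is self-contained; on the real data you would apply the same last two lines with `n_series = 10`:

```python
import pandas as pd

# Toy stand-in for the full M4 frame: 3 series, 200 hourly observations each
df = pd.DataFrame({
    "unique_id": [f"H{i}" for i in range(1, 4) for _ in range(200)],
    "ds": list(range(1, 201)) * 3,
    "y": [float(v % 50) for v in range(600)],
})

n_series = 2     # 10 in the walkthrough
hours = 7 * 24   # keep only the last week per series

uids = df["unique_id"].unique()[:n_series]
df = df[df["unique_id"].isin(uids)].groupby("unique_id").tail(hours)
print(df["unique_id"].nunique(), len(df))  # 2 336
```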
Note
Processing time is dependent on the available computing resources. Running this example with the complete dataset takes around 10 minutes in a c5d.24xlarge (96 cores) instance from AWS.
Explore Data with the plot method
Plot some series using the `plot` method from the `StatsForecast` class. This method plots 8 random series from the dataset and is useful for basic EDA.
Note
The `StatsForecast.plot` method uses Plotly as the default engine. You can change to matplotlib by setting `engine="matplotlib"`.
Train multiple models for many series
StatsForecast can train many models on many time series efficiently.
Start by importing and instantiating the desired models. StatsForecast offers a wide variety of models grouped in the following categories:

Auto Forecast: Automatic forecasting tools search for the best parameters and select the best possible model for a given time series. These tools are useful for large collections of univariate time series. Includes automatic versions of: ARIMA, ETS, Theta, CES.

Exponential Smoothing: Uses a weighted average of all past observations where the weights decrease exponentially into the past. Suitable for data with no clear trend or seasonality. Examples: SES, Holt’s Winters, SSO.

Benchmark models: classical models for establishing baselines. Examples: Mean, Naive, Random Walk

Intermittent or Sparse models: suited for series with very few non-zero observations. Examples: CROSTON, ADIDA, IMAPA.

Multiple Seasonalities: suited for signals with more than one clear seasonality. Useful for low-frequency data like electricity and logs. Examples: MSTL.

Theta Models: fit two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts. Examples: Theta, DynamicTheta
Here you can check the complete list of models.
For this example we will use:

- `AutoARIMA`: automatically selects the best ARIMA (AutoRegressive Integrated Moving Average) model using an information criterion. Ref: `AutoARIMA`.
- `HoltWinters`: triple exponential smoothing; Holt-Winters' method is an extension of exponential smoothing for series that contain both trend and seasonality. Ref: `HoltWinters`.
- `SeasonalNaive`: memory-efficient seasonal naive predictions. Ref: `SeasonalNaive`.
- `HistoricAverage`: arithmetic mean. Ref: `HistoricAverage`.
- `DynamicOptimizedTheta`: the theta family of models has been shown to perform well in various datasets such as M3. Models the deseasonalized time series. Ref: `DynamicOptimizedTheta`.
Import and instantiate the models. Setting the `season_length` argument is sometimes tricky. This article on Seasonal periods by the master, Rob Hyndman, can be useful.
We fit the models by instantiating a new `StatsForecast` object with the following parameters:

- `models`: a list of models. Select the models you want from models and import them.
- `freq`: a string indicating the frequency of the data. (See pandas' available frequencies.)
- `n_jobs` (int): number of jobs used in the parallel processing; use -1 for all cores.
- `fallback_model`: a model to be used if a model fails.
Any settings are passed into the constructor. Then you call its fit method and pass in the historical data frame.
Note
StatsForecast achieves its blazing speed using JIT compiling through Numba. The first time you call the StatsForecast class, the fit method should take around 5 seconds. The second time, once Numba has compiled your settings, it should take less than 0.2s.
The `forecast` method takes two arguments: the next `h` (horizon) steps to forecast and the `level`.

- `h` (int): represents the forecast h steps into the future. In this case, 48 hours ahead.
- `level` (list of floats): this optional parameter is used for probabilistic forecasting. Set the `level` (or confidence percentile) of your prediction interval. For example, `level=[90]` means that the model expects the real value to be inside that interval 90% of the times.
The forecast object here is a new data frame that includes a column with the name of the model and the predicted (y hat) values, as well as columns for the uncertainty intervals. Depending on your computer, this step should take around 1 minute. (If you want to speed things up to a couple of seconds, remove the AutoModels like ARIMA and Theta.)
Note
The `forecast` method is compatible with distributed clusters, so it does not store any model parameters. If you want to store parameters for every model you can use the `fit` and `predict` methods. However, those methods are not defined for distributed engines like Spark, Ray or Dask.
| | unique_id | ds | HoltWinters | HoltWinters-lo-90 | HoltWinters-hi-90 | CrostonClassic | CrostonClassic-lo-90 | CrostonClassic-hi-90 | SeasonalNaive | SeasonalNaive-lo-90 | SeasonalNaive-hi-90 | HistoricAverage | HistoricAverage-lo-90 | HistoricAverage-hi-90 | DynamicOptimizedTheta | DynamicOptimizedTheta-lo-90 | DynamicOptimizedTheta-hi-90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H1 | 749 | 829.0 | 246.367554 | 1904.367554 | 829.0 | 246.367554 | 1904.367554 | 635.0 | 537.471191 | 732.528809 | 660.982117 | 398.03775 | 923.926514 | 592.701843 | 577.677307 | 611.652649 |
| 1 | H1 | 750 | 807.0 | 268.367554 | 1882.367554 | 807.0 | 268.367554 | 1882.367554 | 572.0 | 474.471222 | 669.528809 | 660.982117 | 398.03775 | 923.926514 | 525.589111 | 505.449738 | 546.621826 |
| 2 | H1 | 751 | 785.0 | 290.367554 | 1860.367554 | 785.0 | 290.367554 | 1860.367554 | 532.0 | 434.471222 | 629.528809 | 660.982117 | 398.03775 | 923.926514 | 489.251801 | 462.072876 | 512.424133 |
| 3 | H1 | 752 | 756.0 | 319.367554 | 1831.367554 | 756.0 | 319.367554 | 1831.367554 | 493.0 | 395.471222 | 590.528809 | 660.982117 | 398.03775 | 923.926514 | 456.195038 | 430.554291 | 478.260956 |
| 4 | H1 | 753 | 719.0 | 356.367554 | 1794.367554 | 719.0 | 356.367554 | 1794.367554 | 477.0 | 379.471222 | 574.528809 | 660.982117 | 398.03775 | 923.926514 | 436.290527 | 411.051239 | 461.815948 |
Plot the results of 8 random series using the `StatsForecast.plot` method.
The `StatsForecast.plot` method allows for further customization. For example, plot the results of the different models and unique ids.
Evaluate the model’s performance
In previous steps, we've taken our historical data to predict the future. However, to assess its accuracy, we would also like to know how the model would have performed in the past. To assess the accuracy and robustness of your models on your data, perform cross-validation.
With time series data, cross-validation is done by defining a sliding window across the historical data and predicting the period following it. This form of cross-validation allows us to arrive at a better estimate of our model's predictive abilities across a wider range of temporal instances while also keeping the data in the training set contiguous, as is required by our models.
The following graph depicts such a Cross Validation Strategy:
Cross-validation of time series models is considered a best practice but most implementations are very slow. The statsforecast library implements cross-validation as a distributed operation, making the process less time-consuming to perform. If you have big datasets you can also perform cross-validation in a distributed cluster using Ray, Dask or Spark.
In this case, we want to evaluate the performance of each model for the last 2 days (`n_windows=2`), forecasting every second day (`step_size=48`). Depending on your computer, this step should take around 1 minute.
Tip
Setting `n_windows=1` mirrors a traditional train-test split, with our historical data serving as the training set and the last 48 hours serving as the testing set.
The `cross_validation` method from the `StatsForecast` class takes the following arguments:

- `df`: training data frame.
- `h` (int): represents h steps into the future that are being forecasted. In this case, 24 hours ahead.
- `step_size` (int): step size between each window. In other words: how often do you want to run the forecasting process.
- `n_windows` (int): number of windows used for cross-validation. In other words: the number of forecasting processes in the past you want to evaluate.
The `crossvalidation_df` object is a new data frame that includes the following columns:

- `unique_id`: series identifier.
- `ds`: datestamp or temporal index.
- `cutoff`: the last datestamp or temporal index for the `n_windows`. If `n_windows=1`, there is one unique cutoff value; if `n_windows=2`, there are two unique cutoff values.
- `y`: true value.
- `"model"`: columns with the model's name and fitted value.
| | unique_id | ds | cutoff | y | HoltWinters | CrostonClassic | SeasonalNaive | HistoricAverage | DynamicOptimizedTheta |
|---|---|---|---|---|---|---|---|---|---|
| 0 | H1 | 701 | 700 | 619.0 | 847.0 | 742.668762 | 691.0 | 661.674988 | 612.767517 |
| 1 | H1 | 702 | 700 | 565.0 | 820.0 | 742.668762 | 618.0 | 661.674988 | 536.846252 |
| 2 | H1 | 703 | 700 | 532.0 | 790.0 | 742.668762 | 563.0 | 661.674988 | 497.824280 |
| 3 | H1 | 704 | 700 | 495.0 | 784.0 | 742.668762 | 529.0 | 661.674988 | 464.723236 |
| 4 | H1 | 705 | 700 | 481.0 | 752.0 | 742.668762 | 504.0 | 661.674988 | 440.972351 |
Next, we will evaluate the performance of every model for every series using common error metrics like Mean Absolute Error (MAE) or Mean Squared Error (MSE). Define a utility function to evaluate different error metrics for the cross-validation data frame.
First, import the desired error metrics from `mlforecast.losses`. Then define a utility function that takes a cross-validation data frame as input and returns an evaluation data frame with the average of the error metric for every unique id and fitted model over all cutoffs.
Warning
You can also use the Mean Absolute Percentage Error (MAPE); however, for granular forecasts, MAPE values are extremely hard to judge and not useful for assessing forecasting quality.
Create the data frame with the results of the evaluation of your cross-validation data frame using a Mean Squared Error metric.
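A plain-pandas version of such a utility (a sketch; the function name and the by-hand MSE computation are assumptions, not part of the StatsForecast API):

```python
import pandas as pd

def evaluate_cross_validation(cv_df: pd.DataFrame) -> pd.DataFrame:
    """Average MSE per (unique_id, model) over all cutoffs, plus the winner."""
    model_cols = [c for c in cv_df.columns
                  if c not in ("unique_id", "ds", "cutoff", "y")]
    sq_err = cv_df[model_cols].sub(cv_df["y"], axis=0) ** 2
    sq_err["unique_id"] = cv_df["unique_id"].to_numpy()
    evals = sq_err.groupby("unique_id").mean()
    evals["best_model"] = evals.idxmin(axis=1)
    return evals

# Toy cross-validation frame with two models
cv = pd.DataFrame({
    "unique_id": ["H1", "H1", "H2", "H2"],
    "ds": [1, 2, 1, 2],
    "cutoff": [0, 0, 0, 0],
    "y": [1.0, 2.0, 5.0, 6.0],
    "ModelA": [1.0, 2.0, 9.0, 9.0],
    "ModelB": [3.0, 4.0, 5.0, 6.0],
})
evaluation_df = evaluate_cross_validation(cv)
print(evaluation_df["best_model"].to_dict())  # {'H1': 'ModelA', 'H2': 'ModelB'}
```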
| unique_id | HoltWinters | CrostonClassic | SeasonalNaive | HistoricAverage | DynamicOptimizedTheta | best_model |
|---|---|---|---|---|---|---|
| H1 | 44888.019531 | 28038.736328 | 1422.666748 | 20927.664062 | 1296.333984 | DynamicOptimizedTheta |
| H10 | 2812.916504 | 1483.484131 | 96.895828 | 1980.367432 | 379.621124 | SeasonalNaive |
| H100 | 121625.375000 | 91945.140625 | 12019.000000 | 78491.187500 | 21699.648438 | SeasonalNaive |
| H101 | 28453.394531 | 16183.634766 | 10944.458008 | 18208.404297 | 63698.082031 | SeasonalNaive |
| H102 | 232924.843750 | 132655.296875 | 12699.896484 | 309110.468750 | 31393.519531 | SeasonalNaive |
Create a summary table with a model column and the number of series where that model performs best. In this case, Seasonal Naive is the best model for most of the series, and the Theta model should be used for the rest.
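Counting winners is a one-liner (a sketch on a toy evaluation frame; `evaluation_df` is assumed to carry a `best_model` column as in the table above):

```python
import pandas as pd

# Toy evaluation frame: the winning model has already been chosen per series
evaluation_df = pd.DataFrame(
    {"best_model": ["SeasonalNaive", "SeasonalNaive", "DynamicOptimizedTheta"]},
    index=pd.Index(["H1", "H10", "H100"], name="unique_id"),
)

summary_df = (
    evaluation_df.groupby("best_model").size().rename("n_series").reset_index()
)
print(summary_df.to_dict("records"))
# [{'best_model': 'DynamicOptimizedTheta', 'n_series': 1},
#  {'best_model': 'SeasonalNaive', 'n_series': 2}]
```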
You can further explore your results by plotting the unique_ids where a specific model wins.
Select the best model for every unique series
Define a utility function that takes your forecast’s data frame with the predictions and the evaluation data frame and returns a data frame with the best possible forecast for every unique_id.
Create your production-ready data frame with the best forecast for every unique_id.
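A minimal version of such a utility (a sketch; `get_best_model_forecast` and the `-lo-90`/`-hi-90` column convention are assumptions based on the tables in this walkthrough):

```python
import pandas as pd

def get_best_model_forecast(forecasts_df, evaluation_df):
    """For each unique_id keep only its winning model's columns,
    renamed to best_model / best_model-hi-90 / best_model-lo-90."""
    parts = []
    for uid, group in forecasts_df.groupby("unique_id"):
        best = evaluation_df.loc[uid, "best_model"]
        cols = {"unique_id": group["unique_id"], "ds": group["ds"],
                "best_model": group[best]}
        for tag in ("hi-90", "lo-90"):
            src = f"{best}-{tag}"
            if src in group.columns:
                cols[f"best_model-{tag}"] = group[src]
        parts.append(pd.DataFrame(cols))
    return pd.concat(parts, ignore_index=True)

# Toy inputs: two series, two models with 90% intervals
forecasts_df = pd.DataFrame({
    "unique_id": ["H1", "H2"], "ds": [749, 749],
    "ModelA": [10.0, 20.0], "ModelA-lo-90": [8.0, 18.0], "ModelA-hi-90": [12.0, 22.0],
    "ModelB": [11.0, 21.0], "ModelB-lo-90": [9.0, 19.0], "ModelB-hi-90": [13.0, 23.0],
})
evaluation_df = pd.DataFrame(
    {"best_model": ["ModelA", "ModelB"]},
    index=pd.Index(["H1", "H2"], name="unique_id"),
)
prod_df = get_best_model_forecast(forecasts_df, evaluation_df)
print(prod_df["best_model"].tolist())  # [10.0, 21.0]
```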
| | unique_id | ds | best_model | best_model-hi-90 | best_model-lo-90 |
|---|---|---|---|---|---|
| 0 | H1 | 749 | 592.701843 | 611.652649 | 577.677307 |
| 1 | H1 | 750 | 525.589111 | 546.621826 | 505.449738 |
| 2 | H1 | 751 | 489.251801 | 512.424133 | 462.072876 |
| 3 | H1 | 752 | 456.195038 | 478.260956 | 430.554291 |
| 4 | H1 | 753 | 436.290527 | 461.815948 | 411.051239 |
Plot the results.