End to End Walkthrough
Model training, evaluation and selection for multiple time series
Prerequisites
This guide assumes basic familiarity with NeuralForecast. For a minimal example, visit the Quick Start.
Follow this article for a step-by-step guide on building a production-ready forecasting pipeline for multiple time series.
During this guide you will gain familiarity with the core `NeuralForecast` class and some relevant methods like `NeuralForecast.fit`, `NeuralForecast.predict`, and `NeuralForecast.cross_validation`.
We will use a classical benchmarking dataset from the M4 competition. The dataset includes time series from different domains like finance, economics and sales. In this example, we will use a subset of the Hourly dataset.
We will model the series globally: you will train a set of models for the whole dataset and then select the best model for each individual time series. NeuralForecast focuses on speed, simplicity, and scalability, which makes it ideal for this task.
Outline:
- Install packages.
- Read the data.
- Explore the data.
- Train many models globally for the entire dataset.
- Evaluate the model’s performance using cross-validation.
- Select the best model for every unique time series.
Not Covered in this guide
- Using external regressors or exogenous variables: follow this tutorial to include exogenous variables like weather or holidays, or static variables like category or family.
- Probabilistic forecasting: follow this tutorial to generate probabilistic forecasts.
- Transfer learning: follow this tutorial to train a model and use it to forecast on different data.
Tip
You can use Colab to run this Notebook interactively
Warning
To reduce computation time, it is recommended to use a GPU. If you are using Colab, do not forget to activate it: go to Runtime > Change runtime type and select GPU as the hardware accelerator.
1. Install libraries
We assume you have `NeuralForecast` already installed. Check this guide for instructions on how to install `NeuralForecast`.

Additionally, we will install `s3fs` to read from the AWS S3 filesystem, `statsforecast` for plotting, and `datasetsforecast` for common error metrics like MAE or MASE.
Install the necessary packages with `pip install statsforecast s3fs datasetsforecast`.
2. Read the data
We will use pandas to read the M4 Hourly data set, stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats like `.csv`.
The input to `NeuralForecast` is always a data frame in long format with three columns: `unique_id`, `ds` and `y`:

- `unique_id` (string, int or category) represents an identifier for the series.
- `ds` (datestamp or int) should be either an integer indexing time or a datestamp, ideally in YYYY-MM-DD format for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- `y` (numeric) represents the measurement we wish to forecast.

This data set already satisfies these requirements.
Depending on your internet connection, this step should take around 10 seconds.
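A minimal sketch of this step (the dataset URL below is an assumption; replace it with wherever your parquet file lives):

```python
import pandas as pd

# Read the M4 Hourly subset in long format (unique_id, ds, y).
# The S3 URL below is an assumption; point it at your own data source if needed.
Y_df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')
Y_df.head()
```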
|   | unique_id | ds | y     |
|---|-----------|----|-------|
| 0 | H1        | 1  | 605.0 |
| 1 | H1        | 2  | 586.0 |
| 2 | H1        | 3  | 586.0 |
| 3 | H1        | 4  | 559.0 |
| 4 | H1        | 5  | 511.0 |
This dataset contains 414 unique series with 900 observations on average. For this example and reproducibility's sake, we will select only 10 unique IDs. Depending on your processing infrastructure, feel free to select more or fewer series.
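For instance, a short sketch that keeps the first 10 series (assuming `Y_df` was loaded as above):

```python
# Keep only the first 10 unique series to speed up the example
uids = Y_df['unique_id'].unique()[:10]
Y_df = Y_df[Y_df['unique_id'].isin(uids)].reset_index(drop=True)
```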
Note
Processing time is dependent on the available computing resources. Running this example with the complete dataset takes around 10 minutes in a c5d.24xlarge (96 cores) instance from AWS.
3. Explore Data with the plot method of StatsForecast
Plot some series using the `plot` method from the `StatsForecast` class. This method plots 8 random series from the dataset and is useful for basic EDA.
Note
The `StatsForecast.plot` method uses Plotly as the default engine. You can change to matplotlib by setting `engine="matplotlib"`.
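A possible call, assuming `statsforecast` is installed and `Y_df` is the data frame loaded above:

```python
from statsforecast import StatsForecast

# Plot 8 random series from the dataset for quick visual EDA
StatsForecast.plot(Y_df, engine='matplotlib')
```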
4. Train multiple models for many series
`NeuralForecast` can train many models on many time series globally and efficiently.

Each `Auto` model contains a default search space that was extensively tested on multiple large-scale datasets. Additionally, users can define specific search spaces tailored for particular datasets and tasks.
First, we create a custom search space for the `AutoNHITS` and `AutoLSTM` models. Search spaces are specified with dictionaries, where keys correspond to the model's hyperparameters and values are Tune functions specifying how each hyperparameter will be sampled. For example, use `randint` to sample integers uniformly, and `choice` to sample values from a list.
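As an illustration, here is a hedged sketch of two such dictionaries. The keys must match each model's hyperparameters, and the specific values below are examples rather than recommended settings:

```python
from ray import tune

# Illustrative search space for AutoNHITS
nhits_config = {
    "max_steps": 100,                              # Number of training steps
    "input_size": tune.choice([48, 48 * 3]),       # Length of the input window
    "learning_rate": tune.loguniform(1e-4, 1e-1),  # Initial learning rate
    "random_seed": tune.randint(1, 10),            # Random seed
}

# Illustrative search space for AutoLSTM
lstm_config = {
    "max_steps": 100,                              # Number of training steps
    "encoder_hidden_size": tune.choice([64, 128]), # Size of the LSTM hidden state
    "learning_rate": tune.loguniform(1e-4, 1e-1),  # Initial learning rate
    "random_seed": tune.randint(1, 10),            # Random seed
}
```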
To instantiate an `Auto` model you need to define:

- `h`: forecasting horizon.
- `loss`: training and validation loss from `neuralforecast.losses.pytorch`.
- `config`: hyperparameter search space. If `None`, the `Auto` class will use a pre-defined suggested hyperparameter space.
- `search_alg`: search algorithm (from `tune.search`); the default is random search. Refer to https://docs.ray.io/en/latest/tune/api_docs/suggestion.html for more information on the different search algorithm options.
- `num_samples`: number of configurations explored.
In this example we set the horizon `h` to 48, use the `MQLoss` distribution loss for training and validation, and use the default search algorithm, as shown in the sketch after the tip below.
Tip
The number of samples,
num_samples
, is a crucial parameter! Larger values will usually produce better results as we explore more configurations in the search space, but it will increase training times. Larger search spaces will usually require more samples. As a general rule, we recommend settingnum_samples
higher than 20.
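A minimal sketch of instantiating both Auto models under these settings (`num_samples` is kept small here only to keep the example fast; see the tip above):

```python
from neuralforecast.auto import AutoNHITS, AutoLSTM
from neuralforecast.losses.pytorch import MQLoss

horizon = 48  # forecasting horizon in hours

models = [
    AutoNHITS(h=horizon, loss=MQLoss(), config=nhits_config, num_samples=5),
    AutoLSTM(h=horizon, loss=MQLoss(), config=lstm_config, num_samples=2),
]
```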
Next, we use the `NeuralForecast` class to train the `Auto` models. In this step, `Auto` models will automatically perform hyperparameter tuning by training multiple models with different hyperparameters, producing forecasts on the validation set, and evaluating them. The best configuration is selected based on the error on the validation set. Only the best model is stored and used during inference.
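A hedged sketch of the training call. `freq=1` assumes the integer time index used by this dataset; pass a pandas frequency alias such as `'h'` instead if your `ds` column holds timestamps:

```python
from neuralforecast import NeuralForecast

# Train the Auto models; hyperparameter tuning happens inside fit()
nf = NeuralForecast(models=models, freq=1)
nf.fit(df=Y_df)
```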
Next, we use the `predict` method to forecast the next 48 hours using the optimal hyperparameters.
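A minimal sketch:

```python
# Forecast h=48 steps ahead for every series using the best configuration found
fcst_df = nf.predict()
fcst_df.head()
```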
| unique_id | ds | AutoNHITS | AutoNHITS-lo-90 | AutoNHITS-lo-80 | AutoNHITS-hi-80 | AutoNHITS-hi-90 | AutoLSTM | AutoLSTM-lo-90 | AutoLSTM-lo-80 | AutoLSTM-hi-80 | AutoLSTM-hi-90 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| H1 | 749 | 550.545288 | 491.368347 | 484.838226 | 640.832520 | 658.631592 | 581.597534 | 510.460632 | 533.967041 | 660.153076 | 690.976379 |
| H1 | 750 | 549.216736 | 491.054932 | 484.474243 | 639.552002 | 657.615967 | 530.324402 | 440.821899 | 472.254272 | 622.214539 | 653.435913 |
| H1 | 751 | 528.075989 | 466.917053 | 463.002289 | 621.197205 | 642.255005 | 487.045593 | 383.502045 | 423.310974 | 594.273071 | 627.640320 |
| H1 | 752 | 486.842255 | 418.012115 | 419.017242 | 585.653259 | 611.903809 | 457.408081 | 347.901093 | 390.807495 | 569.789062 | 604.200012 |
| H1 | 753 | 452.015930 | 371.543884 | 379.539215 | 558.845154 | 590.465942 | 441.641418 | 333.888611 | 374.730621 | 557.401978 | 595.008484 |
The `StatsForecast.plot` method allows for further customization. For example, plot the results of the different models and unique ids.
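One possible call, assuming `fcst_df` holds the predictions from the previous step (older neuralforecast versions return `unique_id` as the index, so we reset it if needed):

```python
# Make sure unique_id is a column before plotting
plot_fcst = fcst_df if 'unique_id' in fcst_df.columns else fcst_df.reset_index()

# Overlay both models' probabilistic forecasts on the historical data
StatsForecast.plot(
    Y_df,
    plot_fcst,
    models=["AutoNHITS", "AutoLSTM"],
    level=[80, 90],
    engine='matplotlib',
)
```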
5. Evaluate the model’s performance
In previous steps, we've taken our historical data to predict the future. However, to assess its accuracy we would also like to know how the model would have performed in the past. To assess the accuracy and robustness of your models on your data, perform cross-validation.
With time series data, Cross Validation is done by defining a sliding window across the historical data and predicting the period following it. This form of cross-validation allows us to arrive at a better estimation of our model’s predictive abilities across a wider range of temporal instances while also keeping the data in the training set contiguous as is required by our models.
The following graph depicts such a Cross Validation Strategy:
Tip
Setting
n_windows=1
mirrors a traditional train-test split with our historical data serving as the training set and the last 48 hours serving as the testing set.
The `cross_validation` method from the `NeuralForecast` class takes the following arguments (see the sketch after this list for an example call):

- `df`: training data frame.
- `step_size` (int): step size between each window; in other words, how often you want to run the forecasting process.
- `n_windows` (int): number of windows used for cross-validation; in other words, how many forecasting processes in the past you want to evaluate.
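A hedged sketch of the call used in this walkthrough (the window settings are assumptions; adjust them to your data):

```python
# Two validation windows of 48 hours each, 48 hours apart
cv_df = nf.cross_validation(df=Y_df, step_size=48, n_windows=2)
cv_df.head()
```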
The `cv_df` object is a new data frame that includes the following columns:

- `unique_id`: identifies each time series.
- `ds`: datestamp or temporal index.
- `cutoff`: the last datestamp or temporal index for each of the `n_windows`. If `n_windows=1`, there is one unique cutoff value; if `n_windows=2`, there are two unique cutoff values.
- `y`: true value.
- `"model"`: columns with the model's name and fitted value.
|   | unique_id | ds | cutoff | AutoNHITS | AutoNHITS-lo-90 | AutoNHITS-lo-80 | AutoNHITS-hi-80 | AutoNHITS-hi-90 | AutoLSTM | AutoLSTM-lo-90 | AutoLSTM-lo-80 | AutoLSTM-hi-80 | AutoLSTM-hi-90 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | H1 | 700 | 699 | 646.881714 | 601.402893 | 626.471008 | 672.432617 | 683.847778 | 633.707031 | 365.139832 | 407.289246 | 871.474976 | 925.476196 | 684.0 |
| 1 | H1 | 701 | 699 | 635.608643 | 595.042908 | 612.889771 | 669.565979 | 679.472900 | 632.455017 | 365.303131 | 406.472992 | 869.484985 | 922.926514 | 619.0 |
| 2 | H1 | 702 | 699 | 592.663940 | 564.124390 | 566.502319 | 648.286072 | 647.859253 | 633.002502 | 365.147522 | 407.174866 | 868.677979 | 925.269409 | 565.0 |
| 3 | H1 | 703 | 699 | 543.364563 | 516.760742 | 517.990234 | 603.099182 | 601.462280 | 633.903503 | 364.976746 | 408.498779 | 869.797180 | 925.993164 | 532.0 |
| 4 | H1 | 704 | 699 | 498.051178 | 461.069489 | 474.206360 | 540.752563 | 555.169739 | 634.015991 | 363.384155 | 408.305298 | 870.154297 | 920.329224 | 495.0 |
Now, let’s evaluate the models’ performance.
Warning
You can also use the Mean Absolute Percentage Error (MAPE); however, for granular forecasts, MAPE values are extremely hard to judge and are not useful for assessing forecasting quality.
Create a data frame with the results of the evaluation of your cross-validation data frame, using error metrics such as MAE, MSE and RMSE.
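A hedged sketch of one way to build it, using the error functions from `datasetsforecast.losses`; the helper below is illustrative, not part of the library:

```python
import pandas as pd
from datasetsforecast.losses import mae, mse, rmse

def evaluate_cross_validation(cv_df, metrics=(mae, mse, rmse)):
    """Illustrative helper: one row per (metric, unique_id) with each model's
    error and the name of the best (lowest-error) model."""
    model_cols = [c for c in cv_df.columns
                  if c not in ('unique_id', 'ds', 'cutoff', 'y')
                  and '-lo-' not in c and '-hi-' not in c]
    rows = []
    for metric in metrics:
        for uid, grp in cv_df.groupby('unique_id'):
            row = {'metric': metric.__name__, 'unique_id': uid}
            for model in model_cols:
                row[model] = metric(grp['y'].values, grp[model].values)
            rows.append(row)
    eval_df = pd.DataFrame(rows)
    eval_df['best_model'] = eval_df[model_cols].idxmin(axis=1)
    return eval_df

evaluation_df = evaluate_cross_validation(cv_df)
evaluation_df.head()
```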
|   | metric | unique_id | AutoNHITS | AutoLSTM | best_model |
|---|--------|-----------|-----------|----------|------------|
| 0 | mae | H1   | 38.259457  | 131.158150 | AutoNHITS |
| 1 | mae | H10  | 14.044900  | 32.972164  | AutoNHITS |
| 2 | mae | H100 | 254.464978 | 281.836064 | AutoNHITS |
| 3 | mae | H101 | 257.810841 | 148.341771 | AutoLSTM  |
| 4 | mae | H102 | 176.114826 | 472.413350 | AutoNHITS |
Create a summary table with a model column and the number of series where that model performs best.
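For example, counting winners per metric (a sketch that assumes `evaluation_df` from the previous step):

```python
summary_df = (
    evaluation_df
    .groupby(['metric', 'best_model'])
    .size()
    .rename('nr. of unique_ids')
    .reset_index()
    .rename(columns={'best_model': 'model'})
)
summary_df
```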
|   | metric | model | nr. of unique_ids |
|---|--------|-------|-------------------|
| 0 | mae  | AutoLSTM  | 1 |
| 1 | mse  | AutoLSTM  | 1 |
| 2 | rmse | AutoLSTM  | 1 |
| 3 | mae  | AutoNHITS | 9 |
| 4 | mse  | AutoNHITS | 9 |
| 5 | rmse | AutoNHITS | 9 |
|   | metric | model | nr. of unique_ids |
|---|--------|-------|-------------------|
| 1 | mse | AutoLSTM  | 1 |
| 4 | mse | AutoNHITS | 9 |
You can further explore your results by plotting the unique_ids where a specific model wins.
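A possible sketch, filtering the series where `AutoNHITS` wins on MSE and plotting only those:

```python
# unique_ids where AutoNHITS has the lowest MSE
nhits_ids = list(
    evaluation_df.query("metric == 'mse' and best_model == 'AutoNHITS'")['unique_id']
)

StatsForecast.plot(Y_df, plot_fcst, unique_ids=nhits_ids, engine='matplotlib')
```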
6. Select the best model for every unique series
Define a utility function that takes your forecast’s data frame with the predictions and the evaluation data frame and returns a data frame with the best possible forecast for every unique_id.
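A hedged sketch of such a utility. The helper name and its internals are illustrative, and it assumes the 90% prediction-interval columns produced above:

```python
def get_best_model_forecast(forecasts_df, evaluation_df, metric='mse'):
    """Illustrative helper: for each unique_id keep only the columns of the model
    that won on the chosen metric, renamed to a common 'best_model' prefix."""
    df = forecasts_df if 'unique_id' in forecasts_df.columns else forecasts_df.reset_index()
    best = (evaluation_df
            .query('metric == @metric')
            .set_index('unique_id')['best_model'])
    out = []
    for uid, model in best.items():
        cols = [model, f'{model}-lo-90', f'{model}-hi-90']
        sub = df.loc[df['unique_id'] == uid, ['unique_id', 'ds', *cols]].copy()
        sub.columns = ['unique_id', 'ds', 'best_model', 'best_model-lo-90', 'best_model-hi-90']
        out.append(sub)
    return pd.concat(out, ignore_index=True)
```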
Create your production-ready data frame with the best forecast for every unique_id.
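For example, using the MSE winners (an illustrative call to the helper above):

```python
prod_forecasts_df = get_best_model_forecast(fcst_df, evaluation_df, metric='mse')
prod_forecasts_df.head()
```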
| unique_id | ds | best_model | best_model-hi-90 | best_model-lo-90 |
|---|---|---|---|---|
| H1 | 749 | 550.545288 | 658.631592 | 491.368347 |
| H1 | 750 | 549.216736 | 657.615967 | 491.054932 |
| H1 | 751 | 528.075989 | 642.255005 | 466.917053 |
| H1 | 752 | 486.842255 | 611.903809 | 418.012115 |
| H1 | 753 | 452.015930 | 590.465942 | 371.543884 |
Plot the results.
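A possible sketch, assuming `StatsForecast.plot` picks up the `best_model` column and its 90% interval:

```python
# Plot the best-model forecast and its 90% prediction interval for each series
StatsForecast.plot(Y_df, prod_forecasts_df, level=[90], engine='matplotlib')
```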