Statistical, Machine Learning, and Neural Forecasting Methods In this tutorial, we will explore the process of forecasting on the M5 dataset by utilizing the most suitable model for each time series. We’ll accomplish this through an essential technique known as cross-validation. This approach helps us in estimating the predictive performance of our models, and in selecting the model that yields the best performance for each time series.

The M5 dataset comprises of hierarchical sales data, spanning five years, from Walmart. The aim is to forecast daily sales for the next 28 days. The dataset is broken down into the 50 states of America, with 10 stores in each state.

In the realm of time series forecasting and analysis, one of the more complex tasks is identifying the model that is optimally suited for a specific group of series. Quite often, this selection process leans heavily on intuition, which may not necessarily align with the empirical reality of our dataset.

In this tutorial, we aim to provide a more structured, data-driven approach to model selection for different groups of series within the M5 benchmark dataset. This dataset, well-known in the field of forecasting, allows us to showcase the versatility and power of our methodology.

We will train an assortment of models from various forecasting paradigms:

StatsForecast

  • Baseline models: These models are simple yet often highly effective for providing an initial perspective on the forecasting problem. We will use SeasonalNaive and HistoricAverage models for this category.
  • Intermittent models: For series with sporadic, non-continuous demand, we will utilize models like CrostonOptimized, IMAPA, and ADIDA. These models are particularly suited for handling zero-inflated series.
  • State Space Models: These are statistical models that use mathematical descriptions of a system to make predictions. The AutoETS model from the statsforecast library falls under this category.

MLForecast

Machine Learning: Leveraging ML models like LightGBM, XGBoost, and LinearRegression can be advantageous due to their capacity to uncover intricate patterns in data. We’ll use the MLForecast library for this purpose.

NeuralForecast

Deep Learning: DL models, such as Transformers (AutoTFT) and Neural Networks (AutoNHITS), allow us to handle complex non-linear dependencies in time series data. We’ll utilize the NeuralForecast library for these models.

Using the Nixtla suite of libraries, we’ll be able to drive our model selection process with data, ensuring we utilize the most suitable models for specific groups of series in our dataset.

Outline:

  • Reading Data: In this initial step, we load our dataset into memory, making it available for our subsequent analysis and forecasting. It is important to understand the structure and nuances of the dataset at this stage.

  • Forecasting Using Statistical and Deep Learning Methods: We apply a wide range of forecasting methods from basic statistical techniques to advanced deep learning models. The aim is to generate predictions for the next 28 days based on our dataset.

  • Model Performance Evaluation on Different Windows: We assess the performance of our models on distinct windows.

  • Selecting the Best Model for a Group of Series: Using the performance evaluation, we identify the optimal model for each group of series. This step ensures that the chosen model is tailored to the unique characteristics of each group.

  • Filtering the Best Possible Forecast: Finally, we filter the forecasts generated by our chosen models to obtain the most promising predictions. This is our final output and represents the best possible forecast for each series according to our models.

Warning

This tutorial was originally executed using a c5d.24xlarge EC2 instance.

Installing Libraries

!pip install statsforecast mlforecast neuralforecast pyarrow

Download and prepare data

The example uses the M5 dataset. It consists of 30,490 bottom time series.

import pandas as pd
# Load the training target dataset from the provided URL
Y_df = pd.read_parquet('https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet')

# Rename columns to match the Nixtlaverse's expectations
# The 'item_id' becomes 'unique_id' representing the unique identifier of the time series
# The 'timestamp' becomes 'ds' representing the time stamp of the data points
# The 'demand' becomes 'y' representing the target variable we want to forecast
Y_df = Y_df.rename(
    columns={
        'item_id': 'unique_id', 
        'timestamp': 'ds', 
        'demand': 'y'
    }
)

# Convert the 'ds' column to datetime format to ensure proper handling of date-related operations in subsequent steps
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

For simplicity sake we will keep just one category

Y_df = Y_df.query('unique_id.str.startswith("FOODS_3")').reset_index(drop=True)
Y_df['unique_id'] = Y_df['unique_id'].astype(str)

Basic Plotting

Plot some series using the plot_series function from the utilsforecast library. This method prints 8 random series from the dataset and is useful for basic EDA.

from utilsforecast.plotting import plot_series
# Feature: plot random series for EDA
plot_series(Y_df)

# Feature: plot groups of series for EDA
plot_series(Y_df, ids=["FOODS_3_432_TX_2"])

Create forecasts with Stats, Ml and Neural methods.

StatsForecast

StatsForecast is a comprehensive library providing a suite of popular univariate time series forecasting models, all designed with a focus on high performance and scalability.

Here’s what makes StatsForecast a powerful tool for time series forecasting:

  • Collection of Local Models: StatsForecast provides a diverse collection of local models that can be applied to each time series individually, allowing us to capture unique patterns within each series.

  • Simplicity: With StatsForecast, training, forecasting, and backtesting multiple models become a straightforward process, requiring only a few lines of code. This simplicity makes it a convenient tool for both beginners and experienced practitioners.

  • Optimized for Speed: The implementation of the models in StatsForecast is optimized for speed, ensuring that large-scale computations are performed efficiently, thereby reducing the overall time for model training and prediction.

  • Horizontal Scalability: One of the distinguishing features of StatsForecast is its ability to scale horizontally. It is compatible with distributed computing frameworks such as Spark, Dask, and Ray. This feature allows it to handle large datasets by distributing the computations across multiple nodes in a cluster, making it a go-to solution for large-scale time series forecasting tasks.

StatsForecast receives a list of models to fit each time series. Since we are dealing with Daily data, it would be benefitial to use 7 as seasonality.

from statsforecast import StatsForecast
# Import necessary models from the statsforecast library
from statsforecast.models import (
    # SeasonalNaive: A model that uses the previous season's data as the forecast
    SeasonalNaive,
    # Naive: A simple model that uses the last observed value as the forecast
    Naive,
    # HistoricAverage: This model uses the average of all historical data as the forecast
    HistoricAverage,
    # CrostonOptimized: A model specifically designed for intermittent demand forecasting
    CrostonOptimized,
    # ADIDA: Adaptive combination of Intermittent Demand Approaches, a model designed for intermittent demand
    ADIDA,
    # IMAPA: Intermittent Multiplicative AutoRegressive Average, a model for intermittent series that incorporates autocorrelation
    IMAPA,
    # AutoETS: Automated Exponential Smoothing model that automatically selects the best Exponential Smoothing model based on AIC
    AutoETS
)

We fit the models by instantiating a new StatsForecast object with the following parameters:

  • models: a list of models. Select the models you want from models and import them.
  • freq: a string indicating the frequency of the data. (See panda’s available frequencies.)
  • n_jobs: int, number of jobs used in the parallel processing, use -1 for all cores.
  • fallback_model: a model to be used if a model fails. Any settings are passed into the constructor. Then you call its fit method and pass in the historical data frame.
horizon = 28
models = [
    SeasonalNaive(season_length=7),
    Naive(),
    HistoricAverage(),
    CrostonOptimized(),
    ADIDA(),
    IMAPA(),
    AutoETS(season_length=7)
]
# Instantiate the StatsForecast class
sf = StatsForecast(
    models=models,  # A list of models to be used for forecasting
    freq='D',  # The frequency of the time series data (in this case, 'D' stands for daily frequency)
    n_jobs=-1,  # The number of CPU cores to use for parallel execution (-1 means use all available cores)
    verbose=True,  # Show progress
)

The forecast method produces predictions for the next h periods.

The forecast object here is a new data frame that includes a column with the name of the model and the y hat values.

This block of code times how long it takes to run the forecasting function of the StatsForecast class, which predicts the next 28 days (h=28). The time is calculated in minutes and printed out at the end.

from time import time

# Get the current time before forecasting starts, this will be used to measure the execution time
init = time()

# Call the forecast method of the StatsForecast instance to predict the next 28 days (h=28) 
fcst_df = sf.forecast(df=Y_df, h=28)

# Get the current time after the forecasting ends
end = time()

# Calculate and print the total time taken for the forecasting in minutes
print(f'Forecast Minutes: {(end - init) / 60}')
Forecast:   0%|          | 0/1600 [Elapsed: 00:00]
Forecast Minutes: 6.027772482236227
fcst_df.head()
unique_iddsSeasonalNaiveNaiveHistoricAverageCrostonOptimizedADIDAIMAPAAutoETS
0FOODS_3_001_CA_12016-05-231.02.00.4487380.3451920.3454770.3472490.381414
1FOODS_3_001_CA_12016-05-240.02.00.4487380.3451920.3454770.3472490.286933
2FOODS_3_001_CA_12016-05-250.02.00.4487380.3451920.3454770.3472490.334987
3FOODS_3_001_CA_12016-05-261.02.00.4487380.3451920.3454770.3472490.186851
4FOODS_3_001_CA_12016-05-270.02.00.4487380.3451920.3454770.3472490.308112

MLForecast

MLForecast is a powerful library that provides automated feature creation for time series forecasting, facilitating the use of global machine learning models. It is designed for high performance and scalability.

Key features of MLForecast include:

  • Support for sklearn models: MLForecast is compatible with models that follow the scikit-learn API. This makes it highly flexible and allows it to seamlessly integrate with a wide variety of machine learning algorithms.

  • Simplicity: With MLForecast, the tasks of training, forecasting, and backtesting models can be accomplished in just a few lines of code. This streamlined simplicity makes it user-friendly for practitioners at all levels of expertise.

  • Optimized for speed: MLForecast is engineered to execute tasks rapidly, which is crucial when handling large datasets and complex models.

  • Horizontal Scalability: MLForecast is capable of horizontal scaling using distributed computing frameworks such as Spark, Dask, and Ray. This feature enables it to efficiently process massive datasets by distributing the computations across multiple nodes in a cluster, making it ideal for large-scale time series forecasting tasks.

from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean
from mlforecast.target_transforms import Differences
from mlforecast.utils import PredictionIntervals
!pip install lightgbm xgboost
# Import the necessary models from various libraries

# LGBMRegressor: A gradient boosting framework that uses tree-based learning algorithms from the LightGBM library
from lightgbm import LGBMRegressor

# XGBRegressor: A gradient boosting regressor model from the XGBoost library
from xgboost import XGBRegressor

# LinearRegression: A simple linear regression model from the scikit-learn library
from sklearn.linear_model import LinearRegression

To use MLForecast for time series forecasting, we instantiate a new MLForecast object and provide it with various parameters to tailor the modeling process to our specific needs:

  • models: This parameter accepts a list of machine learning models you wish to use for forecasting. You can import your preferred models from scikit-learn, lightgbm and xgboost.

  • freq: This is a string indicating the frequency of your data (hourly, daily, weekly, etc.). The specific format of this string should align with pandas’ recognized frequency strings.

  • target_transforms: These are transformations applied to the target variable before model training and after model prediction. This can be useful when working with data that may benefit from transformations, such as log-transforms for highly skewed data.

  • lags: This parameter accepts specific lag values to be used as regressors. Lags represent how many steps back in time you want to look when creating features for your model. For example, if you want to use the previous day’s data as a feature for predicting today’s value, you would specify a lag of 1.

  • lags_transforms: These are specific transformations for each lag. This allows you to apply transformations to your lagged features.

  • date_features: This parameter specifies date-related features to be used as regressors. For instance, you might want to include the day of the week or the month as a feature in your model.

  • num_threads: This parameter controls the number of threads to use for parallelizing feature creation, helping to speed up this process when working with large datasets.

All these settings are passed to the MLForecast constructor. Once the MLForecast object is initialized with these settings, we call its fit method and pass the historical data frame as the argument. The fit method trains the models on the provided historical data, readying them for future forecasting tasks.

# Instantiate the MLForecast object
mlf = MLForecast(
    models=[LGBMRegressor(verbosity=-1), XGBRegressor(), LinearRegression()],  # List of models for forecasting: LightGBM, XGBoost and Linear Regression
    freq='D',  # Frequency of the data - 'D' for daily frequency
    lags=list(range(1, 7)),  # Specific lags to use as regressors: 1 to 6 days
    lag_transforms = {
        1: [ExpandingMean()],  # Apply expanding mean transformation to the lag of 1 day
    },
    date_features=['year', 'month', 'day', 'dayofweek', 'quarter', 'week'],  # Date features to use as regressors
)

Just call the fit models to train the select models. In this case we are generating conformal prediction intervals.

# Start the timer to calculate the time taken for fitting the models
init = time()

# Fit the MLForecast models to the data
mlf.fit(Y_df)

# Calculate the end time after fitting the models
end = time()

# Print the time taken to fit the MLForecast models, in minutes
print(f'MLForecast Minutes: {(end - init) / 60}')
MLForecast Minutes: 0.5530486385027568

After that, just call predict to generate forecasts.

fcst_mlf_df = mlf.predict(28)
fcst_mlf_df.head()
unique_iddsLGBMRegressorXGBRegressorLinearRegression
0FOODS_3_001_CA_12016-05-230.5495200.5601230.359638
1FOODS_3_001_CA_12016-05-240.5531960.3693370.100359
2FOODS_3_001_CA_12016-05-250.5996680.3743380.175837
3FOODS_3_001_CA_12016-05-260.6380970.3271760.156456
4FOODS_3_001_CA_12016-05-270.7633050.3316310.328192

NeuralForecast

NeuralForecast is a robust collection of neural forecasting models that focuses on usability and performance. It includes a variety of model architectures, from classic networks such as Multilayer Perceptrons (MLP) and Recurrent Neural Networks (RNN) to novel contributions like N-BEATS, N-HITS, Temporal Fusion Transformers (TFT), and more.

Key features of NeuralForecast include:

  • A broad collection of global models. Out of the box implementation of MLP, LSTM, RNN, TCN, DilatedRNN, NBEATS, NHITS, ESRNN, TFT, Informer, PatchTST and HINT.
  • A simple and intuitive interface that allows training, forecasting, and backtesting of various models in a few lines of code.
  • Support for GPU acceleration to improve computational speed.

This machine doesn’t have GPU, but Google Colabs offers some for free.

Using Colab’s GPU to train NeuralForecast.

# Read the results from Colab
fcst_nf_df = pd.read_parquet('https://m5-benchmarks.s3.amazonaws.com/data/forecast-nf.parquet')
fcst_nf_df.head()
unique_iddsAutoNHITSAutoNHITS-lo-90AutoNHITS-hi-90AutoTFTAutoTFT-lo-90AutoTFT-hi-90
0FOODS_3_001_CA_12016-05-230.00.02.00.00.02.0
1FOODS_3_001_CA_12016-05-240.00.02.00.00.02.0
2FOODS_3_001_CA_12016-05-250.00.02.00.00.01.0
3FOODS_3_001_CA_12016-05-260.00.02.00.00.02.0
4FOODS_3_001_CA_12016-05-270.00.02.00.00.02.0
# Merge the forecasts from StatsForecast and NeuralForecast
fcst_df = fcst_df.merge(fcst_nf_df, how='left', on=['unique_id', 'ds'])

# Merge the forecasts from MLForecast into the combined forecast dataframe
fcst_df = fcst_df.merge(fcst_mlf_df, how='left', on=['unique_id', 'ds'])
fcst_df.head()
unique_iddsSeasonalNaiveNaiveHistoricAverageCrostonOptimizedADIDAIMAPAAutoETSAutoNHITSAutoNHITS-lo-90AutoNHITS-hi-90AutoTFTAutoTFT-lo-90AutoTFT-hi-90LGBMRegressorXGBRegressorLinearRegression
0FOODS_3_001_CA_12016-05-231.02.00.4487380.3451920.3454770.3472490.3814140.00.02.00.00.02.00.5495200.5601230.359638
1FOODS_3_001_CA_12016-05-240.02.00.4487380.3451920.3454770.3472490.2869330.00.02.00.00.02.00.5531960.3693370.100359
2FOODS_3_001_CA_12016-05-250.02.00.4487380.3451920.3454770.3472490.3349870.00.02.00.00.01.00.5996680.3743380.175837
3FOODS_3_001_CA_12016-05-261.02.00.4487380.3451920.3454770.3472490.1868510.00.02.00.00.02.00.6380970.3271760.156456
4FOODS_3_001_CA_12016-05-270.02.00.4487380.3451920.3454770.3472490.3081120.00.02.00.00.02.00.7633050.3316310.328192

Forecast plots

plot_series(Y_df, fcst_df, max_insample_length=28 * 3)

Use the plot function to explore models and ID’s

plot_series(
    Y_df,
    fcst_df,
    max_insample_length=28 * 3, 
    models=['CrostonOptimized', 'AutoNHITS', 'SeasonalNaive', 'LGBMRegressor'],
)

Validate Model’s Performance

The three libraries - StatsForecast, MLForecast, and NeuralForecast - offer out-of-the-box cross-validation capabilities specifically designed for time series. This allows us to evaluate the model’s performance using historical data to obtain an unbiased assessment of how well each model is likely to perform on unseen data.

Cross Validation in StatsForecast

The cross_validation method from the StatsForecast class accepts the following arguments:

  • df: A DataFrame representing the training data.
  • h (int): The forecast horizon, represented as the number of steps into the future that we wish to predict. For example, if we’re forecasting hourly data, h=24 would represent a 24-hour forecast.
  • step_size (int): The step size between each cross-validation window. This parameter determines how often we want to run the forecasting process.
  • n_windows (int): The number of windows used for cross validation. This parameter defines how many past forecasting processes we want to evaluate.

These parameters allow us to control the extent and granularity of our cross-validation process. By tuning these settings, we can balance between computational cost and the thoroughness of the cross-validation.

sf.verbose = False
init = time()
cv_df = sf.cross_validation(df=Y_df, h=horizon, n_windows=3, step_size=horizon)
end = time()
print(f'CV Minutes: {(end - init) / 60}')
CV Minutes: 17.494867050647734

The crossvaldation_df object is a new data frame that includes the following columns:

  • unique_id series identifier
  • ds: datestamp or temporal index
  • cutoff: the last datestamp or temporal index for the n_windows. If n_windows=1, then one unique cuttoff value, if n_windows=2 then two unique cutoff values.
  • y: true value
  • "model": columns with the model’s name and fitted value.
cv_df.head()
unique_iddscutoffySeasonalNaiveNaiveHistoricAverageCrostonOptimizedADIDAIMAPAAutoETS
0FOODS_3_001_CA_12016-02-292016-02-280.02.00.00.4491110.6184720.6183750.6179980.655286
1FOODS_3_001_CA_12016-03-012016-02-281.00.00.00.4491110.6184720.6183750.6179980.568595
2FOODS_3_001_CA_12016-03-022016-02-281.00.00.00.4491110.6184720.6183750.6179980.618805
3FOODS_3_001_CA_12016-03-032016-02-280.01.00.00.4491110.6184720.6183750.6179980.455891
4FOODS_3_001_CA_12016-03-042016-02-280.01.00.00.4491110.6184720.6183750.6179980.591197

MLForecast

The cross_validation method from the MLForecast class takes the following arguments.

  • df: training data frame
  • h (int): represents the steps into the future that are being forecasted. In this case, 24 hours ahead.
  • step_size (int): step size between each window. In other words: how often do you want to run the forecasting processes.
  • n_windows (int): number of windows used for cross-validation. In other words: what number of forecasting processes in the past do you want to evaluate.
init = time()
cv_mlf_df = mlf.cross_validation(
    df=Y_df, 
    h=horizon,
    n_windows=3,
)
end = time()
print(f'CV Minutes: {(end - init) / 60}')
CV Minutes: 1.785299829641978

The crossvaldation_df object is a new data frame that includes the following columns:

  • unique_id series identifier
  • ds: datestamp or temporal index
  • cutoff: the last datestamp or temporal index for the n_windows. If n_windows=1, then one unique cuttoff value, if n_windows=2 then two unique cutoff values.
  • y: true value
  • "model": columns with the model’s name and fitted value.
cv_mlf_df.head()
unique_iddscutoffyLGBMRegressorXGBRegressorLinearRegression
0FOODS_3_001_CA_12016-02-292016-02-280.00.4356740.556261-0.312492
1FOODS_3_001_CA_12016-03-012016-02-281.00.6396760.625807-0.041920
2FOODS_3_001_CA_12016-03-022016-02-281.00.7929890.6596510.263699
3FOODS_3_001_CA_12016-03-032016-02-280.00.8068680.5351210.482491
4FOODS_3_001_CA_12016-03-042016-02-280.00.8291060.3133540.677326

NeuralForecast

This machine doesn’t have GPU, but Google Colabs offers some for free.

Using Colab’s GPU to train NeuralForecast.

cv_nf_df = pd.read_parquet('https://m5-benchmarks.s3.amazonaws.com/data/cross-validation-nf.parquet')
cv_nf_df.head()
unique_iddscutoffAutoNHITSAutoNHITS-lo-90AutoNHITS-hi-90AutoTFTAutoTFT-lo-90AutoTFT-hi-90y
0FOODS_3_001_CA_12016-02-292016-02-280.00.02.01.00.02.00.0
1FOODS_3_001_CA_12016-03-012016-02-280.00.02.01.00.02.01.0
2FOODS_3_001_CA_12016-03-022016-02-280.00.02.01.00.02.01.0
3FOODS_3_001_CA_12016-03-032016-02-280.00.02.01.00.02.00.0
4FOODS_3_001_CA_12016-03-042016-02-280.00.02.01.00.02.00.0

Merge cross validation forecasts

cv_df = cv_df.merge(cv_nf_df.drop(columns=['y']), how='left', on=['unique_id', 'ds', 'cutoff'])
cv_df = cv_df.merge(cv_mlf_df.drop(columns=['y']), how='left', on=['unique_id', 'ds', 'cutoff'])

Plots CV

cutoffs = cv_df['cutoff'].unique()
for cutoff in cutoffs:
    display(
        plot_series(
            Y_df, 
            cv_df.query('cutoff == @cutoff').drop(columns=['y', 'cutoff']), 
            max_insample_length=28 * 5, 
            ids=['FOODS_3_001_CA_1'],
        )
    )

Aggregate Demand

agg_cv_df = cv_df.loc[:,~cv_df.columns.str.contains('hi|lo')].groupby(['ds', 'cutoff']).sum(numeric_only=True).reset_index()
agg_cv_df.insert(0, 'unique_id', 'agg_demand')
agg_Y_df = Y_df.groupby(['ds']).sum(numeric_only=True).reset_index()
agg_Y_df.insert(0, 'unique_id', 'agg_demand')
for cutoff in cutoffs:
    display(
        plot_series(
            agg_Y_df, 
            agg_cv_df.query('cutoff == @cutoff').drop(columns=['y', 'cutoff']),
            max_insample_length=28 * 5,
        )
    )

Evaluation per series and CV window

In this section, we will evaluate the performance of each model for each time series.

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae, smape
evaluation_df = evaluate(cv_df.drop(columns='cutoff'), metrics=[mse, mae, smape])
evaluation_df
unique_idmetricSeasonalNaiveNaiveHistoricAverageCrostonOptimizedADIDAIMAPAAutoETSAutoNHITSAutoTFTLGBMRegressorXGBRegressorLinearRegression
0FOODS_3_001_CA_1mse1.2500000.8928570.4851820.5079570.5092990.5169880.4942350.6309520.5714290.6489620.5847220.518863
1FOODS_3_001_CA_2mse6.2738093.7738093.4773093.4125803.4322953.4740503.4264684.5505953.6071433.4236463.8564653.713406
2FOODS_3_001_CA_3mse5.8809524.3571435.0163964.1731544.1606454.1767334.1451484.0059524.3720244.9287646.9377925.173173
3FOODS_3_001_CA_4mse1.0714290.4761900.4029380.3825590.3807830.3808770.3808720.4761900.4761900.6642700.4240680.592870
4FOODS_3_001_TX_1mse0.0476190.0476190.2388240.2613560.0476190.0476190.0775750.0476190.0476190.7187960.0635640.208862
24685FOODS_3_827_TX_2smape0.0833330.0357140.9895400.9963620.9873950.9828470.9815370.3238100.3357140.9763560.9947020.982738
24686FOODS_3_827_TX_3smape0.7085320.6814950.6624900.6530570.6558100.6601610.6491800.6839470.7121210.6395180.8568660.673560
24687FOODS_3_827_WI_1smape0.6087220.6943280.4705700.4708460.4800320.4800320.4669560.4868520.4759800.4723360.4849060.492325
24688FOODS_3_827_WI_2smape0.5317770.3981560.4335770.3877180.3888270.3893710.3898880.3937740.3746400.4135590.4308930.396423
24689FOODS_3_827_WI_3smape0.6436890.6801780.5880310.5891430.5998200.6286730.5914370.5582010.5674600.5898700.6987980.620368
by_metric = evaluation_df.groupby('metric').mean(numeric_only=True)
by_metric
SeasonalNaiveNaiveHistoricAverageCrostonOptimizedADIDAIMAPAAutoETSAutoNHITSAutoTFTLGBMRegressorXGBRegressorLinearRegression
metric
mae1.7754152.0459061.7490801.6347911.5420971.5437451.5115451.4382501.4976471.6979471.5520611.589638
mse14.26577320.45332512.93813611.48423311.09019511.09444610.3519279.60691310.72125110.50228911.56591610.781582
smape0.4364140.4464300.6168840.6132190.6189100.6193130.6200840.4007700.4110180.5798560.6936150.631614

Best models by metric

by_metric.idxmin(axis=1)
metric
mae      AutoNHITS
mse      AutoNHITS
smape    AutoNHITS
dtype: object

Distribution of errors

!pip install seaborn
import matplotlib.pyplot as plt
import seaborn as sns
evaluation_df_long = pd.melt(evaluation_df, id_vars=['unique_id', 'metric'], var_name='model', value_name='error')

SMAPE

sns.violinplot(evaluation_df_long.query('metric=="smape"'), x='error', y='model');

Choose models for groups of series

Feature:

  • A unified dataframe with forecasts for all different models
  • Easy Ensamble
  • E.g. Average predictions
  • Or MinMax (Choosing is ensembling)
# Choose the best model for each time series, metric, and cross validation window
evaluation_df['best_model'] = evaluation_df.idxmin(axis=1, numeric_only=True)
# count how many times a model wins per metric and cross validation window
count_best_model = evaluation_df.groupby(['metric', 'best_model']).size().rename('n').to_frame().reset_index()
# plot results
sns.barplot(count_best_model, x='n', y='best_model', hue='metric')

Et pluribus unum: an inclusive forecasting Pie.

# For the mse, calculate how many times a model wins
eval_series_df = evaluation_df.query('metric == "mse"').groupby(['unique_id']).mean(numeric_only=True)
eval_series_df['best_model'] = eval_series_df.idxmin(axis=1)
counts_series = eval_series_df.value_counts('best_model')
plt.pie(counts_series, labels=counts_series.index, autopct='%.0f%%')
plt.show()

plot_series(
    Y_df,
    cv_df.drop(columns=['cutoff', 'y']), 
    max_insample_length=28 * 6, 
    models=['AutoNHITS'],
)

Choose Forecasting method for different groups of series

# Merge the best model per time series dataframe
# and filter the forecasts based on that dataframe
# for each time series
fcst_df = pd.melt(fcst_df.set_index('unique_id'), id_vars=['ds'], var_name='model', value_name='forecast', ignore_index=False)
fcst_df = fcst_df.join(eval_series_df[['best_model']])
fcst_df[['model', 'pred-interval']] = fcst_df['model'].str.split('-', expand=True, n=1)
fcst_df = fcst_df.query('model == best_model')
fcst_df['name'] = [f'forecast-{x}' if x is not None else 'forecast' for x in fcst_df['pred-interval']]
fcst_df = pd.pivot_table(fcst_df, index=['unique_id', 'ds'], values=['forecast'], columns=['name']).droplevel(0, axis=1).reset_index()
plot_series(Y_df, fcst_df, max_insample_length=28 * 3)

Technical Debt

  • Train the statistical models in the full dataset.
  • Increase the number of num_samples in the neural auto models.
  • Include other models such as Theta, ARIMA, RNN, LSTM, …

Further materials