We will make use of the M5 competition dataset provided by Walmart. This dataset is interesting for its scale but also the fact that it features many timeseries with infrequent occurances. Such timeseries are common in retail scenarios and are difficult for traditional timeseries forecasting techniques to address.

The data are ready for download at the following URLs:

  • Train set:
  • Temporal exogenous variables (used by AmazonForecast):
  • Static exogenous variables (used by AmazonForecast):

A more detailed description of the data can be found here.


The M5 competition is hierarchical. That is, forecasts are required for different levels of aggregation: national, state, store, etc. In this experiment, we only generate forecasts using the bottom-level data. The evaluation is performed using the bottom-up reconciliation method to obtain the forecasts for the higher hierarchies.

Amazon Forecast

Amazon Forecast is a fully automated solution for time series forecasting. The solution can take the time series to forecast and exogenous variables (temporal and static). For this experiment, we used the AutoPredict functionality of Amazon Forecast following the steps of this tutorial. A detailed description of the particular steps for this dataset can be found here.

Amazon Forecast creates predictors with AutoPredictor, which involves applying the optimal combination of algorithms to each time series in your datasets. The predictor is an Amazon Forecast model that is trained using your target time series, related time series, item metadata, and any additional datasets you include.

Included algorithms range from commonly used statistical algorithms like Autoregressive Integrated Moving Average (ARIMA), to complex neural network algorithms like CNN-QR and DeepAR+.: CNN-QR, DeepAR+, Prophet, NPTS, ARIMA, and ETS.

To leverage the probabilistic features of Amazon Forecast and enable confidence intervals for further analysis we forecasted the following quantiles: 0.1 | 0.5 | 0.9.

The full pipeline of Amazon Forecast took 4.1 hours and the results can be found here: s3://m5-benchmarks/forecasts/amazonforecast-m5.parquet

Nixtla’s StatsForecast

Install necessary libraries

We assume you have StatsForecast already installed. Check this guide for instructions on how to install StatsForecast.

Additionally, we will install s3fs to read from the S3 Filesystem of AWS. (If you don’t want to use a cloud storage provider, you can read your files locally using pandas)

!pip install statsforecast s3fs

Input format

We will use pandas to read the data set stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats likes .csv.

The input to StatsForecast is always a data frame in long format with three columns: unique_id, ds and y:

  • The unique_id (string, int or category) represents an identifier for the series.

  • The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.

  • The y (numeric) represents the measurement we wish to forecast. We will rename the

So we will rename the original columns to make it compatible with StatsForecast.

Depending on your internet connection, this step should take around 20 seconds.


We are reading a file from S3, so you need to install the s3fs library. To install it, run ! pip install s3fs

Read data

import pandas as pd

Y_df_m5 = pd.read_parquet('') 

Y_df_m5 = Y_df_m5.rename(columns={
    'item_id': 'unique_id', 
    'timestamp': 'ds', 
    'demand': 'y'


Train statistical models

We fit the model by instantiating a new StatsForecast object with the following parameters:

  • models: a list of models. Select the models you want from models and import them. For this example, we will use AutoETS and DynamicOptimizedTheta. We set season_length to 7 because we expect seasonal effects every week. (See: Seasonal periods)

  • freq: a string indicating the frequency of the data. (See panda’s available frequencies.)

  • n_jobs: n_jobs: int, number of jobs used in the parallel processing, use -1 for all cores.

  • fallback_model: a model to be used if a model fails.

Any settings are passed into the constructor. Then you call its fit method and pass in the historical data frame.


StatsForecast achieves its blazing speed using JIT compiling through Numba. The first time you call the statsforecast class, the fit method should take around 5 seconds. The second time -once Numba compiled your settings- it should take less than 0.2s.

  • AutoETS: Exponential Smoothing model. Automatically selects the best ETS (Error, Trend, Seasonality) model using an information criterion. Ref: AutoETS.

  • SeasonalNaive: Memory Efficient Seasonal Naive predictions. Ref: SeasonalNaive.

  • DynamicOptimizedTheta: fit two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts. Ref: DynamicOptimizedTheta.

from statsforecast import StatsForecast
from statsforecast.models import (

# Create list of models
models = [

# Instantiate StatsForecast class
sf = StatsForecast( 
/home/ubuntu/fede/statsforecast/statsforecast/ TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
  from tqdm.autonotebook import tqdm

The forecast method takes two arguments: forecasts the next h (for horizon) and level.

  • h (int): represents the forecast h steps into the future. In this case, 12 months ahead.

  • level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to be inside that interval 90% of the times.

The forecast object here is a new data frame that includes a column with the name of the model and the y hat values, as well as columns for the uncertainty intervals.


The forecast is inteded to be compatible with distributed clusters, so it does not store any model parameters. If you want to store parameter for everymodel you can use the fit and predict methods. However, those methods are not defined for distrubed engines like Spark, Ray or Dask.

from time import time
init = time()
forecasts_df = sf.forecast(df=Y_df_m5, h=28)
end = time()
print(f'Statsforecast time M5 {(end - init) / 60}')
Statsforecast time M5 14.274124479293823

Store the results for further evaluation.

forecasts_df['ThETS'] = forecasts_df[['DynamicOptimizedTheta', 'AutoETS']].clip(0).median(axis=1, numeric_only=True)


This section evaluates the performance of StatsForecast and AmazonForecast. To do this, we first need to install datasetsforecast, a Python library developed by Nixtla that includes a large battery of benchmark datasets and evaluation utilities. The library will allow us to calculate the performance of the models using the original evaluation used in the competition.

!pip install datasetsforecast
from datasetsforecast.m5 import M5, M5Evaluation

The following function will allow us to evaluate a specific model included in the input dataframe. The function is useful for evaluating different models.

from datasetsforecast.m5 import M5, M5Evaluation
from statsforecast import StatsForecast

### Evaluator
def evaluate_forecasts(df, model, model_name):
    Y_hat = df.set_index('ds', append=True)[model].unstack()
    *_, S_df = M5.load('data')
    Y_hat = S_df.merge(Y_hat, how='left', on=['unique_id'])
    eval_ = M5Evaluation.evaluate(y_hat=Y_hat, directory='./data')
    eval_ = eval_.rename(columns={'wrmsse': f'{model_name}_{model}_wrmsse'})
    return eval_

Now let’s read the forecasts generated for each solution.

### Read Forecasts
statsforecasts_df = pd.read_parquet('s3://m5-benchmarks/forecasts/statsforecast-m5.parquet')
amazonforecasts_df = pd.read_parquet('s3://m5-benchmarks/forecasts/amazonforecast-m5.parquet')

### Amazon Forecast wrangling
amazonforecasts_df = amazonforecasts_df.rename(columns={'item_id': 'unique_id', 'date': 'ds'})
# amazon forecast returns the unique_id column in lower case
# we need to transform it to upper case to ensure proper merging
amazonforecasts_df['unique_id'] = amazonforecasts_df['unique_id'].str.upper()
amazonforecasts_df = amazonforecasts_df.set_index('unique_id')
# parse datestamp
amazonforecasts_df['ds'] = pd.to_datetime(amazonforecasts_df['ds']).dt.tz_localize(None)

Finally, let’s use our predefined function to compute the performance of each model.

### Evaluate performances
m5_eval_df = pd.concat([
    evaluate_forecasts(statsforecasts_df, 'ThETS', 'StatsForecast'),
    evaluate_forecasts(statsforecasts_df, 'AutoETS', 'StatsForecast'),
    evaluate_forecasts(statsforecasts_df, 'DynamicOptimizedTheta', 'StatsForecast'),
    evaluate_forecasts(amazonforecasts_df, 'p50', 'AmazonForecast'),
], axis=1)

The results (including processing time and costs) can be summarized in the following table.