Amazon Forecast vs StatsForecast
Amazon’s AutoML vs open source statistical methods
Data
We will make use of the M5 competition dataset provided by Walmart. This dataset is interesting for its scale but also the fact that it features many timeseries with infrequent occurances. Such timeseries are common in retail scenarios and are difficult for traditional timeseries forecasting techniques to address.
The data are ready for download at the following URLs:
 Train set:
https://m5benchmarks.s3.amazonaws.com/data/train/target.parquet
 Temporal exogenous variables (used by AmazonForecast):
https://m5benchmarks.s3.amazonaws.com/data/train/temporal.parquet
 Static exogenous variables (used by AmazonForecast):
https://m5benchmarks.s3.amazonaws.com/data/train/static.parquet
A more detailed description of the data can be found here.
Warning
The M5 competition is hierarchical. That is, forecasts are required for different levels of aggregation: national, state, store, etc. In this experiment, we only generate forecasts using the bottomlevel data. The evaluation is performed using the bottomup reconciliation method to obtain the forecasts for the higher hierarchies.
Amazon Forecast
Amazon Forecast is a fully automated solution for time series forecasting. The solution can take the time series to forecast and exogenous variables (temporal and static). For this experiment, we used the AutoPredict functionality of Amazon Forecast following the steps of this tutorial. A detailed description of the particular steps for this dataset can be found here.
Amazon Forecast creates predictors with AutoPredictor, which involves applying the optimal combination of algorithms to each time series in your datasets. The predictor is an Amazon Forecast model that is trained using your target time series, related time series, item metadata, and any additional datasets you include.
Included algorithms range from commonly used statistical algorithms like Autoregressive Integrated Moving Average (ARIMA), to complex neural network algorithms like CNNQR and DeepAR+.: CNNQR, DeepAR+, Prophet, NPTS, ARIMA, and ETS.
To leverage the probabilistic features of Amazon Forecast and enable confidence intervals for further analysis we forecasted the following quantiles: 0.1  0.5  0.9.
The full pipeline of Amazon Forecast took 4.1 hours and the results can
be found here: s3://m5benchmarks/forecasts/amazonforecastm5.parquet
Nixtla’s StatsForecast
Install necessary libraries
We assume you have StatsForecast already installed. Check this guide for instructions on how to install StatsForecast.
Additionally, we will install s3fs
to read from the S3 Filesystem of
AWS. (If you don’t want to use a cloud storage provider, you can read
your files locally using pandas)
!pip install statsforecast s3fs
Input format
We will use pandas to read the data set stored in a parquet file for
efficiency. You can use ordinary pandas operations to read your data in
other formats likes .csv
.
The input to StatsForecast is always a data frame in long
format with
three columns: unique_id
, ds
and y
:

The
unique_id
(string, int or category) represents an identifier for the series. 
The
ds
(datestamp) column should be of a format expected by Pandas, ideally YYYYMMDD for a date or YYYYMMDD HH:MM:SS for a timestamp. 
The
y
(numeric) represents the measurement we wish to forecast. We will rename the
So we will rename the original columns to make it compatible with StatsForecast.
Depending on your internet connection, this step should take around 20 seconds.
Warning
We are reading a file from S3, so you need to install the s3fs library. To install it, run
! pip install s3fs
Read data
import pandas as pd
Y_df_m5 = pd.read_parquet('https://m5benchmarks.s3.amazonaws.com/data/train/target.parquet')
Y_df_m5 = Y_df_m5.rename(columns={
'item_id': 'unique_id',
'timestamp': 'ds',
'demand': 'y'
})
Y_df_m5.head()
unique_id  ds  y  

0  FOODS_1_001_CA_1  20110129  3.0 
1  FOODS_1_001_CA_1  20110130  0.0 
2  FOODS_1_001_CA_1  20110131  0.0 
3  FOODS_1_001_CA_1  20110201  1.0 
4  FOODS_1_001_CA_1  20110202  4.0 
Train statistical models
We fit the model by instantiating a new
StatsForecast
object with the following parameters:

models
: a list of models. Select the models you want from models and import them. For this example, we will useAutoETS
andDynamicOptimizedTheta
. We setseason_length
to 7 because we expect seasonal effects every week. (See: Seasonal periods) 
freq
: a string indicating the frequency of the data. (See panda’s available frequencies.) 
n_jobs
: n_jobs: int, number of jobs used in the parallel processing, use 1 for all cores. 
fallback_model
: a model to be used if a model fails.
Any settings are passed into the constructor. Then you call its fit method and pass in the historical data frame.
Note
StatsForecast achieves its blazing speed using JIT compiling through Numba. The first time you call the statsforecast class, the fit method should take around 5 seconds. The second time once Numba compiled your settings it should take less than 0.2s.

AutoETS
: Exponential Smoothing model. Automatically selects the best ETS (Error, Trend, Seasonality) model using an information criterion. Ref:AutoETS
. 
SeasonalNaive
: Memory Efficient Seasonal Naive predictions. Ref:SeasonalNaive
. 
DynamicOptimizedTheta
: fit two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts. Ref:DynamicOptimizedTheta
.
from statsforecast import StatsForecast
from statsforecast.models import (
AutoETS,
DynamicOptimizedTheta,
SeasonalNaive
)
# Create list of models
models = [
AutoETS(season_length=7),
DynamicOptimizedTheta(season_length=7),
]
# Instantiate StatsForecast class
sf = StatsForecast(
models=models,
freq='D',
n_jobs=1,
fallback_model=SeasonalNaive(season_length=7)
)
/home/ubuntu/fede/statsforecast/statsforecast/core.py:21: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)
from tqdm.autonotebook import tqdm
The
forecast
method takes two arguments: forecasts the next h
(for horizon) and
level
.

h
(int): represents the forecast h steps into the future. In this case, 12 months ahead. 
level
(list of floats): this optional parameter is used for probabilistic forecasting. Set thelevel
(or confidence percentile) of your prediction interval. For example,level=[90]
means that the model expects the real value to be inside that interval 90% of the times.
The forecast object here is a new data frame that includes a column with the name of the model and the y hat values, as well as columns for the uncertainty intervals.
Note
The
forecast
is inteded to be compatible with distributed clusters, so it does not store any model parameters. If you want to store parameter for everymodel you can use thefit
andpredict
methods. However, those methods are not defined for distrubed engines like Spark, Ray or Dask.
from time import time
init = time()
forecasts_df = sf.forecast(df=Y_df_m5, h=28)
end = time()
print(f'Statsforecast time M5 {(end  init) / 60}')
Statsforecast time M5 14.274124479293823
Store the results for further evaluation.
forecasts_df['ThETS'] = forecasts_df[['DynamicOptimizedTheta', 'AutoETS']].clip(0).median(axis=1, numeric_only=True)
forecasts_df.to_parquet('s3://m5benchmarks/forecasts/statsforecastm5.parquet')
Evaluation
This section evaluates the performance of
StatsForecast
and AmazonForecast
. To do this, we first need to install
datasetsforecast, a Python
library developed by Nixtla that includes a large battery of benchmark
datasets and evaluation utilities. The library will allow us to
calculate the performance of the models using the original evaluation
used in the competition.
!pip install datasetsforecast
from datasetsforecast.m5 import M5, M5Evaluation
The following function will allow us to evaluate a specific model included in the input dataframe. The function is useful for evaluating different models.
from datasetsforecast.m5 import M5, M5Evaluation
from statsforecast import StatsForecast
### Evaluator
def evaluate_forecasts(df, model, model_name):
Y_hat = df.set_index('ds', append=True)[model].unstack()
*_, S_df = M5.load('data')
Y_hat = S_df.merge(Y_hat, how='left', on=['unique_id'])
eval_ = M5Evaluation.evaluate(y_hat=Y_hat, directory='./data')
eval_ = eval_.rename(columns={'wrmsse': f'{model_name}_{model}_wrmsse'})
return eval_
Now let’s read the forecasts generated for each solution.
### Read Forecasts
statsforecasts_df = pd.read_parquet('s3://m5benchmarks/forecasts/statsforecastm5.parquet')
amazonforecasts_df = pd.read_parquet('s3://m5benchmarks/forecasts/amazonforecastm5.parquet')
### Amazon Forecast wrangling
amazonforecasts_df = amazonforecasts_df.rename(columns={'item_id': 'unique_id', 'date': 'ds'})
# amazon forecast returns the unique_id column in lower case
# we need to transform it to upper case to ensure proper merging
amazonforecasts_df['unique_id'] = amazonforecasts_df['unique_id'].str.upper()
amazonforecasts_df = amazonforecasts_df.set_index('unique_id')
# parse datestamp
amazonforecasts_df['ds'] = pd.to_datetime(amazonforecasts_df['ds']).dt.tz_localize(None)
Finally, let’s use our predefined function to compute the performance of each model.
### Evaluate performances
m5_eval_df = pd.concat([
evaluate_forecasts(statsforecasts_df, 'ThETS', 'StatsForecast'),
evaluate_forecasts(statsforecasts_df, 'AutoETS', 'StatsForecast'),
evaluate_forecasts(statsforecasts_df, 'DynamicOptimizedTheta', 'StatsForecast'),
evaluate_forecasts(amazonforecasts_df, 'p50', 'AmazonForecast'),
], axis=1)
m5_eval_df.T
Total  Level1  Level2  Level3  Level4  Level5  Level6  Level7  Level8  Level9  Level10  Level11  Level12  

StatsForecast_ThETS_wrmsse  0.669606  0.424331  0.515777  0.580670  0.474098  0.552459  0.578092  0.651079  0.642446  0.725324  1.009390  0.967537  0.914068 
StatsForecast_AutoETS_wrmsse  0.672404  0.430474  0.516340  0.580736  0.482090  0.559721  0.579939  0.655362  0.643638  0.727967  1.010596  0.968168  0.913820 
StatsForecast_DynamicOptimizedTheta_wrmsse  0.675333  0.429670  0.521640  0.589278  0.478730  0.557520  0.584278  0.656283  0.650613  0.731735  1.013910  0.971758  0.918576 
AmazonForecast_p50_wrmsse  1.617815  1.912144  1.786991  1.736382  1.972658  2.010498  1.805926  1.819329  1.667225  1.619216  1.156432  1.012942  0.914040 
The results (including processing time and costs) can be summarized in the following table.