Forecasting at Scale using ETS and ray (M5)
Forecast the M5 dataset
In this notebook we show how to use StatsForecast and ray to forecast thousands of time series in less than 6 minutes (the M5 dataset). We also show that StatsForecast achieves better performance in both time and accuracy than Prophet running on a Spark cluster using DataBricks.
In this example, we used a ray cluster (AWS) of 11 instances of type m5.2xlarge (8 cores, 32 GB RAM each).
Installing StatsForecast Library
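A minimal install sketch; the package names are the public PyPI ones, and pinning versions (not shown) may be advisable since the StatsForecast API has changed across releases:

```shell
pip install statsforecast ray
```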
Download data
The example uses the M5 dataset, which consists of 30,490 bottom-level time series.
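StatsForecast expects the data in "long" format: one row per series and date, with columns `unique_id`, `ds`, and `y`. As a minimal sketch, here we build that format by hand for a single series, using the values from the preview below (the real notebook loads all 30,490 series from the M5 files):

```python
import pandas as pd

# Long format expected by StatsForecast: unique_id, ds (date), y (target).
Y_df = pd.DataFrame({
    "unique_id": "FOODS_1_001_CA_1",
    "ds": pd.date_range("2011-01-29", periods=5, freq="D"),
    "y": [3.0, 0.0, 0.0, 1.0, 4.0],
})
print(Y_df.head())
```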
| | unique_id | ds | y |
|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | 2011-01-29 | 3.0 |
| 1 | FOODS_1_001_CA_1 | 2011-01-30 | 0.0 |
| 2 | FOODS_1_001_CA_1 | 2011-01-31 | 0.0 |
| 3 | FOODS_1_001_CA_1 | 2011-02-01 | 1.0 |
| 4 | FOODS_1_001_CA_1 | 2011-02-02 | 4.0 |
Since the M5 dataset contains intermittent time series, we add a constant to avoid problems during the training phase. Later, we will subtract the constant from the forecasts.
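A sketch of the shift; the constant value 10 is an assumption here (any positive constant works, as long as the same value is subtracted from the forecasts afterwards):

```python
import pandas as pd

# Shift every observation by a constant so the intermittent
# (zero-heavy) series pose no problems during training.
constant = 10
Y_df = pd.DataFrame({
    "unique_id": "FOODS_1_001_CA_1",
    "ds": pd.date_range("2011-01-29", periods=3, freq="D"),
    "y": [3.0, 0.0, 0.0],
})
Y_df["y"] += constant
```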
Train the model
StatsForecast receives a list of models to fit to each time series. Since we are dealing with daily data, it is beneficial to use 7 as the seasonality. Observe that we need to pass the ray cluster address to the ray_address argument.
StatsForecast and ray took only 5.48 minutes to train the 30,490 time series, compared to 18.23 minutes for Prophet and Spark.
We remove the constant.
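A sketch of the inverse shift; the forecast column name (`ETS`) and the values here are illustrative placeholders, and `constant` must match the value added before training:

```python
import pandas as pd

constant = 10  # must match the constant added before training
Y_hat = pd.DataFrame({
    "unique_id": "FOODS_1_001_CA_1",
    "ds": pd.date_range("2011-05-23", periods=3, freq="D"),
    "ETS": [12.0, 10.5, 11.0],  # hypothetical forecasts on the shifted scale
})
# Undo the shift so the forecasts are on the original scale.
Y_hat["ETS"] -= constant
```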
Evaluating performance
The M5 competition used the weighted root mean squared scaled error (WRMSSE). You can find details of the metric here.
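The per-series building block can be sketched as follows: the forecast's mean squared error is scaled by the in-sample one-step naive MSE, and WRMSSE then aggregates these values with weights derived from each series' dollar sales (the weighting itself is not reproduced here):

```python
import numpy as np

def rmsse(y_train, y_test, y_hat):
    # Root mean squared scaled error for a single series:
    # forecast MSE divided by the in-sample one-step naive MSE.
    scale = np.mean(np.diff(y_train) ** 2)
    return np.sqrt(np.mean((y_test - y_hat) ** 2) / scale)
```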
| | wrmsse |
|---|---|
| Total | 0.677233 |
| Level1 | 0.435558 |
| Level2 | 0.522863 |
| Level3 | 0.582109 |
| Level4 | 0.488484 |
| Level5 | 0.567825 |
| Level6 | 0.587605 |
| Level7 | 0.662774 |
| Level8 | 0.647712 |
| Level9 | 0.732107 |
| Level10 | 1.013124 |
| Level11 | 0.970465 |
| Level12 | 0.916175 |
Also, StatsForecast is more accurate than Prophet: the overall WRMSSE is 0.68, against the 0.77 obtained by Prophet.