Forecast the M5 dataset

In this notebook we show how to use
StatsForecast and Ray to
forecast thousands of time series in under 6 minutes (the M5 dataset).
We also show that StatsForecast outperforms
Prophet running on a Spark
cluster using Databricks, in both runtime and accuracy.
In this example, we used a Ray cluster on AWS with 11 instances of type
m5.2xlarge (8 cores, 32 GB RAM each).
Installing StatsForecast Library
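If the libraries are not already available in your environment, they can be installed from PyPI (the `ray[default]` extras tag is an assumption; the exact extras needed may vary by Ray version):

```shell
pip install statsforecast "ray[default]"
```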
Download data
The example uses the M5 dataset. It consists of 30,490 bottom-level time series.
| | unique_id | ds | y |
|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | 2011-01-29 | 3.0 |
| 1 | FOODS_1_001_CA_1 | 2011-01-30 | 0.0 |
| 2 | FOODS_1_001_CA_1 | 2011-01-31 | 0.0 |
| 3 | FOODS_1_001_CA_1 | 2011-02-01 | 1.0 |
| 4 | FOODS_1_001_CA_1 | 2011-02-02 | 4.0 |
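StatsForecast expects this long format: one row per (`unique_id`, `ds`) pair, with the target in `y`. A minimal sketch building such a frame with pandas (reusing the first rows shown above for illustration):

```python
import pandas as pd

# Long format expected by StatsForecast: one row per series/date pair.
df = pd.DataFrame({
    "unique_id": ["FOODS_1_001_CA_1"] * 5,
    "ds": pd.to_datetime(
        ["2011-01-29", "2011-01-30", "2011-01-31", "2011-02-01", "2011-02-02"]
    ),
    "y": [3.0, 0.0, 0.0, 1.0, 4.0],
})

print(df.shape)  # (5, 3)
```

In the full dataset, the same three-column layout holds 30,490 series stacked vertically.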
Train the model
StatsForecast receives a list of models to fit to each time series. Since
we are dealing with daily data, it is beneficial to use 7 as the
seasonality. Observe that we need to pass the Ray cluster address to the
ray_address argument.
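To see why a seasonality of 7 fits daily data, consider the seasonal naive model, one of the simplest seasonal baselines: each forecast repeats the observation from one season (here, 7 days) earlier. A minimal NumPy sketch of that logic (a simplified illustration, not the StatsForecast implementation):

```python
import numpy as np

def seasonal_naive(y, season_length=7, h=28):
    """Forecast h steps ahead by repeating the last full season."""
    last_season = y[-season_length:]
    # Tile the last season and trim to the forecast horizon.
    reps = int(np.ceil(h / season_length))
    return np.tile(last_season, reps)[:h]

y = np.arange(1, 15, dtype=float)  # two weeks of daily data: 1..14
fcst = seasonal_naive(y, season_length=7, h=10)
print(fcst)  # [ 8.  9. 10. 11. 12. 13. 14.  8.  9. 10.]
```

In the notebook itself, the chosen models and the season length are passed to StatsForecast together with the Ray cluster address, so the 30,490 series are fitted in parallel across the cluster.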
StatsForecast and ray took only 5.48 minutes to train 30,490 time
series, compared to 18.23 minutes for Prophet and Spark.
We remove the constant.
Evaluating performance
The M5 competition used the weighted root mean squared scaled error (WRMSSE). You can find details of the metric here.

| | wrmsse |
|---|---|
| Total | 0.677233 |
| Level1 | 0.435558 |
| Level2 | 0.522863 |
| Level3 | 0.582109 |
| Level4 | 0.488484 |
| Level5 | 0.567825 |
| Level6 | 0.587605 |
| Level7 | 0.662774 |
| Level8 | 0.647712 |
| Level9 | 0.732107 |
| Level10 | 1.013124 |
| Level11 | 0.970465 |
| Level12 | 0.916175 |
StatsForecast is more accurate than Prophet: the overall
WRMSSE is 0.68, versus 0.77 obtained by Prophet.
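The RMSSE underlying this metric scales each series' squared forecast error by the mean squared one-step naive error on the training data; the WRMSSE then aggregates the per-series values with weights (in M5, based on dollar sales). A minimal sketch of the per-series RMSSE under these assumptions (the example numbers are illustrative, not M5 results):

```python
import numpy as np

def rmsse(y_train, y_test, y_hat):
    """Root mean squared scaled error for a single series."""
    # Scale: mean squared error of the in-sample one-step naive forecast.
    scale = np.mean(np.diff(y_train) ** 2)
    mse = np.mean((y_test - y_hat) ** 2)
    return np.sqrt(mse / scale)

y_train = np.array([3.0, 0.0, 0.0, 1.0, 4.0])
y_test = np.array([2.0, 1.0])
y_hat = np.array([1.0, 1.0])
print(round(rmsse(y_train, y_test, y_hat), 4))  # 0.3244
```

Values below 1 mean the forecast beats the naive baseline on that series, which is why the Total of 0.677 indicates a substantial improvement over naive forecasting.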
