Forecast the M5 dataset
This example uses StatsForecast and ray to forecast thousands of time series
(the M5 dataset) in less than 6 minutes. It also shows that StatsForecast is
both faster and more accurate than Prophet running on a Spark cluster using
DataBricks.
In this example, we used a ray cluster (AWS) of 11 instances of type
m5.2xlarge (8 cores, 32 GB RAM).
The M5 dataset contains 30,490 bottom-level time series.
| | unique_id | ds | y |
|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | 2011-01-29 | 3.0 |
| 1 | FOODS_1_001_CA_1 | 2011-01-30 | 0.0 |
| 2 | FOODS_1_001_CA_1 | 2011-01-31 | 0.0 |
| 3 | FOODS_1_001_CA_1 | 2011-02-01 | 1.0 |
| 4 | FOODS_1_001_CA_1 | 2011-02-02 | 4.0 |

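
The table is in long format: one row per series and date, with the series key in `unique_id`, the timestamp in `ds`, and the observed value in `y`. As a rough illustration of that layout, the sketch below groups such rows into one list of observations per series using only the Python standard library. The variable names `rows` and `series_by_id` are assumptions made for this illustration; the five rows are the ones shown above.

```python
from collections import defaultdict

# The five rows shown in the table, as (unique_id, ds, y) tuples.
rows = [
    ("FOODS_1_001_CA_1", "2011-01-29", 3.0),
    ("FOODS_1_001_CA_1", "2011-01-30", 0.0),
    ("FOODS_1_001_CA_1", "2011-01-31", 0.0),
    ("FOODS_1_001_CA_1", "2011-02-01", 1.0),
    ("FOODS_1_001_CA_1", "2011-02-02", 4.0),
]

# Group the long-format rows into one chronological list of values per series.
series_by_id = defaultdict(list)
for unique_id, ds, y in sorted(rows, key=lambda row: (row[0], row[1])):
    series_by_id[unique_id].append(y)

for unique_id, values in series_by_id.items():
    print(unique_id, values)
```

With the full dataset, the same grouping would yield one entry per bottom-level series, and each entry can be forecast independently of the others.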
StatsForecast
receives a list of models to fit to each time series. Since we are dealing
with daily data, it is beneficial to use a season length of 7. Note
that the address of the ray cluster must be passed through the ray_address
argument.
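
To make the role of the season length concrete, here is a minimal sketch of a seasonal naive forecast, which repeats the values observed one season (7 days) earlier. It is not one of the models fitted by StatsForecast in this example and it does not use ray; it only illustrates what a weekly season length means for daily data. The function name `seasonal_naive` and the sample values are hypothetical.

```python
def seasonal_naive(history, horizon, season_length=7):
    """Forecast by repeating the last full season of `history`."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Step 0 reuses the value from one season ago, step 1 the next one, etc.
    return [last_season[step % season_length] for step in range(horizon)]


# Two weeks of hypothetical daily unit sales (illustrative values only).
daily_sales = [3, 0, 0, 1, 4, 2, 5, 2, 1, 0, 1, 3, 2, 6]
forecast = seasonal_naive(daily_sales, horizon=10)
print(forecast)
```

Each forecast day copies the same weekday from the last observed week, so a weekly pattern in the data carries over into the forecast. Seasonal models with a season length of 7 exploit the same weekly structure in a more flexible way.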
StatsForecast
and ray
took only 5.48 minutes to train 30,490
time series, compared
to 18.23 minutes for Prophet and Spark.
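
The reported timings can be turned into a speedup factor and a throughput figure with simple arithmetic. The snippet below uses only the numbers quoted above; the variable names are illustrative.

```python
# Timings and series count as reported in the text.
statsforecast_minutes = 5.48
prophet_minutes = 18.23
n_series = 30_490

# How many times faster StatsForecast with ray was than Prophet with Spark.
speedup = prophet_minutes / statsforecast_minutes

# Average number of series trained per second across the whole cluster.
series_per_second = n_series / (statsforecast_minutes * 60)

print(f"speedup: {speedup:.2f}x")
print(f"throughput: {series_per_second:.1f} series per second")
```

This works out to roughly a 3.3x reduction in wall-clock time, or about 93 series per second across the cluster.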
We remove the constant.
| | wrmsse |
|---|---|
| Total | 0.677233 |
| Level1 | 0.435558 |
| Level2 | 0.522863 |
| Level3 | 0.582109 |
| Level4 | 0.488484 |
| Level5 | 0.567825 |
| Level6 | 0.587605 |
| Level7 | 0.662774 |
| Level8 | 0.647712 |
| Level9 | 0.732107 |
| Level10 | 1.013124 |
| Level11 | 0.970465 |
| Level12 | 0.916175 |

StatsForecast
is more accurate than Prophet: its overall WRMSSE is 0.68,
against 0.77
obtained by Prophet.
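
For readers unfamiliar with the metric, WRMSSE is a weighted sum of per-series RMSSE scores, where each RMSSE divides the forecast mean squared error by the mean squared one-step difference of the training data. The sketch below is a simplified standard-library illustration, not the evaluator used to produce the table: it assumes the weights are given and sum to 1, it does not trim leading zeros from the training data, and the function names and toy numbers are hypothetical.

```python
import math


def rmsse(train, actual, forecast):
    """Root mean squared scaled error for a single series."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    # Scale: mean squared one-step difference over the training period.
    scale = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    if scale == 0:
        raise ValueError("training series is constant; RMSSE is undefined")
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def wrmsse(scores, weights):
    """Weighted sum of per-series RMSSE scores (weights assumed to sum to 1)."""
    return sum(w * s for w, s in zip(weights, scores))


# Two toy series with equal weights (hypothetical values).
score_a = rmsse(train=[1, 2, 3, 4], actual=[5, 6], forecast=[5, 8])
score_b = rmsse(train=[2, 2, 4, 4], actual=[4, 4], forecast=[3, 5])
total = wrmsse([score_a, score_b], weights=[0.5, 0.5])
print(round(total, 4))
```

Lower values are better. An RMSSE of 1 means the forecast error matches the in-sample error of a one-step naive forecast, which is a useful reference point when reading the per-level values in the table.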