Amazon’s AutoML vs open source statistical methods
https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet
https://m5-benchmarks.s3.amazonaws.com/data/train/temporal.parquet
https://m5-benchmarks.s3.amazonaws.com/data/train/static.parquet
Warning The M5 competition is hierarchical. That is, forecasts are required for different levels of aggregation: national, state, store, etc. In this experiment, we only generate forecasts using the bottom-level data. For evaluation, the forecasts for the higher levels of the hierarchy are obtained with the bottom-up reconciliation method, that is, by summing the bottom-level forecasts.
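As a rough illustration of bottom-up reconciliation, the sketch below sums toy bottom-level forecasts into store, state and total levels. The forecast values and dates are made up, and parsing the state and store out of the last two tokens of unique_id is an assumption based on ids such as FOODS_1_001_CA_1; it is not the evaluation code used in this experiment.

```python
import pandas as pd

# Toy bottom-level forecasts (made-up values and dates, for illustration only).
bottom = pd.DataFrame({
    "unique_id": ["FOODS_1_001_CA_1", "FOODS_1_002_CA_1",
                  "FOODS_1_001_CA_2", "FOODS_1_001_TX_1"] * 2,
    "ds": pd.to_datetime(["2016-05-23"] * 4 + ["2016-05-24"] * 4),
    "forecast": [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.0, 2.0],
})

# Assumption: the last two id tokens are the state and the store number.
parts = bottom["unique_id"].str.rsplit("_", n=2, expand=True)
bottom["state_id"] = parts[1]
bottom["store_id"] = parts[1] + "_" + parts[2]

# Bottom-up: each higher level is the sum of the bottom-level forecasts below it.
store_level = bottom.groupby(["store_id", "ds"], as_index=False)["forecast"].sum()
state_level = bottom.groupby(["state_id", "ds"], as_index=False)["forecast"].sum()
total_level = bottom.groupby("ds", as_index=False)["forecast"].sum()
```

Because every level is built from the same bottom-level numbers, the aggregated forecasts are coherent by construction: the store totals add up to the state totals, which add up to the national total.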
s3://m5-benchmarks/forecasts/amazonforecast-m5.parquet
We use s3fs to read from the S3 filesystem of AWS. (If you don't want to use a cloud storage provider, you can read your files locally using pandas, for example from a .csv file.)
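A minimal sketch of such a reader, assuming the reader can be chosen from the file extension. The helper name read_series is hypothetical; pd.read_parquet and pd.read_csv are the standard pandas readers, and remote s3:// paths additionally require s3fs. The demo only does a local round trip with a toy CSV.

```python
import os
import tempfile

import pandas as pd


def read_series(path: str) -> pd.DataFrame:
    """Read a table with pandas, picking the reader from the file extension."""
    suffix = os.path.splitext(str(path))[1].lower()
    if suffix == ".parquet":
        # Needs a parquet engine such as pyarrow; s3:// paths also need s3fs.
        return pd.read_parquet(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


# Local round trip with a toy CSV (no cloud access needed).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")
    pd.DataFrame(
        {"unique_id": ["FOODS_1_001_CA_1"], "ds": ["2011-01-29"], "y": [3.0]}
    ).to_csv(csv_path, index=False)
    toy = read_series(csv_path)
```

The same helper could be pointed at one of the parquet URLs listed at the top of this page, provided a parquet engine is installed.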
The input to StatsForecast is always a data frame in long format with three columns: unique_id, ds and y:
- unique_id (string, int or category): an identifier for the series.
- ds (datestamp): should be in a format expected by pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- y (numeric): the measurement we wish to forecast.
We will rename the
Warning We are reading a file from S3, so you need to install the s3fs library. To install it, run ! pip install s3fs
| | unique_id | ds | y |
|---|---|---|---|
| 0 | FOODS_1_001_CA_1 | 2011-01-29 | 3.0 |
| 1 | FOODS_1_001_CA_1 | 2011-01-30 | 0.0 |
| 2 | FOODS_1_001_CA_1 | 2011-01-31 | 0.0 |
| 3 | FOODS_1_001_CA_1 | 2011-02-01 | 1.0 |
| 4 | FOODS_1_001_CA_1 | 2011-02-02 | 4.0 |
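Before fitting, it can help to check that a frame really has this shape. The sketch below is one way to do that with plain pandas; the helper name to_long_format is hypothetical and not part of StatsForecast, and the toy rows reuse the first values of the table above.

```python
import pandas as pd

REQUIRED = ["unique_id", "ds", "y"]


def to_long_format(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the three required columns, coerce their types, and sort by series and date."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    out = df[REQUIRED].copy()
    out["ds"] = pd.to_datetime(out["ds"])  # datestamps parsed by pandas
    out["y"] = pd.to_numeric(out["y"])     # target must be numeric
    return out.sort_values(["unique_id", "ds"]).reset_index(drop=True)


# Unsorted toy input with string-typed dates and targets.
raw = pd.DataFrame({
    "unique_id": ["FOODS_1_001_CA_1"] * 3,
    "ds": ["2011-01-31", "2011-01-29", "2011-01-30"],
    "y": ["0", "3", "0"],
})
df = to_long_format(raw)
```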
We instantiate a StatsForecast object with the following parameters:
- models: a list of models. Select the models you want from statsforecast.models and import them. For this example, we will use AutoETS and DynamicOptimizedTheta. We set season_length to 7 because we expect seasonal effects every week. (See: Seasonal periods)
- freq: a string indicating the frequency of the data. (See pandas' available frequencies.)
- n_jobs: int, number of jobs used in the parallel processing; use -1 for all cores.
- fallback_model: a model to be used if a model fails.
Note StatsForecast achieves its speed using JIT compilation through Numba. The first time you call the StatsForecast class, the fit method should take around 5 seconds. The second time, once Numba has compiled your settings, it should take less than 0.2s.
- AutoETS: Exponential Smoothing model. Automatically selects the best ETS (Error, Trend, Seasonality) model using an information criterion. Ref: AutoETS.
- SeasonalNaive: Memory-efficient Seasonal Naive predictions. Ref: SeasonalNaive.
- DynamicOptimizedTheta: fits two theta lines to a deseasonalized time series, using different techniques to obtain and combine the two theta lines to produce the final forecasts. Ref: DynamicOptimizedTheta.
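To make the simplest of these models concrete, here is a small NumPy sketch of the seasonal naive idea: each future step repeats the value observed one season earlier. This is an illustrative reimplementation, not the library's SeasonalNaive class, and the history values are made up.

```python
import numpy as np


def seasonal_naive(y, h, season_length):
    """Forecast h steps by repeating the last observed season."""
    y = np.asarray(y, dtype=float)
    if len(y) < season_length:
        raise ValueError("Need at least one full season of history.")
    last_season = y[-season_length:]
    repeats = -(-h // season_length)  # ceiling division
    return np.tile(last_season, repeats)[:h]


# Two weeks of made-up daily sales; forecast the next 10 days.
history = [3, 0, 0, 1, 4, 2, 5, 1, 0, 2, 1, 3, 2, 6]
forecast = seasonal_naive(history, h=10, season_length=7)
```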
The forecast method takes two arguments: h (for horizon), the number of steps to forecast, and level.
- h (int): the forecast horizon, that is, the number of steps into the future to forecast. In this case the data is daily, so h counts days; the M5 competition horizon is 28 days.
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to be inside that interval 90% of the time.
Note The forecast method is intended to be compatible with distributed clusters, so it does not store any model parameters. If you want to store the parameters of every model, you can use the fit and predict methods. However, those methods are not defined for distributed engines like Spark, Ray or Dask.
We now evaluate the performance of StatsForecast and AmazonForecast. To do this, we first need to install datasetsforecast, a Python library developed by Nixtla that includes a large battery of benchmark datasets and evaluation utilities. The library will allow us to calculate the performance of the models using the original evaluation metric of the competition.
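The competition metric is the weighted root mean squared scaled error (WRMSSE), which is what the _wrmsse suffix in the results refers to. As a simplified sketch of the idea, and not the datasetsforecast implementation, the code below computes the RMSSE of a single series and a weighted average of per-series scores. The function names are hypothetical, and the official metric additionally trims the leading zeros of each series and derives the weights from dollar sales.

```python
import numpy as np


def rmsse(y_train, y_true, y_pred):
    """Root mean squared scaled error for one series."""
    y_train = np.asarray(y_train, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Scale: mean squared error of the one-step naive forecast on the training data.
    # A constant training series gives a zero scale, which this sketch does not handle.
    scale = np.mean(np.diff(y_train) ** 2)
    mse = np.mean((y_true - y_pred) ** 2)
    return float(np.sqrt(mse / scale))


def weighted_rmsse(scores, weights):
    """Weighted average of per-series RMSSE values; weights are normalized to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(scores * weights / weights.sum()))


# Made-up numbers: the naive scale is 4 and the forecast MSE is 2.
score = rmsse(y_train=[0, 2, 0, 2, 0], y_true=[1, 1], y_pred=[3, 1])
combined = weighted_rmsse(scores=[0.5, 1.0], weights=[3, 1])
```

A value below 1 means the forecast errors are smaller than the in-sample errors of a one-step naive forecast, which is how the Total column in the results can be read.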
| | Total | Level1 | Level2 | Level3 | Level4 | Level5 | Level6 | Level7 | Level8 | Level9 | Level10 | Level11 | Level12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| StatsForecast_ThETS_wrmsse | 0.669606 | 0.424331 | 0.515777 | 0.580670 | 0.474098 | 0.552459 | 0.578092 | 0.651079 | 0.642446 | 0.725324 | 1.009390 | 0.967537 | 0.914068 |
| StatsForecast_AutoETS_wrmsse | 0.672404 | 0.430474 | 0.516340 | 0.580736 | 0.482090 | 0.559721 | 0.579939 | 0.655362 | 0.643638 | 0.727967 | 1.010596 | 0.968168 | 0.913820 |
| StatsForecast_DynamicOptimizedTheta_wrmsse | 0.675333 | 0.429670 | 0.521640 | 0.589278 | 0.478730 | 0.557520 | 0.584278 | 0.656283 | 0.650613 | 0.731735 | 1.013910 | 0.971758 | 0.918576 |
| AmazonForecast_p50_wrmsse | 1.617815 | 1.912144 | 1.786991 | 1.736382 | 1.972658 | 2.010498 | 1.805926 | 1.819329 | 1.667225 | 1.619216 | 1.156432 | 1.012942 | 0.914040 |