Step-by-step guide on using the MSTL Model with Statsforecast
Seasonal periods by frequency (number of observations per cycle):

Data | Minute | Hour | Day | Week | Year |
---|---|---|---|---|---|
Daily | | | | 7 | 365.25 |
Hourly | | | 24 | 168 | 8766 |
Half-hourly | | | 48 | 336 | 17532 |
Minutes | | 60 | 1440 | 10080 | 525960 |
Seconds | 60 | 3600 | 86400 | 604800 | 31557600 |
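The values in this table come from dividing the length of each cycle by the sampling interval, with a year taken as 365.25 days. The sketch below reproduces the rows; the dictionaries and the `min_obs` cut-off (which skips trivially short cycles such as 2 half-hours per hour, left blank in the table) are illustrative choices, not part of Statsforecast.

```python
# Length of each seasonal cycle in seconds; a year is taken as 365.25 days,
# which is where the fractional 365.25 and the 8766 values come from.
CYCLE_SECONDS = {
    "Minute": 60,
    "Hour": 3600,
    "Day": 86400,
    "Week": 7 * 86400,
    "Year": 365.25 * 86400,
}

# Sampling interval of each data frequency, in seconds.
SAMPLE_SECONDS = {
    "Daily": 86400,
    "Hourly": 3600,
    "Half-hourly": 1800,
    "Minutes": 60,
    "Seconds": 1,
}


def seasonal_periods(data_freq, min_obs=3):
    """Observations per cycle for each cycle longer than the sampling interval."""
    step = SAMPLE_SECONDS[data_freq]
    periods = {}
    for cycle, seconds in CYCLE_SECONDS.items():
        n = seconds / step
        # Skip cycles that are not longer than the sampling step or too short to matter.
        if seconds > step and n >= min_obs:
            # Whole numbers stay ints; fractional ones (e.g. 365.25) stay floats.
            periods[cycle] = int(n) if n == int(n) else n
    return periods


print(seasonal_periods("Hourly"))  # {'Day': 24, 'Week': 168, 'Year': 8766}
```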
Tip: Statsforecast will be needed. To install it, see the instructions.

Next, we import the plotting libraries and configure the plotting style.
 | Time | Ads |
---|---|---|
0 | 2017-09-13T00:00:00 | 80115 |
1 | 2017-09-13T01:00:00 | 79885 |
2 | 2017-09-13T02:00:00 | 89325 |
3 | 2017-09-13T03:00:00 | 101930 |
4 | 2017-09-13T04:00:00 | 121630 |
The input to StatsForecast is always a data frame in long format with three columns: unique_id, ds and y.

unique_id (string, int or category): represents an identifier for the series.

ds (datestamp): the column should be in a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.

y (numeric): represents the measurement we wish to forecast.
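A minimal sketch of turning the raw Time/Ads columns into this long format with pandas, using the first three rows shown above. Since the data holds a single series, a constant identifier (1) is added; the variable names `raw` and `df` are illustrative.

```python
import pandas as pd

# First rows of the raw data: an hourly "Time" column and the "Ads" target.
raw = pd.DataFrame({
    "Time": ["2017-09-13T00:00:00", "2017-09-13T01:00:00", "2017-09-13T02:00:00"],
    "Ads": [80115, 79885, 89325],
})

# Rename to the names StatsForecast expects and add a constant series identifier.
df = raw.rename(columns={"Time": "ds", "Ads": "y"})
df["unique_id"] = 1
print(df)
```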
 | ds | y | unique_id |
---|---|---|---|
0 | 2017-09-13T00:00:00 | 80115 | 1 |
1 | 2017-09-13T01:00:00 | 79885 | 1 |
2 | 2017-09-13T02:00:00 | 89325 | 1 |
3 | 2017-09-13T03:00:00 | 101930 | 1 |
4 | 2017-09-13T04:00:00 | 121630 | 1 |
Because the ds column is stored in object (string) format, we need to convert it to a datetime format.
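One way to do this conversion is `pandas.to_datetime`, which parses ISO-8601 strings such as `2017-09-13T00:00:00` directly. A small sketch on two sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "ds": ["2017-09-13T00:00:00", "2017-09-13T01:00:00"],
    "y": [80115, 79885],
    "unique_id": [1, 1],
})

# Parse the timestamp strings into a datetime64 column.
df["ds"] = pd.to_datetime(df["ds"])
print(df.dtypes)
```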
We split the data into two sets:
1. Data to train our MSTL Model.
2. Data to test our model.
For the test data, we will use the last 30 hours to test and evaluate the performance of our model.
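A sketch of this split with pandas. The frame below is a placeholder: 216 hourly rows spanning the same range as the tutorial's data (2017-09-13 00:00 to 2017-09-21 23:00), with made-up y values. Taking the last 30 rows per `unique_id` keeps the split correct if more series are added later.

```python
import pandas as pd

# Placeholder data: same timestamps as the Ads series, made-up y values.
df = pd.DataFrame({
    "unique_id": 1,
    "ds": pd.date_range("2017-09-13", periods=216, freq=pd.Timedelta(hours=1)),
    "y": [float(i) for i in range(216)],
})

horizon = 30  # hold out the last 30 hours

# The last `horizon` rows of each series form the test set; the rest is training data.
test = df.groupby("unique_id").tail(horizon)
train = df.drop(test.index)

print(train["ds"].max(), test["ds"].min())
```

With this range, training ends at 2017-09-20 17:00 and the test set starts at 2017-09-20 18:00, which matches the first forecast timestamp in the tables below.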
First, we must define the model parameters, starting with season_length. As mentioned before, the Ads series presents seasonalities every 24 hours (daily) and every 24 * 7 hours (weekly). Therefore, we will use [24, 24 * 7] for season_length. The trend component will be forecasted with an AutoARIMA model. (You can also try AutoTheta, AutoCES, and AutoETS.)
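To make the two season lengths concrete: in hourly data, observations 24 steps apart fall at the same hour of the day, and observations 24 * 7 = 168 steps apart also fall at the same hour of the week. The helper below is a hypothetical illustration of these two cycle positions; it is not something MSTL exposes.

```python
from datetime import datetime, timedelta


def cycle_positions(ts):
    """Position of an hourly timestamp in the 24-hour and 168-hour cycles."""
    hour_of_day = ts.hour                       # 0..23, period 24
    hour_of_week = ts.weekday() * 24 + ts.hour  # 0..167, period 24 * 7 (Monday 00:00 = 0)
    return hour_of_day, hour_of_week


start = datetime(2017, 9, 13)  # first timestamp in the data, a Wednesday
print(cycle_positions(start))                        # (0, 48)
print(cycle_positions(start + timedelta(hours=24)))  # (0, 72): same hour of day, different hour of week
print(cycle_positions(start + timedelta(hours=168))) # (0, 48): same position in both cycles
```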
The StatsForecast object takes the following parameters:

freq: a string indicating the frequency of the data (see pandas' available frequencies).

n_jobs (int): number of jobs used in parallel processing; use -1 for all cores.

fallback_model: a model to be used if a model fails.
After fitting, we can inspect the decomposition produced by the MSTL Model with the following instruction:
 | data | trend | seasonal24 | seasonal168 | remainder |
---|---|---|---|---|---|
0 | 80115.0 | 126222.558267 | -42511.086107 | -1524.379074 | -2072.093085 |
1 | 79885.0 | 126191.340644 | -43585.928105 | -1315.292640 | -1405.119899 |
2 | 89325.0 | 126160.117727 | -36756.458517 | 659.187427 | -737.846637 |
… | … | … | … | … | … |
183 | 141590.0 | 120314.325647 | 25363.015190 | -2808.715638 | -1278.625199 |
184 | 140610.0 | 120280.850692 | 26306.688690 | -6221.712712 | 244.173330 |
185 | 139515.0 | 120247.361703 | 27571.777796 | -5745.053631 | -2559.085868 |
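MSTL's decomposition is additive: for every row, trend + seasonal24 + seasonal168 + remainder gives back the observed value in the data column. A quick check using rows copied from the table above:

```python
# (data, trend, seasonal24, seasonal168, remainder) copied from the decomposition table.
rows = [
    (80115.0, 126222.558267, -42511.086107, -1524.379074, -2072.093085),
    (79885.0, 126191.340644, -43585.928105, -1315.292640, -1405.119899),
    (141590.0, 120314.325647, 25363.015190, -2808.715638, -1278.625199),
    (139515.0, 120247.361703, 27571.777796, -5745.053631, -2559.085868),
]

for data, trend, s24, s168, remainder in rows:
    reconstructed = trend + s24 + s168 + remainder
    # Matches the observed value up to the rounding of the printed table.
    print(f"{data:>10.1f} {reconstructed:>14.6f}")
```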
To produce forecasts, we use the StatsForecast.forecast method instead of .fit and .predict. The main difference is that .forecast does not store the fitted values and is highly scalable in distributed environments.

The forecast method takes two arguments: the forecast horizon h and level.
h (int): the number of steps into the future to forecast. In this case, 30 hours ahead.

level (list of floats): this optional parameter is used for probabilistic forecasting. It sets the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to be inside that interval 90% of the time.
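To illustrate what a level means, here is one common way to build an interval, assuming normally distributed forecast errors with standard deviation sigma: point ± z * sigma, where z is the (1 + level/100)/2 quantile of the standard normal. This is a sketch of the general idea only; it is not necessarily how the MSTL model computes its intervals, and the point and sigma values are made up.

```python
from statistics import NormalDist


def normal_interval(point, sigma, level):
    """Symmetric prediction interval for a point forecast, assuming Gaussian errors."""
    z = NormalDist().inv_cdf(0.5 + level / 200)  # level=90 -> 0.95 quantile, z ~ 1.645
    return point - z * sigma, point + z * sigma


lo, hi = normal_interval(point=100.0, sigma=10.0, level=90)
print(round(lo, 2), round(hi, 2))  # 83.55 116.45
```

A higher level such as 95 gives a larger z, so the interval widens.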
(… ARIMA and Theta)
 | unique_id | ds | MSTL |
---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 157848.500000 |
1 | 1 | 2017-09-20 19:00:00 | 159790.328125 |
2 | 1 | 2017-09-20 20:00:00 | 133002.281250 |
… | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 98109.875000 |
28 | 1 | 2017-09-21 22:00:00 | 86342.015625 |
29 | 1 | 2017-09-21 23:00:00 | 76815.976562 |
 | unique_id | ds | y | MSTL |
---|---|---|---|---|
0 | 1 | 2017-09-13 00:00:00 | 80115.0 | 79990.851562 |
1 | 1 | 2017-09-13 01:00:00 | 79885.0 | 79329.132812 |
2 | 1 | 2017-09-13 02:00:00 | 89325.0 | 88401.179688 |
3 | 1 | 2017-09-13 03:00:00 | 101930.0 | 102109.929688 |
4 | 1 | 2017-09-13 04:00:00 | 121630.0 | 123543.671875 |
 | unique_id | ds | MSTL | MSTL-lo-95 | MSTL-hi-95 |
---|---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 157848.500000 | 157796.406250 | 157900.593750 |
1 | 1 | 2017-09-20 19:00:00 | 159790.328125 | 159714.218750 | 159866.437500 |
2 | 1 | 2017-09-20 20:00:00 | 133002.281250 | 132893.937500 | 133110.609375 |
… | … | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 98109.875000 | 95957.031250 | 100262.726562 |
28 | 1 | 2017-09-21 22:00:00 | 86342.015625 | 85410.578125 | 87273.460938 |
29 | 1 | 2017-09-21 23:00:00 | 76815.976562 | 73476.195312 | 80155.757812 |
As before, the method takes two arguments: h (for horizon) and level.
h (int): the number of steps into the future to forecast. In this case, 30 hours ahead.

level (list of floats): this optional parameter is used for probabilistic forecasting. It sets the level (or confidence percentile) of your prediction interval. For example, level=[95] means that the model expects the real value to be inside that interval 95% of the time.
 | unique_id | ds | MSTL |
---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 157848.500000 |
1 | 1 | 2017-09-20 19:00:00 | 159790.328125 |
2 | 1 | 2017-09-20 20:00:00 | 133002.281250 |
… | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 98109.875000 |
28 | 1 | 2017-09-21 22:00:00 | 86342.015625 |
29 | 1 | 2017-09-21 23:00:00 | 76815.976562 |
 | unique_id | ds | MSTL | MSTL-lo-95 | MSTL-lo-80 | MSTL-hi-80 | MSTL-hi-95 |
---|---|---|---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 157848.500000 | 157796.406250 | 157798.484375 | 157898.531250 | 157900.593750 |
1 | 1 | 2017-09-20 19:00:00 | 159790.328125 | 159714.218750 | 159716.187500 | 159864.468750 | 159866.437500 |
2 | 1 | 2017-09-20 20:00:00 | 133002.281250 | 132893.937500 | 132894.515625 | 133110.031250 | 133110.609375 |
… | … | … | … | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 98109.875000 | 95957.031250 | 96493.921875 | 99725.828125 | 100262.726562 |
28 | 1 | 2017-09-21 22:00:00 | 86342.015625 | 85410.578125 | 85411.835938 | 87272.195312 | 87273.460938 |
29 | 1 | 2017-09-21 23:00:00 | 76815.976562 | 73476.195312 | 74494.546875 | 79137.406250 | 80155.757812 |
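A quick sanity check on this output, using values copied from the table above: the 80% interval always sits inside the 95% interval, both contain the point forecast, and the intervals widen as the horizon grows.

```python
# (MSTL, lo-95, lo-80, hi-80, hi-95) for rows 0, 27 and 29 of the table above.
rows = [
    (157848.500000, 157796.406250, 157798.484375, 157898.531250, 157900.593750),
    (98109.875000, 95957.031250, 96493.921875, 99725.828125, 100262.726562),
    (76815.976562, 73476.195312, 74494.546875, 79137.406250, 80155.757812),
]

for point, lo95, lo80, hi80, hi95 in rows:
    # A higher level gives a wider interval, so the 80% band is nested in the 95% band.
    assert lo95 <= lo80 <= point <= hi80 <= hi95
    print(f"80% width: {hi80 - lo80:10.2f}   95% width: {hi95 - lo95:10.2f}")
```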
Next, we perform time series cross-validation over 5 windows (n_windows=5), forecasting every 30 hours (step_size=30). Depending on your computer, this step should take around 1 min.
The cross_validation method from the StatsForecast class takes the following arguments:

df: training data frame.
h (int): the number of steps into the future that are forecasted. In this case, 30 hours ahead.
step_size (int): step size between each window. In other words: how often do you want to run the forecasting process.

n_windows (int): number of windows used for cross-validation. In other words: how many forecasting processes in the past do you want to evaluate.
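The output table below is consistent with h=30, step_size=30 and n_windows=5: it has 150 rows, its last forecast is at 2017-09-21 23:00, and its cutoffs run from 2017-09-15 17:00 to 2017-09-20 17:00. The sketch below lays out rolling-origin cutoffs from these three parameters; `cv_cutoffs` is a hypothetical helper that mirrors the layout seen in the output, not StatsForecast's own implementation.

```python
from datetime import datetime, timedelta


def cv_cutoffs(last_ds, h, step_size, n_windows, freq=timedelta(hours=1)):
    """Cutoff timestamps for rolling-origin cross-validation.

    The last window ends at `last_ds`, so its cutoff is h steps earlier;
    each earlier window's cutoff moves back by `step_size` steps.
    """
    last_cutoff = last_ds - h * freq
    return [last_cutoff - (n_windows - 1 - i) * step_size * freq for i in range(n_windows)]


cutoffs = cv_cutoffs(datetime(2017, 9, 21, 23), h=30, step_size=30, n_windows=5)
for c in cutoffs:
    print(c)  # 2017-09-15 17:00:00 ... 2017-09-20 17:00:00
```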
The output of cross_validation contains the following columns:

unique_id: series identifier.

ds: datestamp or temporal index.

cutoff: the last datestamp or temporal index for each of the n_windows.

y: true value.

model: columns with the model's name and forecast value.

 | unique_id | ds | cutoff | y | MSTL |
---|---|---|---|---|---|
0 | 1 | 2017-09-15 18:00:00 | 2017-09-15 17:00:00 | 159725.0 | 158384.250000 |
1 | 1 | 2017-09-15 19:00:00 | 2017-09-15 17:00:00 | 161085.0 | 162015.171875 |
2 | 1 | 2017-09-15 20:00:00 | 2017-09-15 17:00:00 | 135520.0 | 138495.093750 |
… | … | … | … | … | … |
147 | 1 | 2017-09-21 21:00:00 | 2017-09-20 17:00:00 | 103080.0 | 98109.875000 |
148 | 1 | 2017-09-21 22:00:00 | 2017-09-20 17:00:00 | 95155.0 | 86342.015625 |
149 | 1 | 2017-09-21 23:00:00 | 2017-09-20 17:00:00 | 80285.0 | 76815.976562 |
 | unique_id | metric | MSTL |
---|---|---|---|
0 | 1 | mae | 4932.395052 |
1 | 1 | mape | 0.040514 |
2 | 1 | mase | 0.609407 |
3 | 1 | rmse | 6495.207028 |
4 | 1 | smape | 0.020267 |
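For reference, the metrics in this table can be written out in plain Python. Two conventions here are assumptions: smape uses the bounded form |y - ŷ| / (|y| + |ŷ|), which is consistent with the table's smape being roughly half its mape, and mase scales the MAE by the in-sample MAE of a seasonal naive forecast with a hypothetical season_length=24. The sample values are the first three rows of the cross-validation table, so their MAE differs from the table's full-evaluation figure.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    # Returned as a fraction (0.04 = 4%), the same scale as the table above.
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def smape(y, y_hat):
    # Assumed bounded convention: |y - y_hat| / (|y| + |y_hat|), in [0, 1].
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, y_hat)) / len(y)


def mase(y, y_hat, y_train, season_length=24):
    # Scale by the in-sample MAE of a seasonal naive forecast on the training data.
    scale = mae(y_train[season_length:], y_train[:-season_length])
    return mae(y, y_hat) / scale


y = [159725.0, 161085.0, 135520.0]              # first three actuals from the cross-validation table
y_hat = [158384.25, 162015.171875, 138495.09375]  # matching MSTL forecasts
print(mae(y, y_hat))  # 1748.671875
```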