Step-by-step guide on using the IMAPA Model with Statsforecast.
Tip: Statsforecast will be needed. To install, see instructions.

Next, we import plotting libraries and configure the plotting style.
| | date | sales |
|---|---|---|
| 0 | 2022-01-01 00:00:00 | 0 |
| 1 | 2022-01-01 01:00:00 | 10 |
| 2 | 2022-01-01 02:00:00 | 0 |
| 3 | 2022-01-01 03:00:00 | 0 |
| 4 | 2022-01-01 04:00:00 | 100 |
unique_id (string, int or category) represents an identifier for the series.
ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
y (numeric) represents the measurement we wish to forecast.
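As a minimal sketch, the raw `date`/`sales` frame previewed earlier could be brought into this layout with pandas. The five rows are copied from that preview; the variable names `raw` and `df` are only illustrative, and the constant `unique_id` of 1 assumes a single series, as in the table that follows.

```python
import pandas as pd

# First rows of the raw data, copied from the preview table.
raw = pd.DataFrame({
    "date": [
        "2022-01-01 00:00:00",
        "2022-01-01 01:00:00",
        "2022-01-01 02:00:00",
        "2022-01-01 03:00:00",
        "2022-01-01 04:00:00",
    ],
    "sales": [0, 10, 0, 0, 100],
})

# Rename to the expected column names and tag the single series with an id.
df = raw.rename(columns={"date": "ds", "sales": "y"})
df["unique_id"] = 1

print(df.columns.tolist())  # ['ds', 'y', 'unique_id']
```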
| | ds | y | unique_id |
|---|---|---|---|
| 0 | 2022-01-01 00:00:00 | 0 | 1 |
| 1 | 2022-01-01 01:00:00 | 10 | 1 |
| 2 | 2022-01-01 02:00:00 | 0 | 1 |
| 3 | 2022-01-01 03:00:00 | 0 | 1 |
| 4 | 2022-01-01 04:00:00 | 100 | 1 |
The time variable (ds) is in an object format, so we need to convert it to a date format.
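One way to do this conversion is `pd.to_datetime`. The sketch below uses a two-row stand-in frame (an assumption for illustration, not the full dataset) and checks the column type after converting.

```python
import pandas as pd

# Small stand-in frame whose ds column holds plain strings.
df = pd.DataFrame({
    "ds": ["2022-01-01 00:00:00", "2022-01-01 01:00:00"],
    "y": [0, 10],
    "unique_id": [1, 1],
})

# Parse the strings into pandas timestamps.
df["ds"] = pd.to_datetime(df["ds"])

# True once the column holds datetimes rather than strings.
print(pd.api.types.is_datetime64_any_dtype(df["ds"]))
```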
IMAPA Model
.season_length
.
freq: a string indicating the frequency of the data. (See pandas’ available frequencies.)
n_jobs: int, number of jobs used in the parallel processing; use -1 for all cores.
fallback_model: a model to be used if a model fails.
Let's see the results of our IMAPA Model. We can observe it with the following instruction:
Use the StatsForecast.forecast method instead of .fit and .predict. The main difference is that .forecast does not store the fitted values and is highly scalable in distributed environments.
The forecast method takes two arguments: h (the forecast horizon) and level.

h (int): the number of steps into the future to forecast. In this case, 500 hours ahead.

| | unique_id | ds | IMAPA |
|---|---|---|---|
| 0 | 1 | 2023-01-31 20:00:00 | 28.579695 |
| 1 | 1 | 2023-01-31 21:00:00 | 28.579695 |
| 2 | 1 | 2023-01-31 22:00:00 | 28.579695 |
| … | … | … | … |
| 497 | 1 | 2023-02-21 13:00:00 | 28.579695 |
| 498 | 1 | 2023-02-21 14:00:00 | 28.579695 |
| 499 | 1 | 2023-02-21 15:00:00 | 28.579695 |
The model is evaluated over several past windows (n_windows), with a new forecast origin every 50 hours (step_size=50). Depending on your computer, this step should take around 1 min.
The cross_validation method from the StatsForecast class takes the following arguments.

df: training data frame.
h (int): the number of steps into the future that are being forecasted. In this case, 500 hours ahead.
step_size (int): step size between each window. In other words: how often do you want to run the forecasting processes.
n_windows (int): number of windows used for cross validation. In other words: what number of forecasting processes in the past do you want to evaluate.
The resulting crossvalidation_df contains the following columns:

unique_id: index. If you don't like working with the index, just run crossvalidation_df.reset_index().
ds: datestamp or temporal index.
cutoff: the last datestamp or temporal index used for training in each of the n_windows.
y: true value.
model: columns with the model’s name and fitted value.

| | unique_id | ds | cutoff | y | IMAPA |
|---|---|---|---|---|---|
| 0 | 1 | 2023-01-23 12:00:00 | 2023-01-23 11:00:00 | 0.0 | 15.134251 |
| 1 | 1 | 2023-01-23 13:00:00 | 2023-01-23 11:00:00 | 0.0 | 15.134251 |
| 2 | 1 | 2023-01-23 14:00:00 | 2023-01-23 11:00:00 | 0.0 | 15.134251 |
| … | … | … | … | … | … |
| 2497 | 1 | 2023-02-21 13:00:00 | 2023-01-31 19:00:00 | 60.0 | 28.579695 |
| 2498 | 1 | 2023-02-21 14:00:00 | 2023-01-31 19:00:00 | 20.0 | 28.579695 |
| 2499 | 1 | 2023-02-21 15:00:00 | 2023-01-31 19:00:00 | 20.0 | 28.579695 |
| | unique_id | metric | IMAPA |
|---|---|---|---|
| 0 | 1 | mae | 34.206428 |
| 1 | 1 | mape | 0.637417 |
| 2 | 1 | mase | 0.816042 |
| 3 | 1 | rmse | 45.345223 |
| 4 | 1 | smape | 0.764973 |
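To make two of these metrics concrete, here is a small hand-rolled sketch of MAE and RMSE computed from columns shaped like the cross-validation output. The four rows are invented for illustration, and this is not necessarily how the table above was produced; library implementations of the other metrics (mape, mase, smape) may use definitions that differ in detail.

```python
import numpy as np
import pandas as pd

# Invented actuals and forecasts in the cross-validation layout.
cv = pd.DataFrame({
    "y": [0.0, 20.0, 60.0, 20.0],
    "IMAPA": [10.0, 10.0, 30.0, 30.0],
})

err = cv["y"] - cv["IMAPA"]

mae = err.abs().mean()             # mean absolute error
rmse = np.sqrt((err ** 2).mean())  # root mean squared error

print(mae)   # 15.0
print(rmse)  # about 17.32
```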