Step-by-step guide on using the `OptimizedTheta` model with `StatsForecast`.
Tip: StatsForecast will be needed. To install, see the instructions.

Next, we import plotting libraries and configure the plotting style.
| | month | production |
|---|---|---|
| 0 | 1962-01-01 | 589 |
| 1 | 1962-02-01 | 561 |
| 2 | 1962-03-01 | 640 |
| 3 | 1962-04-01 | 656 |
| 4 | 1962-05-01 | 727 |
The input to StatsForecast is a data frame in long format with three columns:

- `unique_id` (string, int or category) represents an identifier for the series.
- `ds` (datestamp) should be in a format expected by pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- `y` (numeric) represents the measurement we wish to forecast.
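As a minimal sketch of how the raw frame could be brought into this layout, assuming the `month` and `production` column names from the first table and a single series labeled with the id `"1"` (the helper name `to_long_format` is hypothetical):

```python
import pandas as pd


def to_long_format(raw: pd.DataFrame, series_id: str = "1") -> pd.DataFrame:
    """Rename the raw columns to the ds / y / unique_id layout."""
    out = raw.rename(columns={"month": "ds", "production": "y"})
    out["unique_id"] = series_id  # a single identifier because there is one series
    return out[["ds", "y", "unique_id"]]


# First rows of the raw data shown above
raw = pd.DataFrame({
    "month": ["1962-01-01", "1962-02-01", "1962-03-01"],
    "production": [589, 561, 640],
})
df = to_long_format(raw)
print(df)
```

With several series in one frame, `unique_id` would instead carry one distinct label per series.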
| | ds | y | unique_id |
|---|---|---|---|
| 0 | 1962-01-01 | 589 | 1 |
| 1 | 1962-02-01 | 561 | 1 |
| 2 | 1962-03-01 | 640 | 1 |
| 3 | 1962-04-01 | 656 | 1 |
| 4 | 1962-05-01 | 727 | 1 |
The time variable (`ds`) is in an object format, so we need to convert it to a date format.
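One way to do this conversion, sketched here on the first rows of the table above, is `pd.to_datetime`. The small frame is rebuilt inline only so the example runs on its own.

```python
import pandas as pd

# First rows of the long-format data, with ds still stored as text
df = pd.DataFrame({
    "ds": ["1962-01-01", "1962-02-01", "1962-03-01"],
    "y": [589, 561, 640],
    "unique_id": "1",
})

# Parse the YYYY-MM-DD strings into proper timestamps
df["ds"] = pd.to_datetime(df["ds"])
print(df.dtypes)
```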
We split the data into two sets:

1. Data to train our Optimized Theta model.
2. Data to test our model.

For the test data we will use the last 12 months to test and evaluate the performance of our model.
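The split can be sketched as follows. This assumes the full series runs monthly from 1962-01 to 1975-12 (168 rows), which is consistent with the 1975 forecast dates shown later. The `y` values here are placeholders, not the real production figures.

```python
import pandas as pd

# Monthly index with placeholder values standing in for the real series
df = pd.DataFrame({
    "unique_id": "1",
    "ds": pd.date_range("1962-01-01", periods=168, freq="MS"),
    "y": [float(i) for i in range(168)],
})

horizon = 12                    # hold out the last 12 months
train = df.iloc[:-horizon]      # everything before the hold-out period
test = df.iloc[-horizon:]       # the final 12 months

print(train["ds"].max(), test["ds"].min())
print(len(train), len(test))
```

Splitting by position keeps the time order intact, which matters because a random split would leak future observations into the training set.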
Since the data is monthly, the seasonal period is passed to the model through `season_length`.
The `StatsForecast` object takes the following parameters:

- `freq`: a string indicating the frequency of the data. (See pandas' available frequencies.)
- `n_jobs`: int, number of jobs used in the parallel processing; use -1 for all cores.
- `fallback_model`: a model to be used if a model fails.
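The parameters above might be wired together as in the sketch below. The helper name `build_forecaster`, the choice of `SeasonalNaive` as the fallback model, and the `"MS"` (month start) frequency are assumptions made for illustration and are not prescribed by this guide.

```python
def build_forecaster(season_length: int = 12, freq: str = "MS"):
    """Return an unfitted StatsForecast object wrapping OptimizedTheta."""
    # Imported inside the function so this sketch can be loaded
    # even where statsforecast is not installed.
    from statsforecast import StatsForecast
    from statsforecast.models import OptimizedTheta, SeasonalNaive

    models = [OptimizedTheta(season_length=season_length)]
    return StatsForecast(
        models=models,
        freq=freq,    # "MS" matches dates that fall on the first day of each month
        n_jobs=-1,    # use all available cores
        # Assumed fallback: used only if the main model fails
        fallback_model=SeasonalNaive(season_length=season_length),
    )
```

Given a training frame `train` in the long format described earlier, `build_forecaster().fit(df=train)` would then fit the model.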
Let's look at the results of our Optimized Theta Model (OTM). We can observe them with the following instruction:
The result is a dictionary, so we use the `.get()` function to extract the element and then save it in a `pd.DataFrame()`.
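A minimal sketch of that extraction step follows. It assumes the fitted-model dictionary is reachable as `sf.fitted_[0, 0].model_` on a fitted `StatsForecast` object (called `sf` here) and that it holds a `"residuals"` entry. The stand-in dictionary below reuses the first three residuals from the table that follows, so the example runs without fitting anything.

```python
import pandas as pd


def residuals_frame(result: dict) -> pd.DataFrame:
    """Pull the residuals out of a fitted-model dictionary."""
    residuals = result.get("residuals")  # .get() returns None if the key is absent
    if residuals is None:
        raise KeyError("no 'residuals' entry in the fitted-model dictionary")
    return pd.DataFrame({"residual Model": residuals})


# Stand-in for the real dictionary, assumed to come from sf.fitted_[0, 0].model_
result = {"residuals": [-271.899414, -114.671692, 4.768066]}
residual = residuals_frame(result)
print(residual)
```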
| | residual Model |
|---|---|
| 0 | -271.899414 |
| 1 | -114.671692 |
| 2 | 4.768066 |
| … | … |
| 153 | -60.233887 |
| 154 | -92.472839 |
| 155 | -44.143982 |
Consider using the `StatsForecast.forecast` method instead of `.fit` and `.predict`. The main difference is that `.forecast` does not store the fitted values and is highly scalable in distributed environments.
The forecast method takes two arguments: `h` (the forecast horizon) and `level`.

- `h` (int): represents the forecast h steps into the future. In this case, 12 months ahead.
- `level` (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, `level=[90]` means that the model expects the real value to be inside that interval 90% of the time.
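The call described above might look like the sketch below. The wrapper name `forecast_next_year` is hypothetical. `sf` is assumed to be a `StatsForecast` instance and `train` a long-format frame with `unique_id`, `ds` and `y` columns.

```python
def forecast_next_year(sf, train, h: int = 12, level=(90,)):
    """Produce h-step-ahead forecasts with prediction intervals in one call."""
    # .forecast fits and predicts in a single step, so no separate
    # .fit / .predict calls are needed.
    return sf.forecast(df=train, h=h, level=list(level))
```

Passing `level=(80, 95)` would request two intervals at once. Leaving `level` out of the underlying call returns point forecasts only.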
| | unique_id | ds | OptimizedTheta |
|---|---|---|---|
| 0 | 1 | 1975-01-01 | 839.682800 |
| 1 | 1 | 1975-02-01 | 802.071838 |
| 2 | 1 | 1975-03-01 | 896.117126 |
| … | … | … | … |
| 9 | 1 | 1975-10-01 | 824.135498 |
| 10 | 1 | 1975-11-01 | 795.691223 |
| 11 | 1 | 1975-12-01 | 833.316345 |
| | unique_id | ds | y | OptimizedTheta |
|---|---|---|---|---|
| 0 | 1 | 1962-01-01 | 589.0 | 860.899414 |
| 1 | 1 | 1962-02-01 | 561.0 | 675.671692 |
| 2 | 1 | 1962-03-01 | 640.0 | 635.231934 |
| 3 | 1 | 1962-04-01 | 656.0 | 614.731323 |
| 4 | 1 | 1962-05-01 | 727.0 | 609.770752 |
| | unique_id | ds | OptimizedTheta | OptimizedTheta-lo-95 | OptimizedTheta-hi-95 |
|---|---|---|---|---|---|
| 0 | 1 | 1975-01-01 | 839.682800 | 742.509583 | 955.414307 |
| 1 | 1 | 1975-02-01 | 802.071838 | 643.581360 | 945.119202 |
| 2 | 1 | 1975-03-01 | 896.117126 | 710.785095 | 1065.057495 |
| … | … | … | … | … | … |
| 9 | 1 | 1975-10-01 | 824.135498 | 555.948669 | 1084.320190 |
| 10 | 1 | 1975-11-01 | 795.691223 | 503.147858 | 1036.519531 |
| 11 | 1 | 1975-12-01 | 833.316345 | 530.259705 | 1106.636597 |
The method takes two arguments: `h` (for horizon) and `level`.

- `h` (int): represents the forecast h steps into the future. In this case, 12 months ahead.
- `level` (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, `level=[95]` means that the model expects the real value to be inside that interval 95% of the time.
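To make the meaning of `level` concrete, the sketch below builds the interval column names in the `<model>-lo-<level>` / `<model>-hi-<level>` pattern seen in the output tables and measures how often actual values fall inside an interval. Both helper names are hypothetical.

```python
def interval_columns(model: str, levels):
    """Names of the lower/upper bound columns for each requested level."""
    cols = []
    for lv in sorted(levels):
        cols += [f"{model}-lo-{lv}", f"{model}-hi-{lv}"]
    return cols


def coverage(actuals, lows, highs):
    """Share of actual values that fall inside [low, high]."""
    inside = [lo <= y <= hi for y, lo, hi in zip(actuals, lows, highs)]
    return sum(inside) / len(inside)


print(interval_columns("OptimizedTheta", [80, 95]))
```

For a well-calibrated 95% interval, `coverage` computed on held-out data should come out close to 0.95, although with only 12 test points the estimate is coarse.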
| | unique_id | ds | OptimizedTheta |
|---|---|---|---|
| 0 | 1 | 1975-01-01 | 839.682800 |
| 1 | 1 | 1975-02-01 | 802.071838 |
| 2 | 1 | 1975-03-01 | 896.117126 |
| … | … | … | … |
| 9 | 1 | 1975-10-01 | 824.135498 |
| 10 | 1 | 1975-11-01 | 795.691223 |
| 11 | 1 | 1975-12-01 | 833.316345 |
| | unique_id | ds | OptimizedTheta | OptimizedTheta-lo-80 | OptimizedTheta-hi-80 | OptimizedTheta-lo-95 | OptimizedTheta-hi-95 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1975-01-01 | 839.682800 | 766.665955 | 928.326172 | 742.509583 | 955.414307 |
| 1 | 1 | 1975-02-01 | 802.071838 | 704.290039 | 899.335815 | 643.581360 | 945.119202 |
| 2 | 1 | 1975-03-01 | 896.117126 | 761.334778 | 1007.408447 | 710.785095 | 1065.057495 |
| … | … | … | … | … | … | … | … |
| 9 | 1 | 1975-10-01 | 824.135498 | 623.903992 | 996.567200 | 555.948669 | 1084.320190 |
| 10 | 1 | 1975-11-01 | 795.691223 | 576.546570 | 975.490784 | 503.147858 | 1036.519531 |
| 11 | 1 | 1975-12-01 | 833.316345 | 606.713623 | 1033.885742 | 530.259705 | 1106.636597 |
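In this case, the model is evaluated over the last three windows (`n_windows=3`, which matches the three cutoff dates and 36 rows in the output below), forecasting every 12 months (`step_size=12`). Depending on your computer, this step should take around 1 min.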
The `cross_validation` method from the `StatsForecast` class takes the following arguments:

- `df`: training data frame.
- `h` (int): represents h steps into the future that are being forecasted. In this case, 12 months ahead.
- `step_size` (int): step size between each window. In other words, how often you want to run the forecasting process.
- `n_windows` (int): number of windows used for cross-validation. In other words, how many forecasting processes in the past you want to evaluate.
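How `h`, `step_size` and `n_windows` interact can be sketched with plain index arithmetic. This is an illustration of the windowing idea, not the library's internal code: it assumes the last window ends at the final training observation and each earlier window is shifted back by `step_size` observations. The function names are hypothetical.

```python
def cutoff_positions(n_obs: int, h: int, step_size: int, n_windows: int):
    """0-based position of the last training observation in each window."""
    last_cutoff = n_obs - h - 1  # the final window forecasts the last h points
    return [last_cutoff - step_size * k for k in range(n_windows - 1, -1, -1)]


def month_label(position: int, start_year: int = 1962) -> str:
    """Date label for a 0-based position in a monthly series starting in January."""
    year, month = divmod(position, 12)
    return f"{start_year + year}-{month + 1:02d}-01"


# 156 monthly training observations (1962-01 to 1974-12)
cutoffs = cutoff_positions(n_obs=156, h=12, step_size=12, n_windows=3)
print([month_label(p) for p in cutoffs])
```

With these settings the first and last cutoffs are 1971-12-01 and 1973-12-01, the same as in the cross-validation output. The corresponding library call would be along the lines of `sf.cross_validation(df=train, h=12, step_size=12, n_windows=3)`, where `sf` and `train` are assumed to be a `StatsForecast` object and the training frame.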
The resulting data frame, `crossvalidation_df`, includes the following columns:

- `unique_id`: index. If you don't like working with an index, just run `crossvalidation_df.reset_index()`.
- `ds`: datestamp or temporal index.
- `cutoff`: the last datestamp or temporal index for the `n_windows`.
- `y`: true value.
- `"model"`: columns with the model's name and fitted value.

| | unique_id | ds | cutoff | y | OptimizedTheta |
|---|---|---|---|---|---|
| 0 | 1 | 1972-01-01 | 1971-12-01 | 826.0 | 828.836365 |
| 1 | 1 | 1972-02-01 | 1971-12-01 | 799.0 | 792.592346 |
| 2 | 1 | 1972-03-01 | 1971-12-01 | 890.0 | 883.269592 |
| … | … | … | … | … | … |
| 33 | 1 | 1974-10-01 | 1973-12-01 | 812.0 | 812.183838 |
| 34 | 1 | 1974-11-01 | 1973-12-01 | 773.0 | 783.898376 |
| 35 | 1 | 1974-12-01 | 1973-12-01 | 813.0 | 821.124329 |
| | unique_id | metric | OptimizedTheta |
|---|---|---|---|
| 0 | 1 | mae | 6.740204 |
| 1 | 1 | mape | 0.007828 |
| 2 | 1 | mase | 0.303120 |
| 3 | 1 | rmse | 8.701501 |
| 4 | 1 | smape | 0.003893 |
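For reference, four of the metrics in this table can be written out in a few lines of plain Python. This is a sketch of the usual definitions, not the code that produced the table. In particular, the sMAPE form used here (absolute error divided by the sum of the absolute actual and forecast values) is an assumption, chosen because it is consistent with the table's sMAPE being roughly half of its MAPE. MASE is left out because it also needs the training series.

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error, as a fraction (not multiplied by 100)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(y, y_hat)) / len(y)


def smape(y, y_hat):
    """Symmetric MAPE in the assumed form |error| / (|actual| + |forecast|)."""
    return sum(abs(a - f) / (abs(a) + abs(f)) for a, f in zip(y, y_hat)) / len(y)


# Last three actual / forecast pairs from the cross-validation table above
y = [812.0, 773.0, 813.0]
y_hat = [812.183838, 783.898376, 821.124329]
for name, fn in [("mae", mae), ("rmse", rmse), ("mape", mape), ("smape", smape)]:
    print(name, round(fn(y, y_hat), 6))
```

These three pairs are only a small slice of the data, so the printed values will not reproduce the table.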