Step-by-step guide on using the Holt Model with StatsForecast.
Simple exponential smoothing (SES) does not perform well when the data has
a trend. In those cases, we can use double exponential smoothing, which
handles data that contains a trend but no seasonality more reliably than
simple exponential smoothing. This method adds a trend equation to the
formulation. Two different weights, or smoothing parameters, are used to
update the two components, level and trend, at each time step.
Holt’s exponential smoothing is also sometimes called double
exponential smoothing. The main idea is to extend SES so that it also
captures the trend component.
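As a rough illustration of how SES is extended with a trend component, the sketch below implements the additive level and trend updates in plain Python. The function name `holt_linear`, the initialisation (level set to the first observation, trend set to the first difference), and the parameter values are assumptions made for this example; they are not how StatsForecast estimates the model.

```python
from typing import List, Tuple


def holt_linear(y: List[float], alpha: float, beta: float, h: int) -> Tuple[List[float], List[float]]:
    """Additive Holt (double exponential smoothing) sketch.

    Returns (one-step-ahead fitted values, h-step-ahead forecasts).
    Initialisation is a simple heuristic chosen for this sketch.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]
    fitted = [y[0]]  # no forecast exists for the first point, so reuse it
    for obs in y[1:]:
        fitted.append(level + trend)  # forecast made before seeing obs
        prev_level = level
        # Level: blend the new observation with the previous projection.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    forecasts = [level + k * trend for k in range(1, h + 1)]
    return fitted, forecasts


series = [10.0, 12.0, 14.0, 16.0, 18.0]  # perfectly linear toy data
fitted, forecasts = holt_linear(series, alpha=0.8, beta=0.2, h=3)
print(forecasts)
```

On perfectly linear data the forecasts simply continue the line, which is a quick sanity check for the recursion.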
Holt (1957) extended simple exponential smoothing to allow the
forecasting of data with a trend. This method involves a forecast
equation and two smoothing equations (one for the level and one for
the trend):
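For reference, Holt's linear trend method is usually written as shown here. The notation is the common textbook one and is an assumption, since the page's own equations are not shown: $\ell_t$ is the level, $b_t$ the trend, $\alpha$ and $\beta^*$ are the smoothing parameters (both between 0 and 1), and $h$ is the forecast horizon.

```latex
\begin{align*}
\text{Forecast equation:} \quad & \hat{y}_{t+h|t} = \ell_t + h\,b_t \\
\text{Level equation:} \quad & \ell_t = \alpha\,y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
\text{Trend equation:} \quad & b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*)\,b_{t-1}
\end{align*}
```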
Assume that a series has the following:
Tip: StatsForecast will be needed. To install, see instructions.
Next, we import plotting libraries and configure the plotting style.
| | Time | Ads |
---|---|---|
0 | 2017-09-13T00:00:00 | 80115 |
1 | 2017-09-13T01:00:00 | 79885 |
2 | 2017-09-13T02:00:00 | 89325 |
3 | 2017-09-13T03:00:00 | 101930 |
4 | 2017-09-13T04:00:00 | 121630 |
The input to StatsForecast is a data frame with three columns:
- unique_id (string, int or category) represents an identifier for the series.
- ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- y (numeric) represents the measurement we wish to forecast.
| | ds | y | unique_id |
---|---|---|---|
0 | 2017-09-13T00:00:00 | 80115 | 1 |
1 | 2017-09-13T01:00:00 | 79885 | 1 |
2 | 2017-09-13T02:00:00 | 89325 | 1 |
3 | 2017-09-13T03:00:00 | 101930 | 1 |
4 | 2017-09-13T04:00:00 | 121630 | 1 |
The time variable (ds) is stored in an object format, so we need to
convert it to a datetime format.
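A minimal sketch of that conversion, assuming a small data frame built by hand from the first rows of the table above (the real tutorial data would be loaded from a file instead):

```python
import pandas as pd

# Three rows copied from the table above; ds starts out as plain strings.
df = pd.DataFrame(
    {
        "unique_id": ["1", "1", "1"],
        "ds": ["2017-09-13T00:00:00", "2017-09-13T01:00:00", "2017-09-13T02:00:00"],
        "y": [80115, 79885, 89325],
    }
)
print(df["ds"].dtype)  # a non-datetime dtype before conversion

# Parse the strings into pandas datetimes.
df["ds"] = pd.to_datetime(df["ds"])
print(df["ds"].dtype)
```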
We split the data into two sets:
1. Data to train our Holt Model.
2. Data to test our model.
For the test data we will use the last 30 hours to test and evaluate the
performance of our model.
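One way to make this split is by position, keeping the last 30 rows for testing. The sketch below uses placeholder `y` values and assumes 216 hourly observations starting at 2017-09-13 00:00:00, a count inferred from the tables on this page rather than stated in the text.

```python
import pandas as pd

n_obs = 216  # assumption: inferred from the tables on this page
df = pd.DataFrame(
    {
        "unique_id": "1",
        "ds": pd.Timestamp("2017-09-13 00:00:00")
        + pd.to_timedelta(list(range(n_obs)), unit="h"),
        "y": [float(i) for i in range(n_obs)],  # placeholder values
    }
)

horizon = 30
train = df.iloc[:-horizon]  # everything except the last 30 hours
test = df.iloc[-horizon:]   # the last 30 hours
print(train.shape, test.shape)
print(train["ds"].max(), test["ds"].min())
```

Splitting by position assumes the rows are already sorted by `ds`; filtering on a timestamp threshold is an equivalent alternative.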
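Each Holt model is instantiated with a season_length argument (the number of observations per seasonal cycle).
The StatsForecast object is then created with the following parameters:
- freq: a string indicating the frequency of the data. (See pandas’ available frequencies.)
- n_jobs: int, number of jobs used in the parallel processing; use -1 for all cores.
- fallback_model: a model to be used if a model fails.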
Once the model is fitted, we can inspect the results of our Holt Model. We can observe it with the
following instruction:
To look at the residuals, we use the .get() function to extract the element and then save
it in a pd.DataFrame().
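The extraction step might look like the sketch below. The `result` dictionary is a hypothetical stand-in for the fitted model's output, and the key name "residuals" is an assumption; the three values are copied from the residual table that follows.

```python
import pandas as pd

# Hypothetical stand-in for the fitted-model dictionary.
result = {"residuals": [-16.629196, -563.340440, 9106.661223]}

# .get() returns the stored element (or None if the key is absent).
residual = pd.DataFrame(result.get("residuals"), columns=["residual Model"])
print(residual)
```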
| | residual Model |
---|---|
0 | -16.629196 |
1 | -563.340440 |
2 | 9106.661223 |
… | … |
183 | -268.370897 |
184 | -1313.391081 |
185 | -1428.364244 |
To generate forecasts, we can use the StatsForecast.forecast method instead of .fit and .predict.
The main difference is that .forecast does not store the fitted
values and is highly scalable in distributed environments.
The forecast method takes two arguments: h (for horizon) and level.
- h (int): represents the forecast h steps into the future. In this case, 30 hours ahead.
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to be inside that interval 90% of the time.
| | unique_id | ds | Add | Multi |
---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 139848.234375 | 141089.625000 |
1 | 1 | 2017-09-20 19:00:00 | 140181.328125 | 142664.000000 |
2 | 1 | 2017-09-20 20:00:00 | 140514.406250 | 144238.359375 |
… | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 148841.671875 | 183597.453125 |
28 | 1 | 2017-09-21 22:00:00 | 149174.750000 | 185171.812500 |
29 | 1 | 2017-09-21 23:00:00 | 149507.843750 | 186746.187500 |
| | unique_id | ds | y | Add | Multi |
---|---|---|---|---|---|
0 | 1 | 2017-09-13 00:00:00 | 80115.0 | 80131.632812 | 79287.125000 |
1 | 1 | 2017-09-13 01:00:00 | 79885.0 | 80448.343750 | 81712.710938 |
2 | 1 | 2017-09-13 02:00:00 | 89325.0 | 80218.335938 | 81482.796875 |
3 | 1 | 2017-09-13 03:00:00 | 101930.0 | 89658.281250 | 90922.609375 |
4 | 1 | 2017-09-13 04:00:00 | 121630.0 | 102264.195312 | 103528.398438 |
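The fitted values above tie back to the residual table shown earlier: each residual is the observed value minus the fitted value of the "Add" model. The short check below uses the first three rows of both tables; small differences in the last digits are expected because the printed values are rounded.

```python
# (y, fitted "Add") pairs copied from the first three rows of the table above.
rows = [
    (80115.0, 80131.632812),
    (79885.0, 80448.343750),
    (89325.0, 80218.335938),
]
# Residuals as printed in the residual table earlier on this page.
reported = [-16.629196, -563.340440, 9106.661223]

residuals = [y - fitted for y, fitted in rows]
for computed, printed in zip(residuals, reported):
    print(round(computed, 3), round(printed, 3))
```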
| | unique_id | ds | Add | Add-lo-95 | Add-hi-95 | Multi | Multi-lo-95 | Multi-hi-95 |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 139848.234375 | 116559.250000 | 163137.218750 | 141089.625000 | 113501.140625 | 168678.125000 |
1 | 1 | 2017-09-20 19:00:00 | 140181.328125 | 107245.734375 | 173116.906250 | 142664.000000 | 103333.265625 | 181994.718750 |
2 | 1 | 2017-09-20 20:00:00 | 140514.406250 | 100175.375000 | 180853.453125 | 144238.359375 | 95679.804688 | 192796.921875 |
… | … | … | … | … | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 148841.671875 | 25453.445312 | 272229.875000 | 183597.453125 | 4082.392090 | 363112.531250 |
28 | 1 | 2017-09-21 22:00:00 | 149174.750000 | 23596.246094 | 274753.250000 | 185171.812500 | 1151.084961 | 369192.562500 |
29 | 1 | 2017-09-21 23:00:00 | 149507.843750 | 21776.173828 | 277239.531250 | 186746.187500 | -1776.010254 | 375268.375000 |
The .predict method takes the same two arguments: h (for horizon) and level.
- h (int): represents the forecast h steps into the future. In this case, 30 hours ahead.
- level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[95] means that the model expects the real value to be inside that interval 95% of the time.
| | unique_id | ds | Add | Multi |
---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 139848.234375 | 141089.625000 |
1 | 1 | 2017-09-20 19:00:00 | 140181.328125 | 142664.000000 |
2 | 1 | 2017-09-20 20:00:00 | 140514.406250 | 144238.359375 |
… | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 148841.671875 | 183597.453125 |
28 | 1 | 2017-09-21 22:00:00 | 149174.750000 | 185171.812500 |
29 | 1 | 2017-09-21 23:00:00 | 149507.843750 | 186746.187500 |
| | unique_id | ds | Add | Add-lo-95 | Add-lo-80 | Add-hi-80 | Add-hi-95 | Multi | Multi-lo-95 | Multi-lo-80 | Multi-hi-80 | Multi-hi-95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 139848.234375 | 116559.250000 | 124620.390625 | 155076.078125 | 163137.218750 | 141089.625000 | 113501.140625 | 123050.484375 | 159128.781250 | 168678.125000 |
1 | 1 | 2017-09-20 19:00:00 | 140181.328125 | 107245.734375 | 118645.898438 | 161716.750000 | 173116.906250 | 142664.000000 | 103333.265625 | 116947.015625 | 168380.984375 | 181994.718750 |
2 | 1 | 2017-09-20 20:00:00 | 140514.406250 | 100175.375000 | 114138.132812 | 166890.687500 | 180853.453125 | 144238.359375 | 95679.804688 | 112487.625000 | 175989.093750 | 192796.921875 |
… | … | … | … | … | … | … | … | … | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 148841.671875 | 25453.445312 | 68162.445312 | 229520.890625 | 272229.875000 | 183597.453125 | 4082.392090 | 66218.867188 | 300976.031250 | 363112.531250 |
28 | 1 | 2017-09-21 22:00:00 | 149174.750000 | 23596.246094 | 67063.382812 | 231286.125000 | 274753.250000 | 185171.812500 | 1151.084961 | 64847.128906 | 305496.500000 | 369192.562500 |
29 | 1 | 2017-09-21 23:00:00 | 149507.843750 | 21776.173828 | 65988.593750 | 233027.093750 | 277239.531250 | 186746.187500 | -1776.010254 | 63478.144531 | 310014.218750 | 375268.375000 |
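To see how the 80% and 95% columns relate, the sketch below takes the first row of the "Add" model and checks whether the two intervals are consistent with symmetric normal-quantile intervals around the point forecast. This is an observation about the printed numbers, not a statement about how the library computes its intervals.

```python
from statistics import NormalDist

# First row of the table above, "Add" model.
point, lo95, hi95 = 139848.234375, 116559.250000, 163137.218750
lo80_reported, hi80_reported = 124620.390625, 155076.078125

z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% quantile
z80 = NormalDist().inv_cdf(0.90)   # two-sided 80% quantile

# Implied forecast standard deviation from the 95% interval.
sigma = (hi95 - point) / z95

# Rebuild the 80% interval from that standard deviation.
lo80, hi80 = point - z80 * sigma, point + z80 * sigma
print(round(lo80, 2), round(hi80, 2))
print(lo80_reported, hi80_reported)
```

The rebuilt bounds agree with the reported ones to within rounding, which also shows why a higher level always gives a wider interval.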
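In this case, we evaluate the model over several windows in the past (n_windows),
moving the forecast origin forward by step_size observations between consecutive
windows. The output below has 90 rows with cutoffs running from 2017-09-18 05:00:00
to 2017-09-20 17:00:00, which is consistent with three windows of 30 hours each.
Depending on your computer, this step should take around 1 min.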
The cross_validation method from the StatsForecast class takes the
following arguments.
- df: training data frame.
- h (int): represents h steps into the future that are being forecasted. In this case, 30 hours ahead.
- step_size (int): step size between each window. In other words: how often do you want to run the forecasting processes.
- n_windows (int): number of windows used for cross validation. In other words: what number of forecasting processes in the past do you want to evaluate.
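The interplay of h, step_size and n_windows can be sketched without the library by computing where each window's training data ends. The helper `cv_cutoff_positions` is hypothetical, and the values n_obs=216, step_size=30 and n_windows=3 are assumptions inferred from the output table on this page, not taken from the original code.

```python
from datetime import datetime, timedelta
from typing import List


def cv_cutoff_positions(n_obs: int, h: int, step_size: int, n_windows: int) -> List[int]:
    """Zero-based position of the last training observation in each window."""
    last_cutoff = n_obs - h - 1  # the final window forecasts the last h points
    positions = [last_cutoff - k * step_size for k in range(n_windows)]
    if positions[-1] < 0:
        raise ValueError("not enough observations for the requested windows")
    return sorted(positions)


start = datetime(2017, 9, 13, 0, 0)  # first timestamp of the hourly series
positions = cv_cutoff_positions(n_obs=216, h=30, step_size=30, n_windows=3)
cutoffs = [start + timedelta(hours=p) for p in positions]
for c in cutoffs:
    print(c)
```

Each window then forecasts the h observations that follow its cutoff, so three windows with h=30 yield 90 forecast rows.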
The cross-validation output is a data frame with the following columns:
- unique_id: series identifier.
- ds: datestamp or temporal index.
- cutoff: the last datestamp or temporal index for the n_windows.
- y: true value.
- model: columns with the model’s name and fitted value.

| | unique_id | ds | cutoff | y | Add | Multi |
---|---|---|---|---|---|---|
0 | 1 | 2017-09-18 06:00:00 | 2017-09-18 05:00:00 | 99440.0 | 111573.328125 | 112874.039062 |
1 | 1 | 2017-09-18 07:00:00 | 2017-09-18 05:00:00 | 97655.0 | 111820.390625 | 114421.679688 |
2 | 1 | 2017-09-18 08:00:00 | 2017-09-18 05:00:00 | 97655.0 | 112067.453125 | 115969.320312 |
… | … | … | … | … | … | … |
87 | 1 | 2017-09-21 21:00:00 | 2017-09-20 17:00:00 | 103080.0 | 148841.671875 | 183597.453125 |
88 | 1 | 2017-09-21 22:00:00 | 2017-09-20 17:00:00 | 95155.0 | 149174.750000 | 185171.812500 |
89 | 1 | 2017-09-21 23:00:00 | 2017-09-20 17:00:00 | 80285.0 | 149507.843750 | 186746.187500 |
| | unique_id | metric | Add | Multi |
---|---|---|---|---|
0 | 1 | mae | 30905.751042 | 48210.098958 |
1 | 1 | mape | 0.336201 | 0.491980 |
2 | 1 | mase | 3.818464 | 5.956449 |
3 | 1 | rmse | 38929.522482 | 54653.132768 |
4 | 1 | smape | 0.129755 | 0.182024 |
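For readers who want to see what some of these metrics measure, the sketch below defines MAE, RMSE and MAPE in plain Python and applies them to the last three rows of the cross-validation table for the "Add" model. The function definitions are the usual textbook ones and are an assumption about, not a copy of, the evaluation code that produced the table; MASE and sMAPE are left out because their definitions vary between libraries.

```python
from math import sqrt
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction (requires non-zero y)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(y, y_hat)) / len(y)


# Rows 87-89 of the cross-validation table (true value and "Add" forecast).
y_true = [103080.0, 95155.0, 80285.0]
add_forecast = [148841.671875, 149174.750000, 149507.843750]

print(mae(y_true, add_forecast))
print(rmse(y_true, add_forecast))
print(mape(y_true, add_forecast))
```

These three rows come from the end of the last window, so their errors are larger than the averages reported in the table, which cover the full evaluation set.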