Step-by-step guide on using the SeasonalExponentialSmoothing model with StatsForecast.
In the Simple Exponential Smoothing (SES Seasonally Adjusted) model, different methods can be used to determine the value of the seasonal parameter, depending on the nature of the data and the objective of the analysis.

The choice of the seasonal parameter affects the accuracy of the seasonally adjusted SES model predictions. Therefore, it is recommended to test different values of the seasonal parameter and evaluate the performance of the model using appropriate evaluation measures before selecting the final value.
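One way to act on this recommendation is a simple holdout comparison: fit each candidate season length on the early part of the series and score its forecast on the held-out tail. The sketch below is a hand-rolled, pure-Python approximation of seasonal exponential smoothing (each seasonal position is smoothed separately), not StatsForecast's implementation. The helper names, the synthetic sine series, the candidate list and `alpha=0.8` are all illustrative assumptions.

```python
import math

def ses_forecast(values, alpha):
    """One-step-ahead simple exponential smoothing forecast."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def seasonal_es_forecast(y, season_length, alpha, h):
    """Smooth each seasonal position on its own, then repeat the seasonal profile."""
    n = len(y)
    if n < season_length:
        raise ValueError("need at least one full season of data")
    offset = n % season_length  # drop the oldest partial season
    profile = [
        ses_forecast(y[offset + i::season_length], alpha)
        for i in range(season_length)
    ]
    return [profile[k % season_length] for k in range(h)]

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pick_season_length(y, candidates, alpha, h):
    """Return the candidate with the lowest holdout MAE, plus all scores."""
    train, test = y[:-h], y[-h:]
    scores = {
        s: mae(test, seasonal_es_forecast(train, s, alpha, h))
        for s in candidates
    }
    return min(scores, key=scores.get), scores

# Made-up series with a true period of 24 observations.
series = [100 + 10 * math.sin(2 * math.pi * t / 24) for t in range(24 * 8)]
best, scores = pick_season_length(series, candidates=[6, 12, 24], alpha=0.8, h=24)
print(best, scores)
```

On real data a single holdout can be noisy, so averaging the score over several windows is usually a safer basis for the final choice.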
Tip: StatsForecast will be needed. To install, see instructions.

Next, we import plotting libraries and configure the plotting style.
| | Time | Ads |
---|---|---|
0 | 2017-09-13T00:00:00 | 80115 |
1 | 2017-09-13T01:00:00 | 79885 |
2 | 2017-09-13T02:00:00 | 89325 |
3 | 2017-09-13T03:00:00 | 101930 |
4 | 2017-09-13T04:00:00 | 121630 |
- `unique_id` (string, int or category) represents an identifier for the series.
- `ds` (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- `y` (numeric) represents the measurement we wish to forecast.
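As a minimal sketch of getting from the raw `Time`/`Ads` columns to this layout, the snippet below rebuilds the first rows of the raw table by hand and renames them. The constant identifier `"1"` is an assumption that fits a single-series dataset.

```python
import pandas as pd

# First rows of the raw table shown earlier in this guide, typed in by hand.
raw = pd.DataFrame({
    "Time": ["2017-09-13T00:00:00", "2017-09-13T01:00:00", "2017-09-13T02:00:00"],
    "Ads": [80115, 79885, 89325],
})

# Rename to the expected column names and tag every row with one series id.
df = raw.rename(columns={"Time": "ds", "Ads": "y"})
df["unique_id"] = "1"
print(df)
```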
| | ds | y | unique_id |
---|---|---|---|
0 | 2017-09-13T00:00:00 | 80115 | 1 |
1 | 2017-09-13T01:00:00 | 79885 | 1 |
2 | 2017-09-13T02:00:00 | 89325 | 1 |
3 | 2017-09-13T03:00:00 | 101930 | 1 |
4 | 2017-09-13T04:00:00 | 121630 | 1 |
The time variable (`ds`) is in an object format, so we need to convert it to a date format.
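A short sketch of that conversion with `pd.to_datetime`, using a two-row frame typed in by hand from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "ds": ["2017-09-13T00:00:00", "2017-09-13T01:00:00"],
    "y": [80115, 79885],
    "unique_id": ["1", "1"],
})

print(df["ds"].dtype)                 # timestamps are still stored as text here
df["ds"] = pd.to_datetime(df["ds"])   # parse the ISO-formatted strings
print(df["ds"].dtype)                 # now a datetime dtype
```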
We use the Seasonal Exponential Smoothing Model and set its `season_length`.
- `freq`: a string indicating the frequency of the data. (See pandas’ available frequencies.)
- `n_jobs`: int, number of jobs used in the parallel processing; use -1 for all cores.
- `fallback_model`: a model to be used if a model fails.
Next, we look at the results of the Seasonal Exponential Smoothing Model.

We use the `.get()` function to extract the element and then save it in a `pd.DataFrame()`.
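The extraction step can be sketched as follows. The `result` dictionary is a hypothetical stand-in for the fitted model's output: the `"fitted"` key and the numbers in it are illustrative assumptions, not values produced by the library.

```python
import pandas as pd

# Hypothetical result dictionary; leading NaNs mimic observations without a fitted value.
result = {"fitted": [float("nan"), float("nan"), 101.5, 99.0]}
timestamps = pd.to_datetime([
    "2017-09-13 00:00:00",
    "2017-09-13 01:00:00",
    "2017-09-13 02:00:00",
    "2017-09-13 03:00:00",
])

# Pull the list out of the dictionary and pair it with its timestamps.
fitted = pd.DataFrame(result.get("fitted"), columns=["fitted"])
fitted["ds"] = timestamps
print(fitted)
```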
| | fitted | ds |
---|---|---|
0 | NaN | 2017-09-13 00:00:00 |
1 | NaN | 2017-09-13 01:00:00 |
2 | NaN | 2017-09-13 02:00:00 |
… | … | … |
183 | 145443.734375 | 2017-09-20 15:00:00 |
184 | 150638.000000 | 2017-09-20 16:00:00 |
185 | 155233.093750 | 2017-09-20 17:00:00 |
You can also use the `StatsForecast.forecast` method instead of `.fit` and `.predict`.

The main difference is that `.forecast` does not store the fitted values and is highly scalable in distributed environments.
The forecast method takes two arguments: `h` (horizon), the number of steps to forecast, and `level`.

- `h` (int): represents the forecast h steps into the future. In this case, 30 hours ahead.

| | unique_id | ds | SeasonalES |
---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 161567.593750 |
1 | 1 | 2017-09-20 19:00:00 | 163186.562500 |
2 | 1 | 2017-09-20 20:00:00 | 134410.937500 |
… | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 106145.601562 |
28 | 1 | 2017-09-21 22:00:00 | 93383.164062 |
29 | 1 | 2017-09-21 23:00:00 | 79489.718750 |
| | unique_id | ds | y | SeasonalES |
---|---|---|---|---|
0 | 1 | 2017-09-13 00:00:00 | 80115.0 | NaN |
1 | 1 | 2017-09-13 01:00:00 | 79885.0 | NaN |
2 | 1 | 2017-09-13 02:00:00 | 89325.0 | NaN |
3 | 1 | 2017-09-13 03:00:00 | 101930.0 | NaN |
4 | 1 | 2017-09-13 04:00:00 | 121630.0 | NaN |
The `.predict` method takes two arguments: `h` (for horizon) and `level`.

- `h` (int): represents the forecast h steps into the future. In this case, 30 hours ahead.

| | unique_id | ds | SeasonalES |
---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 161567.593750 |
1 | 1 | 2017-09-20 19:00:00 | 163186.562500 |
2 | 1 | 2017-09-20 20:00:00 | 134410.937500 |
… | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 106145.601562 |
28 | 1 | 2017-09-21 22:00:00 | 93383.164062 |
29 | 1 | 2017-09-21 23:00:00 | 79489.718750 |
In this case, we evaluate the model over 5 windows (`n_windows=5`), advancing the forecast origin 12 steps between windows (`step_size=12`). Depending on your computer, this step should take around 1 min.
The cross_validation method from the StatsForecast class takes the
following arguments.
- `df`: training data frame.
- `h` (int): represents h steps into the future that are being forecasted. In this case, 30 hours ahead.
- `step_size` (int): step size between each window. In other words: how often do you want to run the forecasting processes.
- `n_windows` (int): number of windows used for cross validation. In other words: what number of forecasting processes in the past do you want to evaluate.
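To make the interplay of `h`, `step_size` and `n_windows` concrete, here is a pure-Python sketch of a rolling-origin layout in which the last window ends at the final observation. It is assumed to mirror how the windows are placed, not taken from the library's code. The inputs (216 hourly observations starting at 2017-09-13 00:00, and `n_windows=3`) are assumptions chosen so that the result can be compared with the cutoffs in the output table; `cv_cutoffs` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def cv_cutoffs(n_obs, h, step_size, n_windows):
    """Index of the last training observation for each window, oldest first."""
    last_cutoff = n_obs - h - 1
    first_cutoff = last_cutoff - (n_windows - 1) * step_size
    if first_cutoff < 0:
        raise ValueError("not enough observations for the requested windows")
    return [first_cutoff + w * step_size for w in range(n_windows)]

start = datetime(2017, 9, 13, 0)  # first timestamp in the data table
idx = cv_cutoffs(n_obs=216, h=30, step_size=12, n_windows=3)

for i in idx:
    cutoff = start + timedelta(hours=i)
    first_forecast = cutoff + timedelta(hours=1)
    last_forecast = cutoff + timedelta(hours=30)
    print(cutoff, "->", first_forecast, "to", last_forecast)
```

Each window contributes `h` rows to the result, so the number of windows times `h` gives the expected row count per series.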
- `unique_id`: series identifier.
- `ds`: datestamp or temporal index.
- `cutoff`: the last datestamp or temporal index for the n_windows.
- `y`: true value.
- `"model"`: columns with the model’s name and fitted value.

| | unique_id | ds | cutoff | y | SeasonalES |
---|---|---|---|---|---|
0 | 1 | 2017-09-19 18:00:00 | 2017-09-19 17:00:00 | 161385.0 | 162297.953125 |
1 | 1 | 2017-09-19 19:00:00 | 2017-09-19 17:00:00 | 165010.0 | 155892.812500 |
2 | 1 | 2017-09-19 20:00:00 | 2017-09-19 17:00:00 | 134090.0 | 135694.703125 |
… | … | … | … | … | … |
87 | 1 | 2017-09-21 21:00:00 | 2017-09-20 17:00:00 | 103080.0 | 106145.601562 |
88 | 1 | 2017-09-21 22:00:00 | 2017-09-20 17:00:00 | 95155.0 | 93383.164062 |
89 | 1 | 2017-09-21 23:00:00 | 2017-09-20 17:00:00 | 80285.0 | 79489.718750 |
| | unique_id | metric | SeasonalES |
---|---|---|---|
0 | 1 | mae | 5728.207812 |
1 | 1 | mape | 0.049386 |
2 | 1 | mase | 0.707731 |
3 | 1 | rmse | 7290.840738 |
4 | 1 | smape | 0.024009 |
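For readers who want to see what these numbers measure, the sketch below writes out four of the metrics in plain Python and applies them to the first three cross-validation rows shown earlier. The function names are hypothetical helpers, and the `smape` form (bounded by 1, without a factor of 2 or 100) is an assumed convention. `mase` is left out because it also needs the training series and a seasonal period.

```python
import math

def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def mae(y, yhat):
    """Mean absolute error, in the units of y."""
    return _mean(abs(a - p) for a, p in zip(y, yhat))

def rmse(y, yhat):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(_mean((a - p) ** 2 for a, p in zip(y, yhat)))

def mape(y, yhat):
    """Mean absolute percentage error as a fraction; undefined when y contains 0."""
    return _mean(abs(a - p) / abs(a) for a, p in zip(y, yhat))

def smape(y, yhat):
    """Symmetric MAPE under the assumed convention bounded by 1."""
    return _mean(abs(a - p) / (abs(a) + abs(p)) for a, p in zip(y, yhat))

# First three rows of the cross-validation table (y and SeasonalES columns).
y = [161385.0, 165010.0, 134090.0]
yhat = [162297.953125, 155892.8125, 135694.703125]

for name, fn in [("mae", mae), ("rmse", rmse), ("mape", mape), ("smape", smape)]:
    print(name, fn(y, yhat))
```

Because only three rows are used here, the printed values will differ from the table, which aggregates every cross-validation row.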