Step-by-step guide on using the SimpleExponentialSmoothing Model with Statsforecast.
Tip: Statsforecast will be needed. To install, see instructions.

Next, we import plotting libraries and configure the plotting style.
| | Time | Ads |
---|---|---|
0 | 2017-09-13T00:00:00 | 80115 |
1 | 2017-09-13T01:00:00 | 79885 |
2 | 2017-09-13T02:00:00 | 89325 |
3 | 2017-09-13T03:00:00 | 101930 |
4 | 2017-09-13T04:00:00 | 121630 |
* unique_id (string, int or category) represents an identifier for the series.
* ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
* y (numeric) represents the measurement we wish to forecast.
| | ds | y | unique_id |
---|---|---|---|
0 | 2017-09-13T00:00:00 | 80115 | 1 |
1 | 2017-09-13T01:00:00 | 79885 | 1 |
2 | 2017-09-13T02:00:00 | 89325 | 1 |
… | … | … | … |
213 | 2017-09-21T21:00:00 | 103080 | 1 |
214 | 2017-09-21T22:00:00 | 95155 | 1 |
215 | 2017-09-21T23:00:00 | 80285 | 1 |
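The renaming that produces this layout can be sketched with pandas. This is a minimal sketch, assuming the raw frame has the Time and Ads columns shown earlier; only the first three rows are reproduced here, and the constant id "1" mirrors the table above.

```python
import pandas as pd

# First rows of the raw data shown earlier (Time, Ads).
raw = pd.DataFrame(
    {
        "Time": [
            "2017-09-13T00:00:00",
            "2017-09-13T01:00:00",
            "2017-09-13T02:00:00",
        ],
        "Ads": [80115, 79885, 89325],
    }
)

# Rename to the column names StatsForecast expects and add a series id.
df = raw.rename(columns={"Time": "ds", "Ads": "y"})
df["ds"] = pd.to_datetime(df["ds"])  # parse strings into timestamps
df["unique_id"] = "1"                # single series, so one constant id

print(df.dtypes)
```

Parsing ds into timestamps up front avoids relying on string dates being interpreted later.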
Simple Exponential Smoothing (SES).

* freq: a string indicating the frequency of the data (see pandas' available frequencies).
* n_jobs: int, number of jobs used in the parallel processing; use -1 for all cores.
* fallback_model: a model to be used if a model fails.
Simple Exponential Smoothing model (SES). We can observe it with the following instruction:

.get() function to extract the element and then we are going to save it in a pd.DataFrame().
| | fitted01 | fitted05 | fitted08 | ds |
---|---|---|---|---|
0 | NaN | NaN | NaN | 2017-09-13 00:00:00 |
1 | 80115.000000 | 80115.00 | 80115.000000 | 2017-09-13 01:00:00 |
2 | 80092.000000 | 80000.00 | 79931.000000 | 2017-09-13 02:00:00 |
… | … | … | … | … |
183 | 120765.039062 | 139195.00 | 141302.828125 | 2017-09-20 15:00:00 |
184 | 122847.531250 | 140392.50 | 141532.562500 | 2017-09-20 16:00:00 |
185 | 124623.781250 | 140501.25 | 140794.515625 | 2017-09-20 17:00:00 |
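The first fitted values in this table can be checked by hand. The following is a minimal sketch of the SES recursion in plain Python, not the library's implementation. It assumes the level is initialised with the first observation and that the three columns correspond to alpha = 0.1, 0.5 and 0.8 (inferred from the column names). The function name ses_fitted is hypothetical.

```python
def ses_fitted(y, alpha):
    """One-step-ahead SES fitted values.

    The first fitted value is undefined (None); the level is
    initialised with the first observation (an assumption).
    """
    fitted = [None]
    level = y[0]
    for obs in y[1:]:
        fitted.append(level)                       # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level  # update the level with obs
    return fitted


# First five observations of the Ads series shown earlier.
y = [80115, 79885, 89325, 101930, 121630]

for alpha in (0.1, 0.5, 0.8):
    print(alpha, ses_fitted(y, alpha))
```

For alpha = 0.1 the second and third entries come out as 80115 and 80092 (up to floating-point rounding), which agrees with rows 1 and 2 of the fitted01 column.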
StatsForecast.forecast method instead of .fit and .predict.

The main difference is that .forecast does not store the fitted values and is highly scalable in distributed environments.

The forecast method takes two arguments: h (horizon), the number of steps to forecast, and level.

* h (int): represents the forecast h steps into the future. In this case, 30 hours ahead.

| | unique_id | ds | SES01 | SES05 | SES08 |
---|---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
1 | 1 | 2017-09-20 19:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
2 | 1 | 2017-09-20 20:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
3 | 1 | 2017-09-20 21:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
4 | 1 | 2017-09-20 22:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
| | unique_id | ds | y | SES01 | SES05 | SES08 |
---|---|---|---|---|---|---|
0 | 1 | 2017-09-13 00:00:00 | 80115.0 | NaN | NaN | NaN |
1 | 1 | 2017-09-13 01:00:00 | 79885.0 | 80115.000000 | 80115.00 | 80115.000000 |
2 | 1 | 2017-09-13 02:00:00 | 89325.0 | 80092.000000 | 80000.00 | 79931.000000 |
3 | 1 | 2017-09-13 03:00:00 | 101930.0 | 81015.296875 | 84662.50 | 87446.203125 |
4 | 1 | 2017-09-13 04:00:00 | 121630.0 | 83106.773438 | 93296.25 | 99033.242188 |
h (for horizon).

* h (int): represents the forecast steps into the future. In this case, 30 hours ahead.

The forecast object here is a new data frame that includes a column with the name of the model and the y hat values, as well as columns for the uncertainty intervals.

This step should take less than 1 second.
| | unique_id | ds | SES01 | SES05 | SES08 |
---|---|---|---|---|---|
0 | 1 | 2017-09-20 18:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
1 | 1 | 2017-09-20 19:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
2 | 1 | 2017-09-20 20:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
… | … | … | … | … | … |
27 | 1 | 2017-09-21 21:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
28 | 1 | 2017-09-21 22:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
29 | 1 | 2017-09-21 23:00:00 | 126112.898438 | 140008.125 | 139770.90625 |
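Every row in this table carries the same value per model because an SES forecast is flat: the last smoothed level is repeated for each step of the horizon. The sketch below illustrates this in plain Python on the first five Ads observations only, so its output is not comparable to the table. The function name ses_forecast is hypothetical, and initialising the level with the first observation is an assumption.

```python
def ses_forecast(y, alpha, h):
    """Flat h-step SES forecast: the final level repeated h times."""
    level = y[0]  # assumption: level starts at the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


# First five observations of the Ads series shown earlier.
y = [80115, 79885, 89325, 101930, 121630]

forecast = ses_forecast(y, alpha=0.5, h=30)
print(forecast[0], forecast[-1])
```

Because the forecast carries no trend or seasonal component, its accuracy on strongly seasonal hourly data is limited regardless of alpha.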
(n_windows=), forecasting every 30 hours (step_size=30). Depending on your computer, this step should take around 1 min.
The cross_validation method from the StatsForecast class takes the following arguments.

* df: training data frame.
* h (int): represents h steps into the future that are being forecasted. In this case, 30 hours ahead.
* step_size (int): step size between each window. In other words: how often do you want to run the forecasting processes.
* n_windows (int): number of windows used for cross validation. In other words: what number of forecasting processes in the past do you want to evaluate.
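To make the interaction of h, step_size and n_windows concrete, the sketch below computes where each evaluation window would be cut off. It is a plain-Python illustration under the assumption that the last window ends at the last observation, not the library's code. The helper cv_cutoffs is hypothetical, and n_windows=3 is an assumption because the value is not stated in the text.

```python
from datetime import datetime, timedelta


def cv_cutoffs(last_ds, h, step_size, n_windows, freq=timedelta(hours=1)):
    """Cutoff timestamps for rolling-origin evaluation, oldest first.

    The last window ends at `last_ds`, so its cutoff lies h steps earlier;
    each earlier window is shifted back by a further `step_size` steps.
    """
    return [
        last_ds - freq * (h + step_size * i)
        for i in reversed(range(n_windows))
    ]


cutoffs = cv_cutoffs(
    last_ds=datetime(2017, 9, 21, 23),  # last timestamp in the data
    h=30,
    step_size=30,
    n_windows=3,  # assumption: not stated in the text
)
for cutoff in cutoffs:
    print(cutoff)
```

Under these assumptions the first and last cutoffs are 2017-09-18 05:00 and 2017-09-20 17:00. With step_size equal to h, the windows do not overlap.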
The cross-validation output includes the following columns:

* unique_id: series identifier.
* ds: datestamp or temporal index.
* cutoff: the last datestamp or temporal index for the n_windows.
* y: true value.
* model: columns with the model's name and fitted value.

| | unique_id | ds | cutoff | y | SES01 | SES05 | SES08 |
---|---|---|---|---|---|---|---|
0 | 1 | 2017-09-18 06:00:00 | 2017-09-18 05:00:00 | 99440.0 | 118499.953125 | 109816.250 | 112747.695312 |
1 | 1 | 2017-09-18 07:00:00 | 2017-09-18 05:00:00 | 97655.0 | 118499.953125 | 109816.250 | 112747.695312 |
2 | 1 | 2017-09-18 08:00:00 | 2017-09-18 05:00:00 | 97655.0 | 118499.953125 | 109816.250 | 112747.695312 |
… | … | … | … | … | … | … | … |
87 | 1 | 2017-09-21 21:00:00 | 2017-09-20 17:00:00 | 103080.0 | 126112.898438 | 140008.125 | 139770.906250 |
88 | 1 | 2017-09-21 22:00:00 | 2017-09-20 17:00:00 | 95155.0 | 126112.898438 | 140008.125 | 139770.906250 |
89 | 1 | 2017-09-21 23:00:00 | 2017-09-20 17:00:00 | 80285.0 | 126112.898438 | 140008.125 | 139770.906250 |
| | unique_id | metric | SES01 | SES05 | SES08 |
---|---|---|---|---|---|
0 | 1 | mae | 25173.939583 | 29390.875000 | 29311.802083 |
1 | 1 | mape | 0.255088 | 0.316440 | 0.315339 |
2 | 1 | mase | 3.110288 | 3.631298 | 3.621528 |
3 | 1 | rmse | 28923.395381 | 36184.340869 | 36027.710540 |
4 | 1 | smape | 0.109972 | 0.124803 | 0.124542 |
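For reference, three of these metrics can be written out in a few lines. The sketch below uses the textbook definitions of MAE, RMSE and MAPE in plain Python; the library's implementations may differ in scaling or edge-case handling, so this is not meant to reproduce the table. It is applied only to the first three cross-validation rows shown earlier (y and SES05), whereas the table aggregates over all rows.

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error, as a fraction (not multiplied by 100)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(y, y_hat)) / len(y)


# First three cross-validation rows shown earlier (y and SES05).
y = [99440.0, 97655.0, 97655.0]
y_hat = [109816.25] * 3

print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat))
```

RMSE penalises large errors more heavily than MAE, which is why it is the larger of the two for every model in the table.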