Step-by-step guide on using the `SeasonalExponentialSmoothingOptimized Model` with `Statsforecast`.

During this walkthrough, we will become familiar with the main `StatsForecast` class and some relevant methods such as `StatsForecast.plot`, `StatsForecast.forecast` and `StatsForecast.cross_validation`, among others.
The text in this article is largely taken from:
Tip: Statsforecast will be needed. To install, see instructions.

Next, we import plotting libraries and configure the plotting style.
| | Time | Ads |
|---|---|---|
| 0 | 2017-09-13T00:00:00 | 80115 |
| 1 | 2017-09-13T01:00:00 | 79885 |
| 2 | 2017-09-13T02:00:00 | 89325 |
| 3 | 2017-09-13T03:00:00 | 101930 |
| 4 | 2017-09-13T04:00:00 | 121630 |
- `unique_id` (string, int or category) represents an identifier for the series.
- `ds` (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- `y` (numeric) represents the measurement we wish to forecast.
| | ds | y | unique_id |
|---|---|---|---|
| 0 | 2017-09-13T00:00:00 | 80115 | 1 |
| 1 | 2017-09-13T01:00:00 | 79885 | 1 |
| 2 | 2017-09-13T02:00:00 | 89325 | 1 |
| 3 | 2017-09-13T03:00:00 | 101930 | 1 |
| 4 | 2017-09-13T04:00:00 | 121630 | 1 |
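
The step from the raw `Time`/`Ads` columns to the long format can be sketched with pandas as follows. The small inline frame reuses the first preview rows from this article; loading the full dataset is not shown, and the variable names `raw` and `df` are assumptions for illustration.

```python
import pandas as pd

# First rows of the raw data, copied from the preview in this article.
raw = pd.DataFrame({
    "Time": ["2017-09-13T00:00:00", "2017-09-13T01:00:00", "2017-09-13T02:00:00"],
    "Ads": [80115, 79885, 89325],
})

# Rename to the column names described above and add a series identifier.
df = raw.rename(columns={"Time": "ds", "Ads": "y"})
df["unique_id"] = 1  # a single series, so one constant id is enough

print(df.head())
```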
The time variable (`ds`) is in an object format, so we need to convert it to a date format.
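
A minimal sketch of that conversion, assuming the timestamps are stored as ISO-formatted strings like the ones in the preview rows (the two-row frame is only a stand-in for the real data):

```python
import pandas as pd

# Stand-in frame with the timestamp column stored as strings (object dtype).
df = pd.DataFrame({
    "ds": ["2017-09-13T00:00:00", "2017-09-13T01:00:00"],
    "y": [80115, 79885],
    "unique_id": [1, 1],
})

# Parse the strings into proper datetime64 values.
df["ds"] = pd.to_datetime(df["ds"])

print(df.dtypes)
```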
The Seasonal Exponential Smoothing Optimized Model takes a `season_length` argument.
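
To give an intuition for what `season_length` controls, here is a rough NumPy sketch of seasonal exponential smoothing with an optimized smoothing parameter. It is not the library's implementation: it assumes that each seasonal position is smoothed separately with simple exponential smoothing, and it swaps in a plain grid search over `alpha` for whatever optimizer the library uses. The function names and the grid are hypothetical.

```python
import numpy as np

ALPHAS = np.linspace(0.01, 0.99, 99)  # candidate smoothing parameters (assumed grid)

def ses(y, alpha):
    """Simple exponential smoothing: return (next forecast, sum of squared one-step errors)."""
    level = y[0]
    sse = 0.0
    for obs in y[1:]:
        sse += (obs - level) ** 2
        level = alpha * obs + (1 - alpha) * level
    return level, sse

def seasonal_ses_optimized(y, season_length, h):
    """Smooth each seasonal position on its own, choosing alpha by minimum SSE."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    season_forecast = np.empty(season_length)
    for pos in range(season_length):
        sub = y[pos::season_length]  # observations at this seasonal position
        level, _ = min((ses(sub, a) for a in ALPHAS), key=lambda r: r[1])
        season_forecast[pos] = level
    # Forecast step k lands on seasonal position (n + k) % season_length.
    return np.array([season_forecast[(n + k) % season_length] for k in range(h)])

# A perfectly periodic toy series with season length 4.
y = np.tile([10.0, 20.0, 30.0, 40.0], 5)
forecast = seasonal_ses_optimized(y, season_length=4, h=6)
print(forecast)
```

On a perfectly periodic series the sketch simply repeats the seasonal pattern, which is the behaviour one would expect from a purely seasonal smoother.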
- `freq`: a string indicating the frequency of the data. (See pandas' available frequencies.)
- `n_jobs`: int, number of jobs used in the parallel processing; use -1 for all cores.
- `fallback_model`: a model to be used if a model fails.
Let us look at the results of our Seasonal Exponential Smoothing Optimized Model. To extract an element from the result, we use the `.get()` function and then save it in a `pd.DataFrame()`.
| | fitted | ds |
|---|---|---|
| 0 | NaN | 2017-09-13 00:00:00 |
| 1 | NaN | 2017-09-13 01:00:00 |
| 2 | NaN | 2017-09-13 02:00:00 |
| … | … | … |
| 183 | 148833.171875 | 2017-09-20 15:00:00 |
| 184 | 149860.031250 | 2017-09-20 16:00:00 |
| 185 | 150673.375000 | 2017-09-20 17:00:00 |
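
The `.get()` pattern can be sketched on a plain dictionary. The `result` dictionary below is hypothetical: it assumes the fitted values live under a `"fitted"` key, and it reuses the last three values from the table above rather than output from a fitted model.

```python
import pandas as pd

# Hypothetical result dictionary; the "fitted" key is an assumption.
result = {"fitted": [148833.171875, 149860.03125, 150673.375]}

# Extract the element with .get() and store it in a DataFrame.
fitted = pd.DataFrame(result.get("fitted"), columns=["fitted"])

# Attach the matching timestamps (last three hours shown in the table).
fitted["ds"] = pd.date_range("2017-09-20 15:00:00", periods=3, freq="h")

print(fitted)
```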
We can use the `StatsForecast.forecast` method instead of `.fit` and `.predict`.

The main difference is that `.forecast` does not store the fitted values and is highly scalable in distributed environments.

The forecast method takes two arguments: `h` (the horizon, that is, the next `h` steps to forecast) and `level`.
- `h` (int): represents the forecast `h` steps into the future. In this case, 30 hours ahead.

| | unique_id | ds | SeasESOpt |
|---|---|---|---|
| 0 | 1 | 2017-09-20 18:00:00 | 161532.046875 |
| 1 | 1 | 2017-09-20 19:00:00 | 161051.687500 |
| 2 | 1 | 2017-09-20 20:00:00 | 135531.640625 |
| … | … | … | … |
| 27 | 1 | 2017-09-21 21:00:00 | 105600.390625 |
| 28 | 1 | 2017-09-21 22:00:00 | 96717.390625 |
| 29 | 1 | 2017-09-21 23:00:00 | 82608.343750 |
| | unique_id | ds | y | SeasESOpt |
|---|---|---|---|---|
| 0 | 1 | 2017-09-13 00:00:00 | 80115.0 | NaN |
| 1 | 1 | 2017-09-13 01:00:00 | 79885.0 | NaN |
| 2 | 1 | 2017-09-13 02:00:00 | 89325.0 | NaN |
| 3 | 1 | 2017-09-13 03:00:00 | 101930.0 | NaN |
| 4 | 1 | 2017-09-13 04:00:00 | 121630.0 | NaN |
The `predict` method takes two arguments: `h` (for horizon) and `level`.
- `h` (int): represents the forecast `h` steps into the future. In this case, 30 hours ahead.

| | unique_id | ds | SeasESOpt |
|---|---|---|---|
| 0 | 1 | 2017-09-20 18:00:00 | 161532.046875 |
| 1 | 1 | 2017-09-20 19:00:00 | 161051.687500 |
| 2 | 1 | 2017-09-20 20:00:00 | 135531.640625 |
| … | … | … | … |
| 27 | 1 | 2017-09-21 21:00:00 | 105600.390625 |
| 28 | 1 | 2017-09-21 22:00:00 | 96717.390625 |
| 29 | 1 | 2017-09-21 23:00:00 | 82608.343750 |
Cross-validation is performed over several windows (`n_windows=`), with a fixed step between consecutive windows (`step_size=12`). Depending on your computer, this step should take around 1 min.
The `cross_validation` method from the `StatsForecast` class takes the following arguments.

- `df`: training data frame.
- `h` (int): represents the `h` steps into the future that are being forecasted. In this case, 30 hours ahead.
- `step_size` (int): step size between each window. In other words: how often do you want to run the forecasting processes.
- `n_windows` (int): number of windows used for cross validation. In other words: what number of forecasting processes in the past do you want to evaluate.
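
To see how `h`, `step_size` and `n_windows` interact, the cutoff of each window can be worked out by hand. The sketch below assumes the windows are laid out so that the last one ends at the final observation; the helper `cv_cutoffs` is hypothetical, and the `step_size` and `n_windows` values are illustrative assumptions rather than the settings used in this article.

```python
import pandas as pd

def cv_cutoffs(last_ds, h, step_size, n_windows, freq=pd.Timedelta(hours=1)):
    """Return the cutoff timestamp of each window, oldest first."""
    last_cutoff = last_ds - h * freq  # last window forecasts the final h steps
    return [
        last_cutoff - (n_windows - 1 - i) * step_size * freq
        for i in range(n_windows)
    ]

# Assumed values: 30-hour horizon, windows 30 hours apart, 3 windows.
cutoffs = cv_cutoffs(pd.Timestamp("2017-09-21 23:00:00"), h=30, step_size=30, n_windows=3)
for c in cutoffs:
    print(c)
```

With these assumed values, the first and last cutoffs coincide with the ones shown in the cross-validation output table in this article.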
- `unique_id`: series identifier.
- `ds`: datestamp or temporal index.
- `cutoff`: the last datestamp or temporal index for the `n_windows`.
- `y`: true value.
- `model`: columns with the model's name and fitted value.

| | unique_id | ds | cutoff | y | SeasESOpt |
|---|---|---|---|---|---|
| 0 | 1 | 2017-09-18 06:00:00 | 2017-09-18 05:00:00 | 99440.0 | 141401.750000 |
| 1 | 1 | 2017-09-18 07:00:00 | 2017-09-18 05:00:00 | 97655.0 | 152474.250000 |
| 2 | 1 | 2017-09-18 08:00:00 | 2017-09-18 05:00:00 | 97655.0 | 152482.796875 |
| … | … | … | … | … | … |
| 87 | 1 | 2017-09-21 21:00:00 | 2017-09-20 17:00:00 | 103080.0 | 105600.390625 |
| 88 | 1 | 2017-09-21 22:00:00 | 2017-09-20 17:00:00 | 95155.0 | 96717.390625 |
| 89 | 1 | 2017-09-21 23:00:00 | 2017-09-20 17:00:00 | 80285.0 | 82608.343750 |
| | unique_id | metric | SeasESOpt |
|---|---|---|---|
| 0 | 1 | mae | 6694.042188 |
| 1 | 1 | mape | 0.060392 |
| 2 | 1 | mase | 0.827062 |
| 3 | 1 | rmse | 8118.297509 |
| 4 | 1 | smape | 0.028961 |
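
For intuition about the simpler metrics in this table, here is a NumPy sketch that computes MAE, RMSE and MAPE with their textbook definitions. It uses only the last three rows of the cross-validation table as sample input, so the numbers will not match the table above, which was computed on more data; the evaluation utility used by the article is not reproduced here.

```python
import numpy as np

# Last three rows of the cross-validation table: true values and forecasts.
y_true = np.array([103080.0, 95155.0, 80285.0])
y_pred = np.array([105600.390625, 96717.390625, 82608.34375])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))            # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))     # root mean squared error
mape = np.mean(np.abs(errors / y_true))  # mean absolute percentage error, as a fraction

print(f"mae={mae:.3f} rmse={rmse:.3f} mape={mape:.5f}")
```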