In this example we will show how to perform electricity load forecasting on the ERCOT (Texas) market for detecting daily peaks.
We will train an `MSTL` (Multiple Seasonal-Trend decomposition using LOESS) model on historic load data to forecast day-ahead peaks in September 2022. Multiple seasonality is typically present in electricity data sampled at sub-daily frequency, such as the hourly data used here. Demand exhibits daily and weekly seasonality, with clear patterns for specific hours of the day, such as 6:00pm vs 3:00am, or for specific days, such as Sunday vs Friday.
First, we will load ERCOT historic demand. Then we will use the `StatsForecast.cross_validation` method to fit the MSTL model and forecast daily load during September. Finally, we show how to use the forecasts to detect the coincident peak.
Tip: You can use Colab to run this notebook interactively.
pip install statsforecast
StatsForecast expects a data frame in long format with three columns, `unique_id`, `ds` and `y`:
- `unique_id` (string, int or category) represents an identifier for the series.
- `ds` (datestamp or int) should be either an integer indexing time or a datestamp, ideally formatted as YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.
- `y` (numeric) represents the measurement we wish to forecast.
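As a minimal sketch of that layout, the snippet below reshapes a wide table into the three-column long format with pandas. The wide table, its `timestamp` column, the second series name `OTHER`, and all values are made up for illustration; only the target column names `unique_id`, `ds` and `y` come from the description above.

```python
import pandas as pd

# Hypothetical wide table: one row per timestamp, one column per series.
# All values are invented for illustration only.
wide = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2022-01-01 00:00:00", "2022-01-01 01:00:00"]),
        "ERCOT": [100.0, 98.5],
        "OTHER": [20.0, 19.5],
    }
)

# Stack the series columns so that each row holds a single observation.
long_df = wide.melt(id_vars="timestamp", var_name="unique_id", value_name="y")
long_df = long_df.rename(columns={"timestamp": "ds"})

# Order columns and rows as series identifier, timestamp, value.
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```

With a single series such as ERCOT, the same layout is simply a constant `unique_id` column next to `ds` and `y`.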
Plot the series using the `plot` method from the `StatsForecast` class. This method plots up to 8 random series from the dataset and is useful for basic EDA.
Note: The `StatsForecast.plot` method uses Plotly as a default engine. You can change to Matplotlib by setting `engine="matplotlib"`.
The time series contains 6,552 observations, so computationally efficient methods are needed to deploy the models in production.
Tip: Check our detailed explanation and tutorial on MSTL.
Import the `StatsForecast` class and the models you need.
Because demand repeats every 24 hours (daily) and every 24 * 7 hours (weekly), we use `[24, 24 * 7]` as the seasonalities. See this link for a detailed explanation on how to set seasonal lengths. In this example we use the `SklearnModel` with a `LinearRegression` model for the trend component; however, any StatsForecast model can be used. The complete list of models is available here.
| | unique_id | ds | y | trend |
|---|---|---|---|---|
| 0 | ERCOT | 2021-01-01 00:00:00 | 43719.849616 | 1.0 |
| 1 | ERCOT | 2021-01-01 01:00:00 | 43321.050347 | 2.0 |
| 2 | ERCOT | 2021-01-01 02:00:00 | 43063.067063 | 3.0 |
| 3 | ERCOT | 2021-01-01 03:00:00 | 43090.059203 | 4.0 |
| 4 | ERCOT | 2021-01-01 04:00:00 | 43486.590073 | 5.0 |
Instantiate a `StatsForecast` object with the following required parameters:
- `models`: a list of models. Select the models you want from the available models and import them.
- `freq`: a string indicating the frequency of the data. (See pandas' available frequencies.)

Tip: StatsForecast also supports these optional parameters:
- `n_jobs`: int, number of jobs used in parallel processing; use -1 for all cores. (Default: 1)
- `fallback_model`: a model to be used if a model fails. (Default: None)

The `cross_validation` method allows the user to simulate multiple historic forecasts, greatly simplifying pipelines by replacing for loops around `fit` and `predict` calls. This method re-trains the model and forecasts each window. See this tutorial for an animation of how the windows are defined.
Use the `cross_validation` method to produce all the daily forecasts for September. To produce daily forecasts, set the forecasting horizon `h` to 24. In this example we are simulating deploying the pipeline during September, so set the number of windows to 30 (one for each day). Finally, set the step size between windows to 24, to only produce one forecast per day.
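The window arithmetic behind these settings can be sketched with the standard library alone. This is not the library's implementation: the helper `cv_cutoffs` is hypothetical, and the final timestamp of the series (2022-09-30 23:00) is an assumption. It only shows where each window's cutoff lands for hourly data when the last window ends at the end of the series.

```python
from datetime import datetime, timedelta


def cv_cutoffs(last_ts, h, n_windows, step_size):
    """Return each window's cutoff (last training timestamp), oldest first.

    Assumes hourly data and that the final window ends at `last_ts`.
    """
    cutoffs = []
    for i in range(n_windows):
        # Hours between this window's cutoff and the end of the series.
        offset = h + (n_windows - 1 - i) * step_size
        cutoffs.append(last_ts - timedelta(hours=offset))
    return cutoffs


# Assumed final timestamp of the series (not stated explicitly in the text).
last_ts = datetime(2022, 9, 30, 23)
cutoffs = cv_cutoffs(last_ts, h=24, n_windows=30, step_size=24)

print(cutoffs[0])   # 2022-08-31 23:00:00
print(cutoffs[-1])  # 2022-09-29 23:00:00
```

Under these assumptions, the first window is trained on data up to 2022-08-31 23:00 and forecasts the 24 hours of September 1, and each later window moves forward by exactly one day.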
| | unique_id | ds | cutoff | y | MSTL |
|---|---|---|---|---|---|
| 0 | ERCOT | 2022-09-01 00:00:00 | 2022-08-31 23:00:00 | 45482.471757 | 47413.944185 |
| 1 | ERCOT | 2022-09-01 01:00:00 | 2022-08-31 23:00:00 | 43602.658043 | 45237.153285 |
| 2 | ERCOT | 2022-09-01 02:00:00 | 2022-08-31 23:00:00 | 42284.817342 | 43816.390019 |
| 3 | ERCOT | 2022-09-01 03:00:00 | 2022-08-31 23:00:00 | 41663.156771 | 42972.956286 |
| 4 | ERCOT | 2022-09-01 04:00:00 | 2022-08-31 23:00:00 | 41710.621904 | 42909.899438 |
Important: When using `cross_validation`, make sure the forecasts are produced at the desired timestamps. Check the `cutoff` column, which specifies the last timestamp before the forecasting window.
Finally, we use the forecasts in `cv_df` to detect the daily hourly demand peaks. For each day, we set the detected peaks as the hours with the highest forecasts. In this case, we want to predict one peak (`npeaks`); depending on your setting and goals, this parameter might change. For example, the number of peaks can correspond to how many hours a battery can be discharged to reduce demand.
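One way to express this selection in pandas is sketched below. The helper `detect_daily_peaks` is hypothetical, and the toy forecasts are invented so that the result is easy to check; only the column names `unique_id`, `ds` and `MSTL` and the `npeaks` parameter follow the text.

```python
import pandas as pd


def detect_daily_peaks(cv_df, model_col="MSTL", npeaks=1):
    """Keep, for each series and calendar day, the `npeaks` hours with the highest forecast."""
    day = cv_df["ds"].dt.normalize()  # midnight of each timestamp's day
    ranks = cv_df.groupby(["unique_id", day])[model_col].rank(
        method="first", ascending=False
    )
    return cv_df[ranks <= npeaks]


# Toy forecasts for two days (made-up values): the highest forecast is at
# 17:00 on the first day and at 16:00 on the second day.
hours = list(range(48))
ds = pd.Timestamp("2022-09-01") + pd.to_timedelta(hours, unit="h")
peak_hour = {0: 17, 1: 16}
forecast = [50000 - 500 * abs(h % 24 - peak_hour[h // 24]) for h in hours]
toy_cv_df = pd.DataFrame({"unique_id": "ERCOT", "ds": ds, "MSTL": forecast})

peaks = detect_daily_peaks(toy_cv_df, npeaks=1)
print(peaks[["ds", "MSTL"]])
```

Raising `npeaks` returns more candidate hours per day, for instance as many hours as a battery could be discharged.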
Important: In this example we only include September. However, MSTL can correctly predict the peaks for the 4 months of 2022. You can try this by increasing the `n_windows` parameter of `cross_validation` or filtering the `Y_df` dataset. The complete run for all months takes only 10 minutes.