Exogenous features
Use exogenous regressors for training and predicting
Data setup
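The tables in this guide come from a synthetic dataset. A minimal sketch of how such a frame could be built, assuming mlforecast's `generate_daily_series` utility and renaming one of the static features to `product_id`:

```python
from mlforecast.utils import generate_daily_series

# Illustrative setup (assumption): 100 daily series with two static features,
# renaming the second one to `product_id` to match the tables below.
series = generate_daily_series(
    100, equal_ends=True, n_static_features=2
).rename(columns={'static_1': 'product_id'})
series.head()
```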
| | unique_id | ds | y | static_0 | product_id |
|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 |
Use existing exogenous features
In mlforecast the required columns are the series identifier, time and target. Any extra columns you have, like `static_0` and `product_id` here, are considered to be static and are replicated when constructing the features for the next timestamp. You can disable this by passing `static_features` to `MLForecast.preprocess` or `MLForecast.fit`, which will keep only the columns you define there as static. Keep in mind that all features in your input dataframe will be used for training, so you'll have to provide the future values of the exogenous features to `MLForecast.predict` through the `X_df` argument.
Consider the following example. Suppose that we have a prices catalog for each id and date.
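A sketch of how such a catalog could be created; here we assume the `generate_prices_for_series` helper from `mlforecast.utils`, which produces one price per series and date:

```python
from mlforecast.utils import generate_prices_for_series

# One price per (unique_id, ds) combination, covering both the training
# period and the forecast horizon.
prices_catalog = generate_prices_for_series(series)
prices_catalog.head()
```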
| | ds | unique_id | price |
|---|---|---|---|
| 0 | 2000-10-05 | id_00 | 0.548814 |
| 1 | 2000-10-06 | id_00 | 0.715189 |
| 2 | 2000-10-07 | id_00 | 0.602763 |
| 3 | 2000-10-08 | id_00 | 0.544883 |
| 4 | 2000-10-09 | id_00 | 0.423655 |
And that you have already merged these prices into your series dataframe.
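For example, with a plain left join on the series identifier and date (a minimal sketch):

```python
# Attach the dynamic price to every training row.
series_with_prices = series.merge(prices_catalog, on=['unique_id', 'ds'], how='left')
series_with_prices.head()
```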
| | unique_id | ds | y | static_0 | product_id | price |
|---|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 | 0.548814 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 | 0.715189 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 | 0.602763 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 | 0.544883 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 | 0.423655 |
This dataframe will be passed to `MLForecast.fit` (or `MLForecast.preprocess`). However, since the price is dynamic, we have to tell that method that only `static_0` and `product_id` are static.
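A sketch of that call; the model and feature configuration below are illustrative choices (the `LGBMRegressor` column in the predictions further down suggests a LightGBM model was used):

```python
import lightgbm as lgb
from mlforecast import MLForecast

fcst = MLForecast(
    models=lgb.LGBMRegressor(verbosity=-1),  # illustrative model choice
    freq='D',
    lags=[7],                    # illustrative lag
    date_features=['dayofweek'],
)
# Declare which columns are static; `price` is not listed, so it is treated
# as a dynamic feature and will be expected again at predict time.
fcst.fit(series_with_prices, static_features=['static_0', 'product_id'])
```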
The features used for training are stored in `MLForecast.ts.features_order_`. As you can see, `price` was used for training.
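You can inspect that attribute directly; the exact list depends on your configuration (the names in the comment assume the illustrative setup above):

```python
fcst.ts.features_order_
# e.g. ['static_0', 'product_id', 'price', 'lag7', 'dayofweek'] with the setup above
```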
So, in order to update the price at each timestep, we just call `MLForecast.predict` with our forecast horizon and pass the prices catalog through the `X_df` argument.
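A sketch of that call, assuming a 7-day horizon:

```python
preds = fcst.predict(h=7, X_df=prices_catalog)
preds.head()
```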
| | unique_id | ds | LGBMRegressor |
|---|---|---|---|
| 0 | id_00 | 2001-05-15 | 418.930093 |
| 1 | id_00 | 2001-05-16 | 499.487368 |
| 2 | id_00 | 2001-05-17 | 20.321885 |
| 3 | id_00 | 2001-05-18 | 102.310778 |
| 4 | id_00 | 2001-05-19 | 185.340281 |
Generating exogenous features
Nixtla provides some utilities to generate exogenous features for both training and forecasting, such as statsforecast's `mstl_decomposition` or the `transform_exog` function. We also have utilsforecast's `fourier` function, which we'll demonstrate here.
Suppose you start with data like the one above, where we have a couple of static features.
| | unique_id | ds | y | static_0 | product_id |
|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 |
Now we'd like to add some Fourier terms to model the seasonality. We can do that with the following:
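A sketch using `fourier` from `utilsforecast.feature_engineering`; `season_length=7` and `k=2` are assumptions that match the `sin1_7` through `cos2_7` columns shown below:

```python
from utilsforecast.feature_engineering import fourier

freq = 'D'
h = 7
# Adds sin/cos terms for a weekly season (k=2 -> sin1_7, sin2_7, cos1_7, cos2_7)
# to the training frame and also returns their future values for the horizon.
transformed_df, future_df = fourier(series, freq=freq, season_length=7, k=2, h=h)
transformed_df.head()
```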
This provides an extended training dataset.
| | unique_id | ds | y | static_0 | product_id | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 | 0.781832 | 0.974928 | 0.623490 | -0.222521 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 | 0.974928 | -0.433884 | -0.222521 | -0.900969 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 | 0.433884 | -0.781831 | -0.900969 | 0.623490 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 | -0.433884 | 0.781832 | -0.900969 | 0.623490 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 | -0.974928 | 0.433884 | -0.222521 | -0.900969 |
Along with the future values of the features.
| | unique_id | ds | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
|---|---|---|---|---|---|---|
| 0 | id_00 | 2001-05-15 | -0.781828 | -0.974930 | 0.623494 | -0.222511 |
| 1 | id_00 | 2001-05-16 | 0.000006 | 0.000011 | 1.000000 | 1.000000 |
| 2 | id_00 | 2001-05-17 | 0.781835 | 0.974925 | 0.623485 | -0.222533 |
| 3 | id_00 | 2001-05-18 | 0.974927 | -0.433895 | -0.222527 | -0.900963 |
| 4 | id_00 | 2001-05-19 | 0.433878 | -0.781823 | -0.900972 | 0.623500 |
We can now train using only these features (and the static ones).
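For example (the `LinearRegression` column in the predictions below suggests that model; the configuration here is illustrative):

```python
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast

fcst = MLForecast(models=LinearRegression(), freq=freq)
# Only the Fourier terms are dynamic; the remaining columns are declared static.
fcst.fit(transformed_df, static_features=['static_0', 'product_id'])
```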
And provide the future values to the predict method.
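Passing the future Fourier values through `X_df`, as before (a sketch):

```python
preds = fcst.predict(h=h, X_df=future_df)
preds.head()
```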
| | unique_id | ds | LinearRegression |
|---|---|---|---|
| 0 | id_00 | 2001-05-15 | 275.822342 |
| 1 | id_00 | 2001-05-16 | 262.258117 |
| 2 | id_00 | 2001-05-17 | 238.195850 |
| 3 | id_00 | 2001-05-18 | 240.997814 |
| 4 | id_00 | 2001-05-19 | 262.247123 |