Transforming exogenous features
Compute transformations on your exogenous features for MLForecast
The MLForecast class allows you to compute lag transformations on your target; however, sometimes you also want to compute transformations on your dynamic exogenous features. This guide shows you how to accomplish that.
Data setup
from mlforecast.utils import generate_series, generate_prices_for_series
series = generate_series(10, equal_ends=True)
prices = generate_prices_for_series(series)
prices.head(2)
| | ds | unique_id | price |
|---|---|---|---|
| 0 | 2000-10-05 | 0 | 0.548814 |
| 1 | 2000-10-06 | 0 | 0.715189 |
Suppose that you have some series along with their prices for each id and date, and you want to compute forecasts for the next 7 days. Since the price is a dynamic feature, you have to provide its future values through X_df in MLForecast.predict.

If you want to use not only the price but also, for example, its lag 7 and the expanding mean of its lag 1, you can compute them before training, merge them with your series and then provide the future values through X_df. Consider the following example.
Computing the transformations
from mlforecast.lag_transforms import ExpandingMean
from mlforecast.feature_engineering import transform_exog
transformed_prices = transform_exog(prices, lags=[7], lag_transforms={1: [ExpandingMean()]})
transformed_prices.head(10)
| | ds | unique_id | price | price_lag7 | price_expanding_mean_lag1 |
|---|---|---|---|---|---|
| 0 | 2000-10-05 | 0 | 0.548814 | NaN | NaN |
| 1 | 2000-10-06 | 0 | 0.715189 | NaN | 0.548814 |
| 2 | 2000-10-07 | 0 | 0.602763 | NaN | 0.632001 |
| 3 | 2000-10-08 | 0 | 0.544883 | NaN | 0.622255 |
| 4 | 2000-10-09 | 0 | 0.423655 | NaN | 0.602912 |
| 5 | 2000-10-10 | 0 | 0.645894 | NaN | 0.567061 |
| 6 | 2000-10-11 | 0 | 0.437587 | NaN | 0.580200 |
| 7 | 2000-10-12 | 0 | 0.891773 | 0.548814 | 0.559827 |
| 8 | 2000-10-13 | 0 | 0.963663 | 0.715189 | 0.601320 |
| 9 | 2000-10-14 | 0 | 0.383442 | 0.602763 | 0.641580 |
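For intuition about what transform_exog computed above, the same two features can be reproduced with plain pandas. This is a sketch on a tiny made-up frame (the values are illustrative, not the generated prices), assuming the frame is sorted by id and date:

```python
import pandas as pd

# toy prices frame with two series; values are made up for illustration
prices = pd.DataFrame({
    'unique_id': [0] * 9 + [1] * 9,
    'ds': list(pd.date_range('2000-10-05', periods=9)) * 2,
    'price': [0.5, 0.7, 0.6, 0.5, 0.4, 0.6, 0.4, 0.9, 1.0,
              0.3, 0.2, 0.8, 0.1, 0.5, 0.9, 0.7, 0.6, 0.4],
})

g = prices.groupby('unique_id')['price']
# lag 7: the price observed 7 periods earlier, within each serie
prices['price_lag7'] = g.shift(7)
# expanding mean of lag 1: mean of all prices strictly before each date
prices['price_expanding_mean_lag1'] = (
    g.shift(1)
     .groupby(prices['unique_id'])
     .expanding()
     .mean()
     .reset_index(level=0, drop=True)
)
```

Because the shift and the expanding window are computed per group, the first rows of every serie get NaN, matching the output of transform_exog above.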
You can now merge this with your original series:
series_with_prices = series.merge(transformed_prices, on=['unique_id', 'ds'])
series_with_prices.head(10)
| | unique_id | ds | y | price | price_lag7 | price_expanding_mean_lag1 |
|---|---|---|---|---|---|---|
| 0 | 0 | 2000-10-05 | 0.322947 | 0.548814 | NaN | NaN |
| 1 | 0 | 2000-10-06 | 1.218794 | 0.715189 | NaN | 0.548814 |
| 2 | 0 | 2000-10-07 | 2.445887 | 0.602763 | NaN | 0.632001 |
| 3 | 0 | 2000-10-08 | 3.481831 | 0.544883 | NaN | 0.622255 |
| 4 | 0 | 2000-10-09 | 4.191721 | 0.423655 | NaN | 0.602912 |
| 5 | 0 | 2000-10-10 | 5.395863 | 0.645894 | NaN | 0.567061 |
| 6 | 0 | 2000-10-11 | 6.264447 | 0.437587 | NaN | 0.580200 |
| 7 | 0 | 2000-10-12 | 0.284022 | 0.891773 | 0.548814 | 0.559827 |
| 8 | 0 | 2000-10-13 | 1.462798 | 0.963663 | 0.715189 | 0.601320 |
| 9 | 0 | 2000-10-14 | 2.035518 | 0.383442 | 0.602763 | 0.641580 |
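Note that merge defaults to an inner join on the given keys, so timestamps present in only one frame (for example, future price dates with no target yet) are silently dropped. A toy sketch with made-up data, using pandas' validate argument to guard against accidental key duplicates:

```python
import pandas as pd

# toy frames standing in for series and transformed_prices
series = pd.DataFrame({
    'unique_id': [0, 0, 0],
    'ds': pd.date_range('2000-10-05', periods=3),
    'y': [0.3, 1.2, 2.4],
})
prices = pd.DataFrame({
    'unique_id': [0, 0, 0, 0],
    'ds': pd.date_range('2000-10-05', periods=4),  # one extra future date
    'price': [0.5, 0.7, 0.6, 0.9],
})

# the default inner join keeps only (unique_id, ds) pairs present in both
# frames; validate='one_to_one' raises if either side has duplicate keys
merged = series.merge(prices, on=['unique_id', 'ds'], validate='one_to_one')
```

The extra future price date is dropped from the training frame but remains available in the prices frame to pass later through X_df.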
You can then define your forecast object. Note that you can still compute lag features based on the target as you normally would.
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lags=[1],
    date_features=['dayofweek'],
)
fcst.preprocess(series_with_prices, dropna=True).head()
| | unique_id | ds | y | price | price_lag7 | price_expanding_mean_lag1 | lag1 | dayofweek |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 2000-10-06 | 1.218794 | 0.715189 | NaN | 0.548814 | 0.322947 | 4 |
| 2 | 0 | 2000-10-07 | 2.445887 | 0.602763 | NaN | 0.632001 | 1.218794 | 5 |
| 3 | 0 | 2000-10-08 | 3.481831 | 0.544883 | NaN | 0.622255 | 2.445887 | 6 |
| 4 | 0 | 2000-10-09 | 4.191721 | 0.423655 | NaN | 0.602912 | 3.481831 | 0 |
| 5 | 0 | 2000-10-10 | 5.395863 | 0.645894 | NaN | 0.567061 | 4.191721 | 1 |
It’s important to note that the dropna argument only considers the null values generated by the lag features based on the target. If you want to drop all rows containing null values you have to do that in your original series yourself.
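The distinction can be illustrated with a toy frame (the column names here are illustrative, not produced by mlforecast):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'y_lag1': [np.nan, 1.0, 2.0, 3.0],        # null from a target lag
    'price_lag7': [np.nan, np.nan, 0.5, 0.7],  # null from an exogenous lag
})

# dropping only target-lag nulls: what dropna=True does, per the note above
target_clean = df.dropna(subset=['y_lag1'])
# dropping every null, including exogenous ones: what you must do yourself
all_clean = df.dropna()
```

The first call keeps the row where only the exogenous feature is null; the second removes it as well, which is what the dropna call on the series below accomplishes.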
series_with_prices2 = series_with_prices.dropna()
fcst.preprocess(series_with_prices2, dropna=True, static_features=[]).head()
| | unique_id | ds | y | price | price_lag7 | price_expanding_mean_lag1 | lag1 | dayofweek |
|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 2000-10-13 | 1.462798 | 0.963663 | 0.715189 | 0.601320 | 0.284022 | 4 |
| 9 | 0 | 2000-10-14 | 2.035518 | 0.383442 | 0.602763 | 0.641580 | 1.462798 | 5 |
| 10 | 0 | 2000-10-15 | 3.043565 | 0.791725 | 0.544883 | 0.615766 | 2.035518 | 6 |
| 11 | 0 | 2000-10-16 | 4.010109 | 0.528895 | 0.423655 | 0.631763 | 3.043565 | 0 |
| 12 | 0 | 2000-10-17 | 5.416310 | 0.568045 | 0.645894 | 0.623190 | 4.010109 | 1 |
You can now train the model.
fcst.fit(series_with_prices2, static_features=[])
MLForecast(models=[LinearRegression], freq=D, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)
And predict using the prices. Note that you can provide the dataframe with the full history; mlforecast will filter the dates required for the forecasting horizon.
fcst.predict(1, X_df=transformed_prices).head()
| | unique_id | ds | LinearRegression |
|---|---|---|---|
| 0 | 0 | 2001-05-15 | 3.803967 |
| 1 | 1 | 2001-05-15 | 3.512489 |
| 2 | 2 | 2001-05-15 | 3.170019 |
| 3 | 3 | 2001-05-15 | 4.307121 |
| 4 | 4 | 2001-05-15 | 3.018758 |
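Before calling predict, you can sanity-check that the exogenous frame covers the horizon for every serie. A minimal sketch, where covers_horizon is a hypothetical helper (not part of mlforecast) and the data is made up:

```python
import pandas as pd

def covers_horizon(X_df, last_train_dates, horizon, freq='D'):
    """For each serie, check X_df has a row for every date in the horizon."""
    out = {}
    for uid, last in last_train_dates.items():
        # the horizon starts one step after the last training date
        needed = pd.date_range(last, periods=horizon + 1, freq=freq)[1:]
        have = set(X_df.loc[X_df['unique_id'] == uid, 'ds'])
        out[uid] = all(d in have for d in needed)
    return out

# toy future exogenous frame covering two days for one serie
X_df = pd.DataFrame({
    'unique_id': ['A', 'A'],
    'ds': pd.to_datetime(['2000-10-08', '2000-10-09']),
    'price': [0.5, 0.6],
})
last_dates = {'A': pd.Timestamp('2000-10-07')}
```

With these toy inputs, a 2-step horizon is fully covered while a 3-step horizon is not, which is exactly the situation that triggers the error shown below.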
In this example we have prices for the next 7 days; if you try to forecast a longer horizon you’ll get an error.
from fastcore.test import test_fail
test_fail(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df')
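If you don’t want the fastcore dependency, the same check can be written with a small try/except helper. This is a sketch; expect_failure is a hypothetical name, and it deliberately catches any Exception since the exact exception type is not asserted here:

```python
def expect_failure(fn, contains=''):
    """Call fn, assert it raises and its message contains the given text."""
    try:
        fn()
    except Exception as e:
        assert contains in str(e)
        return str(e)
    raise AssertionError('expected the call to fail')
```

Usage would mirror the test_fail call above, e.g. expect_failure(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df').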