The MLForecast class lets you compute lag transformations on your target. Sometimes, however, you also want to compute transformations on your dynamic exogenous features. This guide shows you how to do that.

Data setup

from mlforecast.utils import generate_series, generate_prices_for_series
series = generate_series(10, equal_ends=True)
prices = generate_prices_for_series(series)
prices.head(2)
   ds          unique_id  price
0  2000-10-05  0          0.548814
1  2000-10-06  0          0.715189

Suppose you have some series, along with a price for each id and date, and you want to forecast the next 7 days. Since the price is a dynamic feature, you have to provide its future values through X_df in MLForecast.predict.

If you want to use not only the price but also, for example, its lag 7 and the expanding mean of its lag 1, you can compute these features before training, merge them with your series, and then provide their future values through X_df. Consider the following example.

Computing the transformations

from mlforecast.lag_transforms import ExpandingMean

from mlforecast.feature_engineering import transform_exog
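# price_lag7: price 7 days earlier; price_expanding_mean_lag1: expanding mean of the price lagged by 1 day
transformed_prices = transform_exog(prices, lags=[7], lag_transforms={1: [ExpandingMean()]})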
transformed_prices.head(10)
   ds          unique_id  price     price_lag7  price_expanding_mean_lag1
0  2000-10-05  0          0.548814  NaN         NaN
1  2000-10-06  0          0.715189  NaN         0.548814
2  2000-10-07  0          0.602763  NaN         0.632001
3  2000-10-08  0          0.544883  NaN         0.622255
4  2000-10-09  0          0.423655  NaN         0.602912
5  2000-10-10  0          0.645894  NaN         0.567061
6  2000-10-11  0          0.437587  NaN         0.580200
7  2000-10-12  0          0.891773  0.548814    0.559827
8  2000-10-13  0          0.963663  0.715189    0.601320
9  2000-10-14  0          0.383442  0.602763    0.641580
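To check what these two columns contain, you can reproduce them for series 0 with plain pandas. This is a sketch under the assumption that the transformations are applied per unique_id in ds order (which the values in the table above are consistent with); it is not transform_exog's actual implementation.

```python
import pandas as pd

# First ten prices of series 0, copied from the table above.
prices = pd.DataFrame({
    "unique_id": [0] * 10,
    "ds": pd.date_range("2000-10-05", periods=10, freq="D"),
    "price": [0.548814, 0.715189, 0.602763, 0.544883, 0.423655,
              0.645894, 0.437587, 0.891773, 0.963663, 0.383442],
})

# Sort so that shifts follow time order within each series.
prices = prices.sort_values(["unique_id", "ds"])
by_id = prices.groupby("unique_id")["price"]

# Lag 7: the price observed seven days earlier in the same series.
prices["price_lag7"] = by_id.shift(7)

# Expanding mean of lag 1: shift by one day, then average every value seen so far.
prices["price_expanding_mean_lag1"] = by_id.transform(
    lambda s: s.shift(1).expanding().mean()
)

print(prices.round(6))
```

The first seven rows of price_lag7 are null because there is no price seven days earlier, which is why these rows matter later when you drop nulls.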

You can now merge this with your original series:

# inner join (the merge default): keeps only the dates that have an observed target
series_with_prices = series.merge(transformed_prices, on=['unique_id', 'ds'])
series_with_prices.head(10)
   unique_id  ds          y         price     price_lag7  price_expanding_mean_lag1
0  0          2000-10-05  0.322947  0.548814  NaN         NaN
1  0          2000-10-06  1.218794  0.715189  NaN         0.548814
2  0          2000-10-07  2.445887  0.602763  NaN         0.632001
3  0          2000-10-08  3.481831  0.544883  NaN         0.622255
4  0          2000-10-09  4.191721  0.423655  NaN         0.602912
5  0          2000-10-10  5.395863  0.645894  NaN         0.567061
6  0          2000-10-11  6.264447  0.437587  NaN         0.580200
7  0          2000-10-12  0.284022  0.891773  0.548814    0.559827
8  0          2000-10-13  1.462798  0.963663  0.715189    0.601320
9  0          2000-10-14  2.035518  0.383442  0.602763    0.641580
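Because pandas' merge defaults to an inner join, the training frame keeps only the dates that have both a target and a price. The price rows that extend past the end of each series are not lost: they are still in transformed_prices, ready to be passed as X_df. The toy sketch below uses hypothetical data (3 observed days, prices for 5 days) to show that split.

```python
import pandas as pd

# Hypothetical toy data: one series with 3 observed days,
# plus prices that also cover the 2 days after the series ends.
series = pd.DataFrame({
    "unique_id": [0, 0, 0],
    "ds": pd.date_range("2000-10-05", periods=3, freq="D"),
    "y": [0.3, 1.2, 2.4],
})
prices = pd.DataFrame({
    "unique_id": [0] * 5,
    "ds": pd.date_range("2000-10-05", periods=5, freq="D"),
    "price": [0.55, 0.72, 0.60, 0.54, 0.42],
})

# Default how='inner': only (unique_id, ds) pairs present in both frames survive.
train = series.merge(prices, on=["unique_id", "ds"])

# Anti-join: the price rows without a matching target are the future values.
future = prices.merge(series[["unique_id", "ds"]], on=["unique_id", "ds"],
                      how="left", indicator=True)
future = future[future["_merge"] == "left_only"].drop(columns="_merge")

print(train)
print(future)
```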

You can then define your forecast object. Note that you can still compute lag features based on the target as you normally would.

from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lags=[1],
    date_features=['dayofweek'],
)
# the price columns vary over time, so don't treat them as static features
fcst.preprocess(series_with_prices, dropna=True, static_features=[]).head()
   unique_id  ds          y         price     price_lag7  price_expanding_mean_lag1  lag1      dayofweek
1  0          2000-10-06  1.218794  0.715189  NaN         0.548814                   0.322947  4
2  0          2000-10-07  2.445887  0.602763  NaN         0.632001                   1.218794  5
3  0          2000-10-08  3.481831  0.544883  NaN         0.622255                   2.445887  6
4  0          2000-10-09  4.191721  0.423655  NaN         0.602912                   3.481831  0
5  0          2000-10-10  5.395863  0.645894  NaN         0.567061                   4.191721  1
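The two target-based columns above correspond to a per-series shift of y and the day of the week. The pandas sketch below reproduces them for series 0. It assumes mlforecast's dayofweek follows pandas' convention (Monday=0, Sunday=6); the value 4 for 2000-10-06, a Friday, is consistent with that.

```python
import pandas as pd

# First six target values of series 0, copied from the table above.
df = pd.DataFrame({
    "unique_id": [0] * 6,
    "ds": pd.date_range("2000-10-05", periods=6, freq="D"),
    "y": [0.322947, 1.218794, 2.445887, 3.481831, 4.191721, 5.395863],
})

# lags=[1]: the previous value of the target within each series.
df["lag1"] = df.groupby("unique_id")["y"].shift(1)

# date_features=['dayofweek']: pandas convention, Monday=0 ... Sunday=6.
df["dayofweek"] = df["ds"].dt.dayofweek

# dropna=True removes the first row, where lag1 is undefined.
features = df.dropna()
print(features)
```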

Note that the dropna argument only considers the null values generated by the lag features based on the target. The exogenous columns can still contain nulls, as the price_lag7 column above shows. If you want to drop every row that contains a null value, you have to do it on your original series before passing it to mlforecast.

# drop rows with a null in any column (here, the first 7 days of each series)
series_with_prices2 = series_with_prices.dropna()
fcst.preprocess(series_with_prices2, dropna=True, static_features=[]).head()
    unique_id  ds          y         price     price_lag7  price_expanding_mean_lag1  lag1      dayofweek
8   0          2000-10-13  1.462798  0.963663  0.715189    0.601320                   0.284022  4
9   0          2000-10-14  2.035518  0.383442  0.602763    0.641580                   1.462798  5
10  0          2000-10-15  3.043565  0.791725  0.544883    0.615766                   2.035518  6
11  0          2000-10-16  4.010109  0.528895  0.423655    0.631763                   3.043565  0
12  0          2000-10-17  5.416310  0.568045  0.645894    0.623190                   4.010109  1
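The output above starts at index 8, the ninth day of series 0. The sketch below shows where that offset comes from, using hypothetical data: your own dropna removes the first 7 days of each series (price_lag7 needs 7 earlier prices), and dropna=True then removes one more day because lag1 is computed on the data you pass in. Treating step 2 this way is an assumption about mlforecast's behavior, consistent with lag1 being 0.284022 (the target on 2000-10-12) in the first row above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two series with 10 daily observations each.
n_days = 10
df = pd.DataFrame({
    "unique_id": np.repeat([0, 1], n_days),
    "ds": np.tile(pd.date_range("2000-10-05", periods=n_days, freq="D"), 2),
    "y": np.arange(2 * n_days, dtype=float),
    "price": np.linspace(0.4, 0.9, 2 * n_days),
})

# Exogenous transformations, computed per series as in this guide.
by_id = df.groupby("unique_id")["price"]
df["price_lag7"] = by_id.shift(7)
df["price_expanding_mean_lag1"] = by_id.transform(lambda s: s.shift(1).expanding().mean())

# Step 1 (your job): drop rows with a null in any column.
clean = df.dropna()

# Step 2 (roughly what dropna=True adds): compute lag1 on the filtered data
# and drop the rows where it is null.
clean = clean.assign(lag1=clean.groupby("unique_id")["y"].shift(1)).dropna()

print(clean[["unique_id", "ds", "y", "lag1"]])
```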

You can now train the model.

# static_features=[]: the price columns are dynamic, not static
fcst.fit(series_with_prices2, static_features=[])
MLForecast(models=[LinearRegression], freq=D, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

Then predict using the prices. You can pass the dataframe with the full history as X_df; mlforecast selects the rows it needs for the forecasting horizon.

fcst.predict(1, X_df=transformed_prices).head()
   unique_id  ds          LinearRegression
0  0          2001-05-15  3.803967
1  1          2001-05-15  3.512489
2  2          2001-05-15  3.170019
3  3          2001-05-15  4.307121
4  4          2001-05-15  3.018758
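Conceptually, this selection amounts to building the (unique_id, ds) grid for the next h periods of each series and looking those rows up in X_df. The sketch below uses a hypothetical helper, select_horizon_rows, and hypothetical data written with pandas for illustration; it is not mlforecast's internal code.

```python
import pandas as pd


def select_horizon_rows(X_df, last_dates, h, freq="D"):
    """Hypothetical helper: keep the X_df rows for the h periods after each series' last date."""
    grid = pd.concat(
        [
            pd.DataFrame({
                "unique_id": uid,
                # Skip the last training date itself and keep the next h dates.
                "ds": pd.date_range(last, periods=h + 1, freq=freq)[1:],
            })
            for uid, last in last_dates.items()
        ],
        ignore_index=True,
    )
    return grid.merge(X_df, on=["unique_id", "ds"], how="left")


# Hypothetical X_df with five days of prices for two series.
X_df = pd.DataFrame({
    "unique_id": [0] * 5 + [1] * 5,
    "ds": list(pd.date_range("2001-05-12", periods=5, freq="D")) * 2,
    "price": [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5],
})
# Both series end on 2001-05-14, so a one-step forecast needs 2001-05-15.
last_dates = pd.Series({0: pd.Timestamp("2001-05-14"), 1: pd.Timestamp("2001-05-14")})

rows = select_horizon_rows(X_df, last_dates, h=1)
print(rows)
```

Rows before the horizon (the history) and after it are simply ignored, which is why passing the full transformed_prices frame works.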

In this example we only have prices for the next 7 days, so if you try to forecast a longer horizon you’ll get an error.

from fastcore.test import test_fail
# passes only if predicting 8 steps raises an error whose message contains this text
test_fail(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df')
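Before calling predict with a long horizon, you can check which future inputs are missing yourself. The sketch below uses a hypothetical helper, missing_future_rows, and hypothetical dates; it builds the required (unique_id, ds) pairs and anti-joins them against X_df with pandas. It mirrors the idea behind the error above, not mlforecast's own check.

```python
import pandas as pd


def missing_future_rows(X_df, last_dates, h, freq="D"):
    """Hypothetical helper: return the (unique_id, ds) pairs in the horizon that X_df lacks."""
    grid = pd.concat(
        [
            pd.DataFrame({
                "unique_id": uid,
                "ds": pd.date_range(last, periods=h + 1, freq=freq)[1:],
            })
            for uid, last in last_dates.items()
        ],
        ignore_index=True,
    )
    merged = grid.merge(X_df[["unique_id", "ds"]], on=["unique_id", "ds"],
                        how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", ["unique_id", "ds"]]


# Hypothetical setup: the series ends on 2001-05-14 and prices are known for 7 more days.
last_dates = pd.Series({0: pd.Timestamp("2001-05-14")})
X_df = pd.DataFrame({
    "unique_id": 0,
    "ds": pd.date_range("2001-05-15", periods=7, freq="D"),
    "price": 0.5,
})

print(missing_future_rows(X_df, last_dates, h=7))  # empty: the horizon is covered
print(missing_future_rows(X_df, last_dates, h=8))  # 2001-05-22 has no price
```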