Exogenous features
Use exogenous regressors for training and predicting
Data setup
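The tables in this guide come from a synthetic dataset. A minimal sketch of how such a frame could be built, assuming mlforecast's `generate_daily_series` utility and renaming one of the static features to `product_id`:

```python
from mlforecast.utils import generate_daily_series

# Illustrative setup (assumption): 100 daily series with two static features,
# renaming the second one to `product_id` to match the tables below.
series = generate_daily_series(
    100, equal_ends=True, n_static_features=2
).rename(columns={'static_1': 'product_id'})
series.head()
```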
| | unique_id | ds | y | static_0 | product_id |
|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 |
Use existing exogenous features
In mlforecast the required columns are the series identifier, time and target. Any extra columns you have, like `static_0` and `product_id` here, are considered to be static and are replicated when constructing the features for the next timestamp. You can disable this by passing `static_features` to `MLForecast.preprocess` or `MLForecast.fit`, which will keep only the columns you define there as static. Keep in mind that all features in your input dataframe will be used for training, so you'll have to provide the future values of the exogenous features to `MLForecast.predict` through the `X_df` argument.
Consider the following example. Suppose that we have a prices catalog for each id and date.
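A sketch of how such a catalog could be created; here we assume the `generate_prices_for_series` helper from `mlforecast.utils`, which produces one price per series and date:

```python
from mlforecast.utils import generate_prices_for_series

# One price per (unique_id, ds) combination, covering both the training
# period and the forecast horizon.
prices_catalog = generate_prices_for_series(series)
prices_catalog.head()
```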
| | ds | unique_id | price |
|---|---|---|---|
| 0 | 2000-10-05 | id_00 | 0.548814 |
| 1 | 2000-10-06 | id_00 | 0.715189 |
| 2 | 2000-10-07 | id_00 | 0.602763 |
| 3 | 2000-10-08 | id_00 | 0.544883 |
| 4 | 2000-10-09 | id_00 | 0.423655 |
And that you have already merged these prices into your series dataframe.
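For example, with a plain left join on the series identifier and date (a minimal sketch):

```python
# Attach the dynamic price to every training row.
series_with_prices = series.merge(prices_catalog, on=['unique_id', 'ds'], how='left')
series_with_prices.head()
```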
| | unique_id | ds | y | static_0 | product_id | price |
|---|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 | 0.548814 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 | 0.715189 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 | 0.602763 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 | 0.544883 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 | 0.423655 |
This dataframe will be passed to `MLForecast.fit` (or `MLForecast.preprocess`). However, since the price is dynamic, we have to tell that method that only `static_0` and `product_id` are static.
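A sketch of that call; the model and feature configuration below are illustrative choices (the `LGBMRegressor` column in the predictions further down suggests a LightGBM model was used):

```python
import lightgbm as lgb
from mlforecast import MLForecast

fcst = MLForecast(
    models=lgb.LGBMRegressor(verbosity=-1),  # illustrative model choice
    freq='D',
    lags=[7],                    # illustrative lag
    date_features=['dayofweek'],
)
# Declare which columns are static; `price` is not listed, so it is treated
# as a dynamic feature and will be expected again at predict time.
fcst.fit(series_with_prices, static_features=['static_0', 'product_id'])
```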
The features used for training are stored in `MLForecast.ts.features_order_`. As you can see, `price` was used for training.
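You can inspect that attribute directly; the exact list depends on your configuration (the names in the comment assume the illustrative setup above):

```python
fcst.ts.features_order_
# e.g. ['static_0', 'product_id', 'price', 'lag7', 'dayofweek'] with the setup above
```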
So, in order to update the price at each timestep, we just call `MLForecast.predict` with our forecast horizon and pass the prices catalog through the `X_df` argument.
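A sketch of that call, assuming a 7-day horizon:

```python
preds = fcst.predict(h=7, X_df=prices_catalog)
preds.head()
```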
| | unique_id | ds | LGBMRegressor |
|---|---|---|---|
| 0 | id_00 | 2001-05-15 | 418.930093 |
| 1 | id_00 | 2001-05-16 | 499.487368 |
| 2 | id_00 | 2001-05-17 | 20.321885 |
| 3 | id_00 | 2001-05-18 | 102.310778 |
| 4 | id_00 | 2001-05-19 | 185.340281 |
Generating exogenous features
Nixtla provides some utilities to generate exogenous features for both training and forecasting, such as statsforecast's `mstl_decomposition` or the `transform_exog` function. We also have utilsforecast's `fourier` function, which we'll demonstrate here.
Suppose you start with data like the one above, where we have a couple of static features.
| | unique_id | ds | y | static_0 | product_id |
|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 |
Now we'd like to add some Fourier terms to model the seasonality. We can do that with the following:
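A sketch using `fourier` from `utilsforecast.feature_engineering`; `season_length=7` and `k=2` are assumptions that match the `sin1_7` through `cos2_7` columns shown below:

```python
from utilsforecast.feature_engineering import fourier

freq = 'D'
h = 7
# Adds sin/cos terms for a weekly season (k=2 -> sin1_7, sin2_7, cos1_7, cos2_7)
# to the training frame and also returns their future values for the horizon.
transformed_df, future_df = fourier(series, freq=freq, season_length=7, k=2, h=h)
transformed_df.head()
```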
This provides an extended training dataset.
| | unique_id | ds | y | static_0 | product_id | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | id_00 | 2000-10-05 | 39.811983 | 79 | 45 | 0.781832 | 0.974928 | 0.623490 | -0.222521 |
| 1 | id_00 | 2000-10-06 | 103.274013 | 79 | 45 | 0.974928 | -0.433884 | -0.222521 | -0.900969 |
| 2 | id_00 | 2000-10-07 | 176.574744 | 79 | 45 | 0.433884 | -0.781831 | -0.900969 | 0.623490 |
| 3 | id_00 | 2000-10-08 | 258.987900 | 79 | 45 | -0.433884 | 0.781832 | -0.900969 | 0.623490 |
| 4 | id_00 | 2000-10-09 | 344.940404 | 79 | 45 | -0.974928 | 0.433884 | -0.222521 | -0.900969 |
Along with the future values of the features.
| | unique_id | ds | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
|---|---|---|---|---|---|---|
| 0 | id_00 | 2001-05-15 | -0.781828 | -0.974930 | 0.623494 | -0.222511 |
| 1 | id_00 | 2001-05-16 | 0.000006 | 0.000011 | 1.000000 | 1.000000 |
| 2 | id_00 | 2001-05-17 | 0.781835 | 0.974925 | 0.623485 | -0.222533 |
| 3 | id_00 | 2001-05-18 | 0.974927 | -0.433895 | -0.222527 | -0.900963 |
| 4 | id_00 | 2001-05-19 | 0.433878 | -0.781823 | -0.900972 | 0.623500 |
We can now train using only these features (and the static ones).
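For example (the `LinearRegression` column in the predictions below suggests that model; the configuration here is illustrative):

```python
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast

fcst = MLForecast(models=LinearRegression(), freq=freq)
# Only the Fourier terms are dynamic; the remaining columns are declared static.
fcst.fit(transformed_df, static_features=['static_0', 'product_id'])
```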
And provide the future values to the predict method.
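Passing the future Fourier values through `X_df`, as before (a sketch):

```python
preds = fcst.predict(h=h, X_df=future_df)
preds.head()
```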
| | unique_id | ds | LinearRegression |
|---|---|---|---|
| 0 | id_00 | 2001-05-15 | 275.822342 |
| 1 | id_00 | 2001-05-16 | 262.258117 |
| 2 | id_00 | 2001-05-17 | 238.195850 |
| 3 | id_00 | 2001-05-18 | 240.997814 |
| 4 | id_00 | 2001-05-19 | 262.247123 |