import lightgbm as lgb
import pandas as pd
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.utils import generate_daily_series, generate_prices_for_series

Data setup

series = generate_daily_series(
    100, equal_ends=True, n_static_features=2
).rename(columns={'static_1': 'product_id'})
series.head()
  unique_id         ds           y  static_0  product_id
0     id_00 2000-10-05   39.811983        79          45
1     id_00 2000-10-06  103.274013        79          45
2     id_00 2000-10-07  176.574744        79          45
3     id_00 2000-10-08  258.987900        79          45
4     id_00 2000-10-09  344.940404        79          45

Use existing exogenous features

In mlforecast the required columns are the series identifier, time and target. Any extra columns you have, like static_0 and product_id here, are considered static and are replicated when constructing the features for the next timestamp. You can disable this by passing static_features to MLForecast.preprocess or MLForecast.fit, which will keep only the columns you define there as static. Keep in mind that all features in your input dataframe will be used for training, so you’ll have to provide the future values of exogenous features to MLForecast.predict through the X_df argument.
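For instance, here is a minimal sketch of the default behavior (not part of the walkthrough that follows): without a static_features argument, both extra columns are treated as static, so predict needs no X_df and simply carries their values forward.

fcst_default = MLForecast(
    models=lgb.LGBMRegressor(verbosity=-1),
    freq='D',
    lags=[7],
)
# static_0 and product_id are treated as static by default,
# so no future values are required at predict time
fcst_default.fit(series)
fcst_default.predict(h=7).head()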

Consider the following example. Suppose that we have a prices catalog for each id and date.

prices_catalog = generate_prices_for_series(series)
prices_catalog.head()
           ds unique_id     price
0  2000-10-05     id_00  0.548814
1  2000-10-06     id_00  0.715189
2  2000-10-07     id_00  0.602763
3  2000-10-08     id_00  0.544883
4  2000-10-09     id_00  0.423655

And that you have already merged these prices into your series dataframe.

series_with_prices = series.merge(prices_catalog, how='left')
series_with_prices.head()
  unique_id         ds           y  static_0  product_id     price
0     id_00 2000-10-05   39.811983        79          45  0.548814
1     id_00 2000-10-06  103.274013        79          45  0.715189
2     id_00 2000-10-07  176.574744        79          45  0.602763
3     id_00 2000-10-08  258.987900        79          45  0.544883
4     id_00 2000-10-09  344.940404        79          45  0.423655

This dataframe will be passed to MLForecast.fit (or MLForecast.preprocess). However, since the price is dynamic, we have to tell that method that only static_0 and product_id are static.

fcst = MLForecast(
    models=lgb.LGBMRegressor(n_jobs=1, random_state=0, verbosity=-1),
    freq='D',
    lags=[7],
    lag_transforms={
        1: [ExpandingMean()],
        7: [RollingMean(window_size=14)],
    },
    date_features=['dayofweek', 'month'],
    num_threads=2,
)
fcst.fit(series_with_prices, static_features=['static_0', 'product_id'])
MLForecast(models=[LGBMRegressor], freq=D, lag_features=['lag7', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size14'], date_features=['dayofweek', 'month'], num_threads=2)

The features used for training are stored in MLForecast.ts.features_order_. As you can see, price was used for training.

fcst.ts.features_order_
['static_0',
 'product_id',
 'price',
 'lag7',
 'expanding_mean_lag1',
 'rolling_mean_lag7_window_size14',
 'dayofweek',
 'month']

So, in order to update the price at each timestep, we just call MLForecast.predict with our forecast horizon and pass the prices catalog through X_df.

preds = fcst.predict(h=7, X_df=prices_catalog)
preds.head()
  unique_id         ds  LGBMRegressor
0     id_00 2001-05-15     418.930093
1     id_00 2001-05-16     499.487368
2     id_00 2001-05-17      20.321885
3     id_00 2001-05-18     102.310778
4     id_00 2001-05-19     185.340281

Generating exogenous features

Nixtla provides some utilities to generate exogenous features for both training and forecasting, such as statsforecast’s mstl_decomposition or the transform_exog function. We also have utilsforecast’s fourier function, which we’ll demonstrate here.
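As a quick illustration of transform_exog, here is a hedged sketch (the import path and signature are assumed from mlforecast.feature_engineering, and the exact output column names may differ) that derives lag features from the price so they can be merged into the training data:

from mlforecast.feature_engineering import transform_exog

# build lagged versions of the price per series, e.g. a price_lag7 column
transformed_prices = transform_exog(prices_catalog, lags=[7])
transformed_prices.head()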

from sklearn.linear_model import LinearRegression
from utilsforecast.feature_engineering import fourier

Suppose you start with some data like the one above, where we have a couple of static features.

series.head()
  unique_id         ds           y  static_0  product_id
0     id_00 2000-10-05   39.811983        79          45
1     id_00 2000-10-06  103.274013        79          45
2     id_00 2000-10-07  176.574744        79          45
3     id_00 2000-10-08  258.987900        79          45
4     id_00 2000-10-09  344.940404        79          45

Now we’d like to add some Fourier terms to model the seasonality. We can do that with the following call, where season_length=7 targets weekly seasonality, k=2 generates two sine/cosine pairs, and h=7 also returns those features for the 7 future timestamps:

transformed_df, future_df = fourier(series, freq='D', season_length=7, k=2, h=7)

This provides an extended training dataset.

transformed_df.head()
  unique_id         ds           y  static_0  product_id    sin1_7    sin2_7    cos1_7    cos2_7
0     id_00 2000-10-05   39.811983        79          45  0.781832  0.974928  0.623490 -0.222521
1     id_00 2000-10-06  103.274013        79          45  0.974928 -0.433884 -0.222521 -0.900969
2     id_00 2000-10-07  176.574744        79          45  0.433884 -0.781831 -0.900969  0.623490
3     id_00 2000-10-08  258.987900        79          45 -0.433884  0.781832 -0.900969  0.623490
4     id_00 2000-10-09  344.940404        79          45 -0.974928  0.433884 -0.222521 -0.900969

Along with the future values of the features.

future_df.head()
  unique_id         ds    sin1_7    sin2_7    cos1_7    cos2_7
0     id_00 2001-05-15 -0.781828 -0.974930  0.623494 -0.222511
1     id_00 2001-05-16  0.000006  0.000011  1.000000  1.000000
2     id_00 2001-05-17  0.781835  0.974925  0.623485 -0.222533
3     id_00 2001-05-18  0.974927 -0.433895 -0.222527 -0.900963
4     id_00 2001-05-19  0.433878 -0.781823 -0.900972  0.623500

We can now train using only these features (and the static ones).

fcst2 = MLForecast(models=LinearRegression(), freq='D')
fcst2.fit(transformed_df, static_features=['static_0', 'product_id'])
MLForecast(models=[LinearRegression], freq=D, lag_features=[], date_features=[], num_threads=1)

And provide the future values to the predict method.

fcst2.predict(h=7, X_df=future_df).head()
  unique_id         ds  LinearRegression
0     id_00 2001-05-15        275.822342
1     id_00 2001-05-16        262.258117
2     id_00 2001-05-17        238.195850
3     id_00 2001-05-18        240.997814
4     id_00 2001-05-19        262.247123