mlforecast takes scikit-learn estimators as models, which means you can also provide scikit-learn pipelines as models in order to apply transformations to the features before they reach the final estimator.

Data setup

from mlforecast.utils import generate_daily_series
series = generate_daily_series(5)
series.head()
  unique_id         ds         y
0      id_0 2000-01-01  0.428973
1      id_0 2000-01-02  1.423626
2      id_0 2000-01-03  2.311782
3      id_0 2000-01-04  3.192191
4      id_0 2000-01-05  4.148767

Pipelines definition

Suppose that you want to use a linear regression model with lag1 and the day of the week as features. mlforecast returns the day of the week as a single column; however, that's not the optimal format for a linear regression model, which benefits more from having one indicator column per day of the week (dropping one to avoid collinearity). We can achieve this by using scikit-learn's OneHotEncoder before fitting our linear regression model, which we can do in the following way:

from mlforecast import MLForecast
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1],
    date_features=['dayofweek']
)
X, y = fcst.preprocess(series, return_X_y=True)
X.head()
       lag1  dayofweek
1  0.428973          6
2  1.423626          0
3  2.311782          1
4  3.192191          2
5  4.148767          3

This is what will be passed to our model, so we'd like to take the dayofweek column and one-hot encode it, leaving the lag1 column untouched. We can achieve that with the following:

ohe = ColumnTransformer(
    transformers=[
        ('encoder', OneHotEncoder(drop='first'), ['dayofweek'])
    ],
    remainder='passthrough',
)
X_transformed = ohe.fit_transform(X)
X_transformed.shape
(1096, 7)

We can see that our data now has 7 columns: 1 for the lag plus 6 for the days of the week (the first one was dropped).
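To make the drop-first encoding concrete, here is a minimal pure-Python sketch (the helper `one_hot_drop_first` is hypothetical, for illustration only) of how the seven dayofweek categories map to six indicator columns:

```python
def one_hot_drop_first(values, n_categories=7):
    """Encode integer categories 0..n_categories-1 as indicator columns,
    dropping the first category (it becomes the all-zeros row)."""
    return [
        [1 if v == cat else 0 for cat in range(1, n_categories)]
        for v in values
    ]

# dayofweek values 6, 0 and 1 from the frame above:
rows = one_hot_drop_first([6, 0, 1])
# 6 -> indicator in the last column, 0 -> all zeros, 1 -> first column
```

Category 0 (Monday) has no column of its own; it is represented by all six indicators being zero, which is exactly what OneHotEncoder(drop='first') produces.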

ohe.get_feature_names_out()
array(['encoder__dayofweek_1', 'encoder__dayofweek_2',
       'encoder__dayofweek_3', 'encoder__dayofweek_4',
       'encoder__dayofweek_5', 'encoder__dayofweek_6', 'remainder__lag1'],
      dtype=object)

Training

We can now build a pipeline that applies this transformation and then passes the result to our linear regression model.

model = make_pipeline(ohe, LinearRegression())

And provide this as a model to mlforecast:

fcst = MLForecast(
    models={'ohe_lr': model},
    freq='D',
    lags=[1],
    date_features=['dayofweek']
)
fcst.fit(series)
MLForecast(models=[ohe_lr], freq=<Day>, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

Forecasting

Finally, we compute the forecasts.

fcst.predict(1)
  unique_id         ds    ohe_lr
0      id_0 2000-08-10  4.312748
1      id_1 2000-04-07  4.537019
2      id_2 2000-06-16  4.160505
3      id_3 2000-08-30  3.777040
4      id_4 2001-01-08  2.676933

Summary

You can provide complex scikit-learn pipelines as models to mlforecast, which allows you to perform different transformations depending on the model and to use any scikit-learn-compatible estimator.
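For instance, you could pair a scaling pipeline for a regularized linear model with a tree-based model that receives the raw features, in the same models dict (a sketch; the model names are arbitrary):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each entry gets its own preprocessing: the lasso sees standardized
# features, while the random forest receives them untouched.
models = {
    'scaled_lasso': make_pipeline(StandardScaler(), Lasso()),
    'rf': RandomForestRegressor(n_estimators=50),
}
```

This dict can be passed as the models argument to MLForecast exactly like the single pipeline above, and each model will apply its own transformations during both training and prediction.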