Using scikit-learn pipelines

mlforecast accepts scikit-learn estimators as models, so you can also pass scikit-learn pipelines as models to apply further transformations to the features before they reach the final estimator.

Data setup

from mlforecast.utils import generate_daily_series

series = generate_daily_series(5)
series.head()

	unique_id	ds	y
0	id_0	2000-01-01	0.428973
1	id_0	2000-01-02	1.423626
2	id_0	2000-01-03	2.311782
3	id_0	2000-01-04	3.192191
4	id_0	2000-01-05	4.148767

Pipelines definition

Suppose that you want to use a linear regression model with lag1 and the day of the week as features. mlforecast returns the day of the week as a single integer column, which is not the best format for a linear regression model; such a model benefits more from having an indicator column for each day of the week (dropping one to avoid collinearity). We can achieve this by applying scikit-learn’s OneHotEncoder before fitting our linear regression model. First, let's look at the features that mlforecast produces:

from mlforecast import MLForecast
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1],
    date_features=['dayofweek']
)
X, y = fcst.preprocess(series, return_X_y=True)
X.head()

	lag1	dayofweek
1	0.428973	6
2	1.423626	0
3	2.311782	1
4	3.192191	2
5	4.148767	3

This is what will be passed to our model, so we’d like to take the dayofweek column and one-hot encode it, leaving the lag1 column untouched. We can achieve that with the following:

ohe = ColumnTransformer(
    transformers=[
        ('encoder', OneHotEncoder(drop='first'), ['dayofweek'])
    ],
    remainder='passthrough',
)
X_transformed = ohe.fit_transform(X)
X_transformed.shape

(1096, 7)

We can see that our data now has 7 columns, 1 for the lag plus 6 for the days of the week (we dropped the first one).

ohe.get_feature_names_out()

array(['encoder__dayofweek_1', 'encoder__dayofweek_2',
       'encoder__dayofweek_3', 'encoder__dayofweek_4',
       'encoder__dayofweek_5', 'encoder__dayofweek_6', 'remainder__lag1'],
      dtype=object)
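To illustrate why one indicator column is dropped, here is a minimal sketch that uses only NumPy on hypothetical day-of-week values (it is not part of mlforecast). It assumes the linear model fits an intercept, as LinearRegression does by default, and compares the rank of the design matrix with all seven indicators against the one with the first indicator removed.

```python
import numpy as np

# Hypothetical day-of-week values (0-6), two full weeks of data.
dayofweek = np.arange(14) % 7

# One indicator column per day of the week (7 columns).
all_indicators = (dayofweek[:, None] == np.arange(7)).astype(float)
intercept = np.ones((len(dayofweek), 1))

# Intercept + all 7 indicators: the indicators sum to the intercept column,
# so the columns are linearly dependent.
with_all = np.hstack([intercept, all_indicators])

# Intercept + 6 indicators (first one dropped): no redundancy left.
with_dropped = np.hstack([intercept, all_indicators[:, 1:]])

rank_all = np.linalg.matrix_rank(with_all)
rank_dropped = np.linalg.matrix_rank(with_dropped)
print(with_all.shape[1], rank_all)
print(with_dropped.shape[1], rank_dropped)
```

With all seven indicators the matrix has more columns than its rank, which is the collinearity that drop='first' avoids.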

Training

We can now build a pipeline that applies this transformation and then passes the result to our linear regression model.

model = make_pipeline(ohe, LinearRegression())

We then provide this pipeline as a model to mlforecast:

fcst = MLForecast(
    models={'ohe_lr': model},
    freq='D',
    lags=[1],
    date_features=['dayofweek']
)
fcst.fit(series)

MLForecast(models=[ohe_lr], freq=<Day>, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

Forecasting

Finally, we compute the forecasts.

fcst.predict(1)

	unique_id	ds	ohe_lr
0	id_0	2000-08-10	4.312748
1	id_1	2000-04-07	4.537019
2	id_2	2000-06-16	4.160505
3	id_3	2000-08-30	3.777040
4	id_4	2001-01-08	2.676933

Summary

You can provide complex scikit-learn pipelines as models to mlforecast, which allows you to apply different transformations for each model and to use any scikit-learn-compatible estimator.
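As a small sketch of per-model transformations, the dictionary below pairs a one-hot-encoded linear regression with a decision tree that receives the raw features. The names candidate_models, X_toy and y_toy are hypothetical, the data is made up, and the choice of a decision tree is only an assumption for illustration; a dictionary like this has the same form as the models argument used in the Training section.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

encode_day = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(drop='first'), ['dayofweek'])],
    remainder='passthrough',
)

# Each entry carries its own preprocessing (or none at all).
candidate_models = {
    'ohe_lr': make_pipeline(encode_day, LinearRegression()),
    'tree': DecisionTreeRegressor(max_depth=3, random_state=0),
}

# Hypothetical toy features and target, only to show that both entries fit.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    'lag1': rng.normal(size=28),
    'dayofweek': np.arange(28) % 7,
})
y_toy = X_toy['lag1'] + (X_toy['dayofweek'] >= 5)

toy_predictions = {
    name: est.fit(X_toy, y_toy).predict(X_toy)
    for name, est in candidate_models.items()
}
print({name: preds.shape for name, preds in toy_predictions.items()})
```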
