Data setup

from mlforecast.utils import generate_daily_series
series = generate_daily_series(10)
series.head()
  unique_id          ds         y
0      id_0  2000-01-01  0.322947
1      id_0  2000-01-02  1.218794
2      id_0  2000-01-03  2.445887
3      id_0  2000-01-04  3.481831
4      id_0  2000-01-05  4.191721

Training

Suppose that you want to train a linear regression model using the day of the week and lag1 as features.

from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast
fcst = MLForecast(
    freq='D',
    models={'lr': LinearRegression()},
    lags=[1],
    date_features=['dayofweek'],
)
fcst.fit(series)
MLForecast(models=[lr], freq=<Day>, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

MLForecast.fit saves the data required for the predict step and also trains the models (in this case, the linear regression). The trained models are available in the MLForecast.models_ attribute, which is a dictionary where the keys are the model names and the values are the models themselves.

fcst.models_
{'lr': LinearRegression()}

Inspect parameters

We can access the linear regression coefficients in the following way:

fcst.models_['lr'].intercept_, fcst.models_['lr'].coef_
(3.2476337167384415, array([ 0.19896416, -0.21441331]))
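
A prediction from this model is just the intercept plus the dot product of the coefficients with the feature vector [lag1, dayofweek]. As a minimal sketch (the feature values below are made up purely for illustration):

import numpy as np

lr = fcst.models_['lr']
lag1, dayofweek = 4.0, 3  # hypothetical feature values, only for illustration
manual_prediction = lr.intercept_ + lr.coef_ @ np.array([lag1, dayofweek])

With the coefficients shown above this works out to roughly 3.25 + 0.199 * 4.0 - 0.214 * 3 ≈ 3.40.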

SHAP

import shap

Training set

If you need to generate the training data, you can use MLForecast.preprocess.

prep = fcst.preprocess(series)
prep.head()
  unique_id          ds         y      lag1  dayofweek
1      id_0  2000-01-02  1.218794  0.322947          6
2      id_0  2000-01-03  2.445887  1.218794          0
3      id_0  2000-01-04  3.481831  2.445887          1
4      id_0  2000-01-05  4.191721  3.481831          2
5      id_0  2000-01-06  5.395863  4.191721          3

We extract X by dropping the identifier and timestamp columns along with the target:

X = prep.drop(columns=['unique_id', 'ds', 'y'])
X.head()
       lag1  dayofweek
1  0.322947          6
2  1.218794          0
3  2.445887          1
4  3.481831          2
5  4.191721          3

We can now compute the SHAP values:

X100 = shap.utils.sample(X, 100)
explainer = shap.Explainer(fcst.models_['lr'].predict, X100)
shap_values = explainer(X)
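
Because the explainer wraps the model's predict function, the SHAP values for each row plus the base value should add up to the model's output (local accuracy). A quick sketch of that check, using the objects defined above:

import numpy as np

reconstructed = shap_values.base_values + shap_values.values.sum(axis=1)
print(np.abs(reconstructed - fcst.models_['lr'].predict(X)).max())  # expected to be close to zero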

And visualize them:

shap.plots.beeswarm(shap_values)

Predictions

Sometimes you want to determine why the model gave a specific prediction. To do this you need the input features, which aren’t returned by default, but you can retrieve them using a callback.

from mlforecast.callbacks import SaveFeatures
save_feats = SaveFeatures()
preds = fcst.predict(1, before_predict_callback=save_feats)
preds.head()
  unique_id          ds        lr
0      id_0  2000-08-10  3.468643
1      id_1  2000-04-07  3.016877
2      id_2  2000-06-16  2.815249
3      id_3  2000-08-30  4.048894
4      id_4  2001-01-08  3.524532

You can now retrieve the features using SaveFeatures.get_features:

features = save_feats.get_features()
features.head()
       lag1  dayofweek
0  4.343744          3
1  3.150799          4
2  2.137412          4
3  6.182456          2
4  1.391698          0

And use those features to compute the SHAP values:

shap_values_predictions = explainer(features)

We can now analyze what influenced the prediction for 'id_4'.

round(preds.loc[4, 'lr'], 3)
3.525
shap.plots.waterfall(shap_values_predictions[4])
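
The same local-accuracy property links the waterfall to the prediction above: the contributions start from the base value and add up to the model output for 'id_4'. A short sketch of that check (the tolerance is an arbitrary choice):

import numpy as np

row = shap_values_predictions[4]
np.testing.assert_allclose(row.base_values + row.values.sum(), preds.loc[4, 'lr'], atol=1e-6)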