Data setup

For this example we’ll use a subset of the M4 hourly dataset. You can find a notebook with the full dataset here.

import random
import tempfile
from pathlib import Path

import pandas as pd
from datasetsforecast.m4 import M4
from utilsforecast.plotting import plot_series
await M4.async_download('data', group='Hourly')
df, *_ = M4.load('data', 'Hourly')
uids = df['unique_id'].unique()
random.seed(0)
sample_uids = random.choices(uids, k=4)
df = df[df['unique_id'].isin(sample_uids)].reset_index(drop=True)
df['ds'] = df['ds'].astype('int64')
df
     unique_id    ds     y
0         H196     1  11.8
1         H196     2  11.4
2         H196     3  11.1
3         H196     4  10.8
4         H196     5  10.6
...        ...   ...   ...
4027      H413  1004  99.0
4028      H413  1005  88.0
4029      H413  1006  47.0
4030      H413  1007  41.0
4031      H413  1008  34.0

EDA

We’ll take a look at our series to get ideas for transformations and features.

fig = plot_series(df, max_insample_length=24 * 14)

We can use the MLForecast.preprocess method to explore different transformations. It looks like these series have a strong seasonality on the hour of the day, so we can subtract the value from the same hour in the previous day to remove it. This can be done with the mlforecast.target_transforms.Differences transformer, which we pass through target_transforms.

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
fcst = MLForecast(
    models=[],  # we're not interested in modeling yet
    freq=1,  # our series have integer timestamps, so we'll just add 1 in every timestep
    target_transforms=[Differences([24])],
)
prep = fcst.preprocess(df)
prep
     unique_id    ds     y
24        H196    25   0.3
25        H196    26   0.3
26        H196    27   0.1
27        H196    28   0.2
28        H196    29   0.2
...        ...   ...   ...
4027      H413  1004  39.0
4028      H413  1005  55.0
4029      H413  1006  14.0
4030      H413  1007   3.0
4031      H413  1008   4.0

This subtracted the lag 24 value from each observation; we can see what our series look like now.
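As a sanity check, the lag-24 subtraction performed by Differences([24]) can be reproduced with plain pandas. This is a sketch on a toy frame (with period 3 standing in for 24), not the library’s implementation:

```python
import pandas as pd

# toy frame: one series with a seasonal pattern of period 3 (stands in for 24)
toy = pd.DataFrame({
    'unique_id': ['A'] * 6,
    'ds': range(1, 7),
    'y': [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
})
period = 3  # the real example uses 24
# subtract the value from `period` steps earlier, per series
diffed = toy.groupby('unique_id')['y'].diff(period)
print(diffed.tolist())  # first `period` values are NaN, then y[t] - y[t-period]
```

This also shows why the first 24 rows of each series are dropped by preprocess: the differenced value is undefined there.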

fig = plot_series(prep)

Adding features

Lags

It looks like the seasonality is gone; we can now try adding some lag features.

fcst = MLForecast(
    models=[],
    freq=1,
    lags=[1, 24],
    target_transforms=[Differences([24])],    
)
prep = fcst.preprocess(df)
prep
     unique_id    ds     y  lag1  lag24
48        H196    49   0.1   0.1    0.3
49        H196    50   0.1   0.1    0.3
50        H196    51   0.2   0.1    0.1
51        H196    52   0.1   0.2    0.2
52        H196    53   0.1   0.1    0.2
...        ...   ...   ...   ...    ...
4027      H413  1004  39.0  29.0    1.0
4028      H413  1005  55.0  39.0  -25.0
4029      H413  1006  14.0  55.0  -20.0
4030      H413  1007   3.0  14.0    0.0
4031      H413  1008   4.0   3.0  -16.0
prep.drop(columns=['unique_id', 'ds']).corr()['y']
y        1.000000
lag1     0.622531
lag24   -0.234268
Name: y, dtype: float64

Lag transforms

Lag transforms are defined as a dictionary where the keys are the lags and the values are the transformations that we want to apply to each lag. The lag transformations can be either objects from the mlforecast.lag_transforms module or numba-jitted functions (so that computing the features doesn’t become a bottleneck and we can bypass the GIL when using multithreading). Some of these are implemented in the window-ops package, but you can also implement your own.

from mlforecast.lag_transforms import ExpandingMean, RollingMean
from numba import njit
from window_ops.rolling import rolling_mean
@njit
def rolling_mean_48(x):
    return rolling_mean(x, window_size=48)


fcst = MLForecast(
    models=[],
    freq=1,
    target_transforms=[Differences([24])],    
    lag_transforms={
        1: [ExpandingMean()],
        24: [RollingMean(window_size=48), rolling_mean_48],
    },
)
prep = fcst.preprocess(df)
prep
     unique_id    ds     y  expanding_mean_lag1  rolling_mean_lag24_window_size48  rolling_mean_48_lag24
95        H196    96   0.1             0.174648                          0.150000               0.150000
96        H196    97   0.3             0.173611                          0.145833               0.145833
97        H196    98   0.3             0.175342                          0.141667               0.141667
98        H196    99   0.3             0.177027                          0.141667               0.141667
99        H196   100   0.3             0.178667                          0.141667               0.141667
...        ...   ...   ...                  ...                               ...                    ...
4027      H413  1004  39.0             0.242084                          3.437500               3.437500
4028      H413  1005  55.0             0.281633                          2.708333               2.708333
4029      H413  1006  14.0             0.337411                          2.125000               2.125000
4030      H413  1007   3.0             0.351324                          1.770833               1.770833
4031      H413  1008   4.0             0.354018                          1.208333               1.208333

You can see that both approaches produce the same result; you can use whichever one you feel most comfortable with.
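The shift-then-aggregate semantics of a lag transform can be sketched in plain pandas (toy values; lag 2 and window 3 stand in for the lag 24 / window 48 used above — this is an illustration, not the library’s code):

```python
import pandas as pd

# a rolling mean "on lag 2 with window 3" means: shift the series by the lag,
# then take the rolling mean over the window on the shifted values
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
lag, window = 2, 3
feature = s.shift(lag).rolling(window).mean()
print(feature.tolist())  # NaN until lag + window - 1 values are available
```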

Date features

If your time column is made of timestamps then it might make sense to extract features like week, dayofweek, quarter, etc. You can do that by passing a list of strings with pandas time/date components. You can also pass functions that will take the time column as input, as we’ll show here.

def hour_index(times):
    return times % 24

fcst = MLForecast(
    models=[],
    freq=1,
    target_transforms=[Differences([24])],
    date_features=[hour_index],
)
fcst.preprocess(df)
     unique_id    ds     y  hour_index
24        H196    25   0.3           1
25        H196    26   0.3           2
26        H196    27   0.1           3
27        H196    28   0.2           4
28        H196    29   0.2           5
...        ...   ...   ...         ...
4027      H413  1004  39.0          20
4028      H413  1005  55.0          21
4029      H413  1006  14.0          22
4030      H413  1007   3.0          23
4031      H413  1008   4.0           0
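If the ds column held real timestamps instead of integers, the string-based date features mentioned above map to pandas datetime components; a minimal sketch (the timestamps here are made up for illustration):

```python
import pandas as pd

# hypothetical hourly timestamps; with a datetime ds column you could pass
# strings such as 'dayofweek' or 'hour' as date_features instead of a function
times = pd.date_range('2024-01-01', periods=4, freq='h')
print(times.dayofweek.tolist())  # 2024-01-01 is a Monday, so dayofweek is 0
print(times.hour.tolist())
```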

Target transformations

If you want to do some transformation to your target before computing the features and then re-apply it after predicting you can use the target_transforms argument, which takes a list of transformations. You can find the implemented ones in mlforecast.target_transforms or you can implement your own as described in the target transformations guide.

from mlforecast.target_transforms import LocalStandardScaler
fcst = MLForecast(
    models=[],
    freq=1,
    lags=[1],
    target_transforms=[LocalStandardScaler()]
)
fcst.preprocess(df)
     unique_id    ds         y      lag1
1         H196     2 -1.493026 -1.383286
2         H196     3 -1.575331 -1.493026
3         H196     4 -1.657635 -1.575331
4         H196     5 -1.712505 -1.657635
5         H196     6 -1.794810 -1.712505
...        ...   ...       ...       ...
4027      H413  1004  3.062766  2.425012
4028      H413  1005  2.523128  3.062766
4029      H413  1006  0.511751  2.523128
4030      H413  1007  0.217403  0.511751
4031      H413  1008 -0.126003  0.217403
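The per-series standardization above can be reproduced with plain pandas as a sanity check. This is a sketch of the idea, not the library’s implementation (in particular, whether the population or sample standard deviation is used is an assumption here):

```python
import numpy as np
import pandas as pd

# local (per-series) standardization: (y - mean) / std within each unique_id
toy = pd.DataFrame({
    'unique_id': ['A', 'A', 'A', 'B', 'B', 'B'],
    'y': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
scaled = toy.groupby('unique_id')['y'].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)  # ddof=0 is an assumption
)
print(scaled.tolist())  # each series ends up with mean 0 and unit variance
```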

We can define a naive model to test this.

from sklearn.base import BaseEstimator

class Naive(BaseEstimator):
    def fit(self, X, y):
        return self

    def predict(self, X):
        return X['lag1']
fcst = MLForecast(
    models=[Naive()],
    freq=1,
    lags=[1],
    target_transforms=[LocalStandardScaler()]
)
fcst.fit(df)
preds = fcst.predict(1)
preds
  unique_id    ds  Naive
0      H196  1009   16.8
1      H256  1009   13.4
2      H381  1009  207.0
3      H413  1009   34.0

We can compare this with the last values of our series.

last_vals = df.groupby('unique_id').tail(1)
last_vals
     unique_id    ds      y
1007      H196  1008   16.8
2015      H256  1008   13.4
3023      H381  1008  207.0
4031      H413  1008   34.0
import numpy as np
np.testing.assert_allclose(preds['Naive'], last_vals['y'])

Training

Once you’ve decided the features, transformations and models that you want to use you can use the MLForecast.fit method instead, which will do the preprocessing and then train the models. The models can be specified as a list (which will name them by using their class name and an index if there are repeated classes) or as a dictionary where the keys are the names you want to give to the models, i.e. the name of the column that will hold their predictions, and the values are the models themselves.
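A hypothetical sketch of such a naming rule (the exact suffix scheme is an assumption for illustration, not the library’s code):

```python
from collections import Counter

def name_models(models):
    """Name models by class name; append an index when a class repeats (sketch)."""
    counts = Counter(type(m).__name__ for m in models)
    seen = Counter()
    names = []
    for m in models:
        base = type(m).__name__
        seen[base] += 1
        # unique classes keep their plain name; repeats get a numeric suffix
        names.append(base if counts[base] == 1 else f'{base}{seen[base]}')
    return names

# stand-in classes for the sketch
class LGBMRegressor: ...
class Ridge: ...

print(name_models([LGBMRegressor(), LGBMRegressor(), Ridge()]))
```

With a dict you avoid relying on any such rule, since the keys become the prediction column names directly.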

import lightgbm as lgb
lgb_params = {
    'verbosity': -1,
    'num_leaves': 512,
}

fcst = MLForecast(
    models={
        'avg': lgb.LGBMRegressor(**lgb_params),
        'q75': lgb.LGBMRegressor(**lgb_params, objective='quantile', alpha=0.75),
        'q25': lgb.LGBMRegressor(**lgb_params, objective='quantile', alpha=0.25),
    },
    freq=1,
    target_transforms=[Differences([24])],
    lags=[1, 24],
    lag_transforms={
        1: [ExpandingMean()],
        24: [RollingMean(window_size=48)],
    },
    date_features=[hour_index],
)
fcst.fit(df)
MLForecast(models=[avg, q75, q25], freq=1, lag_features=['lag1', 'lag24', 'expanding_mean_lag1', 'rolling_mean_lag24_window_size48'], date_features=[<function hour_index>], num_threads=1)

This computed the features and trained three different models using them. We can now compute our forecasts.

Forecasting

preds = fcst.predict(48)
preds
    unique_id    ds         avg         q75        q25
0        H196  1009   16.295257   16.357148  16.315731
1        H196  1010   15.910282   16.007322  15.862261
2        H196  1011   15.728367   15.780183  15.658180
3        H196  1012   15.468414   15.513598  15.399717
4        H196  1013   15.081279   15.133848  15.007694
...       ...   ...         ...         ...        ...
187      H413  1052  100.450617  124.211150  47.025017
188      H413  1053   88.426800  108.303409  44.715380
189      H413  1054   59.675737   81.859964  19.239462
190      H413  1055   57.580356   72.703301  21.486674
191      H413  1056   42.669879   46.018271  24.392357
fig = plot_series(df, preds, max_insample_length=24 * 7)

Saving and loading

The MLForecast class has the MLForecast.save and MLForecast.load methods to store and then load the forecast object.

with tempfile.TemporaryDirectory() as tmpdir:
    save_dir = Path(tmpdir) / 'mlforecast'
    fcst.save(save_dir)
    fcst2 = MLForecast.load(save_dir)
    preds2 = fcst2.predict(48)
    pd.testing.assert_frame_equal(preds, preds2)

Updating series’ values

After you’ve trained a forecast object you can save and load it with the previous methods. If, by the time you want to use it, you already know the following values of the target, you can use the MLForecast.update method to incorporate them, which will allow you to use these new values when computing predictions.

  • If no new values are provided for a series that’s currently stored, only the previous ones are kept.
  • If new series are included they are added to the existing ones.
fcst = MLForecast(
    models=[Naive()],
    freq=1,
    lags=[1, 2, 3],
)
fcst.fit(df)
fcst.predict(1)
  unique_id    ds  Naive
0      H196  1009   16.8
1      H256  1009   13.4
2      H381  1009  207.0
3      H413  1009   34.0
new_values = pd.DataFrame({
    'unique_id': ['H196', 'H256'],
    'ds': [1009, 1009],
    'y': [17.0, 14.0],
})
fcst.update(new_values)
preds = fcst.predict(1)
preds
  unique_id    ds  Naive
0      H196  1010   17.0
1      H256  1010   14.0
2      H381  1009  207.0
3      H413  1009   34.0

Estimating model performance

Cross validation

In order to get an estimate of how well our model will perform on future data we can perform cross validation, which consists of training a few models independently on different subsets of the data, using them to predict a validation set, and measuring their performance.

Since our data depends on time, we make our splits by removing the last portions of the series and using them as validation sets. This process is implemented in MLForecast.cross_validation.
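As a sketch of how the splits are placed (assuming non-overlapping, consecutive windows where the step between windows equals the horizon), the validation cutoffs can be derived from the last timestamp:

```python
# with the last timestamp at 1008, n_windows=4 and h=48, the cutoffs fall at
# last_ds - 4*48, last_ds - 3*48, ... (an illustration of the split logic)
last_ds = 1008
n_windows, h = 4, 48
cutoffs = [last_ds - h * (n_windows - i) for i in range(n_windows)]
print(cutoffs)  # [816, 864, 912, 960]
```

Each window then uses everything up to its cutoff for training and the next h timestamps for validation.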

fcst = MLForecast(
    models=lgb.LGBMRegressor(**lgb_params),
    freq=1,
    target_transforms=[Differences([24])],
    lags=[1, 24],
    lag_transforms={
        1: [ExpandingMean()],
        24: [RollingMean(window_size=48)],
    },
    date_features=[hour_index],
)
cv_result = fcst.cross_validation(
    df,
    n_windows=4,  # number of models to train/splits to perform
    h=48,  # length of the validation set in each window
)
cv_result
    unique_id    ds  cutoff     y  LGBMRegressor
0        H196   817     816  15.3      15.383165
1        H196   818     816  14.9      14.923219
2        H196   819     816  14.6      14.667834
3        H196   820     816  14.2      14.275964
4        H196   821     816  13.9      13.973491
...       ...   ...     ...   ...            ...
763      H413  1004     960  99.0      65.644823
764      H413  1005     960  88.0      71.717097
765      H413  1006     960  47.0      76.704377
766      H413  1007     960  41.0      53.446638
767      H413  1008     960  34.0      54.902634
fig = plot_series(cv_result, cv_result.drop(columns='cutoff'), max_insample_length=0)

We can compute the RMSE on each split.

from utilsforecast.losses import rmse
def evaluate_cv(df):
    return rmse(df, models=['LGBMRegressor'], id_col='cutoff').set_index('cutoff')

split_rmse = evaluate_cv(cv_result)
split_rmse
        LGBMRegressor
cutoff
816         29.418172
864         34.257598
912         13.145763
960         35.066261

And the average RMSE across splits.

split_rmse.mean()
LGBMRegressor    27.971949
dtype: float64

You can quickly try different features and evaluate them this way. We can try removing the differencing and using an exponentially weighted average of the lag 1 instead of the expanding mean.

from mlforecast.lag_transforms import ExponentiallyWeightedMean
fcst = MLForecast(
    models=lgb.LGBMRegressor(**lgb_params),
    freq=1,
    lags=[1, 24],
    lag_transforms={
        1: [ExponentiallyWeightedMean(alpha=0.5)],
        24: [RollingMean(window_size=48)],      
    },
    date_features=[hour_index],    
)
cv_result2 = fcst.cross_validation(
    df,
    n_windows=4,
    h=48,
)
evaluate_cv(cv_result2).mean()
LGBMRegressor    25.874439
dtype: float64
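The exponentially weighted mean on lag 1 roughly corresponds to shifting the series by one and applying pandas’ ewm; a plain-pandas sketch on toy values (whether the library uses adjust=False is an assumption of this sketch):

```python
import pandas as pd

# shift by the lag, then apply the exponentially weighted mean with alpha=0.5
s = pd.Series([1.0, 2.0, 3.0, 4.0])
feature = s.shift(1).ewm(alpha=0.5, adjust=False).mean()
print(feature.tolist())  # recursive: y_t = 0.5 * y_{t-1} + 0.5 * x_t
```

Unlike the expanding mean, this weights recent observations more heavily, which is why it can help once the differencing is removed.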

LightGBMCV

In the same spirit of estimating our model’s performance, LightGBMCV allows us to train a few LightGBM models on different partitions of the data. The main differences with MLForecast.cross_validation are:

  • It can only train LightGBM models.
  • It trains all models simultaneously and gives us per-iteration averages of the errors across the complete forecasting window, which allows us to find the best iteration.
from mlforecast.lgb_cv import LightGBMCV
cv = LightGBMCV(
    freq=1,
    target_transforms=[Differences([24])],
    lags=[1, 24],
    lag_transforms={
        1: [ExpandingMean()],
        24: [RollingMean(window_size=48)],
    },
    date_features=[hour_index],
    num_threads=2,
)
cv_hist = cv.fit(
    df,
    n_windows=4,
    h=48,
    params=lgb_params,
    eval_every=5,
    early_stopping_evals=5,    
    compute_cv_preds=True,
)
[5] mape: 0.158639
[10] mape: 0.163739
[15] mape: 0.161535
[20] mape: 0.169491
[25] mape: 0.163690
[30] mape: 0.164198
Early stopping at round 30
Using best iteration: 5

As you can see this gives us the error by iteration (controlled by the eval_every argument) and performs early stopping (which can be configured with early_stopping_evals and early_stopping_pct). If you set compute_cv_preds=True the out-of-fold predictions are computed using the best iteration found and are saved in the cv_preds_ attribute.
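The early-stopping loop above can be sketched in plain Python (assuming the rule is: evaluate every eval_every iterations, keep the evaluation with the lowest error, and stop once no improvement is seen for early_stopping_evals evaluations — an assumption about the exact criterion):

```python
# metric history from the log above: one mape value per evaluation
eval_every = 5
history = [0.158639, 0.163739, 0.161535, 0.169491, 0.163690, 0.164198]

# the best iteration is the one whose evaluation had the lowest error
best_idx = min(range(len(history)), key=history.__getitem__)
best_iteration = (best_idx + 1) * eval_every
rounds_trained = len(history) * eval_every
print(best_iteration, rounds_trained)  # 5 30, matching the log
```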

cv.cv_preds_
    unique_id    ds     y    Booster  window
0        H196   817  15.3  15.473182       0
1        H196   818  14.9  15.038571       0
2        H196   819  14.6  14.849409       0
3        H196   820  14.2  14.448379       0
4        H196   821  13.9  14.148379       0
...       ...   ...   ...        ...     ...
187      H413  1004  99.0  61.425396       3
188      H413  1005  88.0  62.886890       3
189      H413  1006  47.0  57.886890       3
190      H413  1007  41.0  38.849009       3
191      H413  1008  34.0  44.720562       3
fig = plot_series(cv.cv_preds_, cv.cv_preds_.drop(columns='window'), max_insample_length=0)

You can use this class to quickly try different configurations of features and hyperparameters. Once you’ve found a combination that works you can train a model with those features and hyperparameters on all the data by creating an MLForecast object from the LightGBMCV one as follows:

final_fcst = MLForecast.from_cv(cv)
final_fcst.fit(df)
preds = final_fcst.predict(48)
fig = plot_series(df, preds, max_insample_length=24 * 14)