mlforecast allows you to define transformations on the lags and use them as features. These are provided through the lag_transforms argument, a dict where the keys are the lags and the values are lists of transformations to apply to each lag.
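Conceptually, each transformation is applied to the target after it has been shifted by the corresponding lag, so a feature only ever sees past values. The numpy sketch below illustrates this for a single series. The helpers `shift` and `expanding_mean` are hypothetical stand-ins written for illustration; they are not mlforecast's implementation.

```python
import numpy as np

def shift(x, lag):
    """Shift a 1-D array forward by `lag` (>= 1) steps, NaN-filling the start."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    out = np.full(len(x), np.nan)
    out[lag:] = x[:-lag]
    return out

def expanding_mean(x):
    """Mean of all non-NaN values seen so far (NaN until the first one)."""
    out = np.full(len(x), np.nan)
    total, count = 0.0, 0
    for i, v in enumerate(x):
        if not np.isnan(v):
            total += v
            count += 1
        if count:
            out[i] = total / count
    return out

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Hypothetical spec mirroring lag_transforms={1: [expanding_mean]}
lag_transforms = {1: [expanding_mean]}
features = {
    f"{f.__name__}_lag{lag}": f(shift(y, lag))
    for lag, funcs in lag_transforms.items()
    for f in funcs
}
# features["expanding_mean_lag1"] holds nan, 1.0, 1.5, 2.0, 2.5
print(features["expanding_mean_lag1"])
```

At each position the feature only uses values strictly before it, which is what makes it safe to use for forecasting.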

Data setup

import numpy as np

from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
data = generate_daily_series(10)

Built-in transformations

The built-in lag transformations are in the mlforecast.lag_transforms module.

from mlforecast.lag_transforms import RollingMean, ExpandingStd
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [ExpandingStd()],
        7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)]
    },
)

After defining your transformations, you can inspect the features they produce with MLForecast.preprocess.

fcst.preprocess(data).head(2)
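|    | unique_id | ds         | y        | expanding_std_lag1 | rolling_mean_lag7_window_size7_min_samples1 | rolling_mean_lag7_window_size14 |
|----|-----------|------------|----------|--------------------|---------------------------------------------|---------------------------------|
| 20 | id_0      | 2000-01-21 | 6.319961 | 1.956363           | 3.234486                                    | 3.283064                        |
| 21 | id_0      | 2000-01-22 | 0.071677 | 2.028545           | 3.256055                                    | 3.291068                        |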
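The first row shown has index 20 rather than 0. This is consistent with rows that have missing feature values being dropped: a rolling statistic over lag k that needs at least min_samples values first becomes available at position k + min_samples - 1, assuming min_samples defaults to the window size when it isn't given. The pure-Python sketch below reproduces that arithmetic for the two RollingMean features. `first_valid_index` is a hypothetical helper, and ExpandingStd is left out because its minimum sample count isn't stated here.

```python
def first_valid_index(lag, window_size, min_samples=None):
    """0-based position of the first non-missing value of a rolling statistic
    computed over the target shifted by `lag`.

    Assumes min_samples defaults to window_size, as in a plain rolling window.
    """
    if min_samples is None:
        min_samples = window_size
    # After shifting by `lag`, the first observed value sits at position `lag`;
    # the window needs `min_samples` observed values, the last of which
    # arrives at position lag + min_samples - 1.
    return lag + min_samples - 1

# RollingMean(window_size=7, min_samples=1) on lag 7
short_window = first_valid_index(lag=7, window_size=7, min_samples=1)  # 7
# RollingMean(window_size=14) on lag 7
long_window = first_valid_index(lag=7, window_size=14)  # 20
print(max(short_window, long_window))  # 20, the index of the first row above
```

The same arithmetic predicts the first index in the later examples: 1 + 14 - 1 = 14 for a 14-day window on lag 1, and 2 + 7 - 1 = 8 for a 7-day window on lag 2.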

Extending the built-in transformations

You can compose the built-in transformations by using the Combine class, which takes two transformations and an operator.

import operator

from mlforecast.lag_transforms import Combine
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            Combine(
                RollingMean(window_size=7),
                RollingMean(window_size=14),
                operator.truediv,
            )
        ],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
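|    | unique_id | ds         | y        | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size14 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14 |
|----|-----------|------------|----------|--------------------------------|---------------------------------|------------------------------------------------------------------------|
| 14 | id_0      | 2000-01-15 | 0.435006 | 3.234486                       | 3.283064                        | 0.985204                                                               |
| 15 | id_0      | 2000-01-16 | 1.489309 | 3.256055                       | 3.291068                        | 0.989361                                                               |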
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag1_window_size14'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14']
)
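Conceptually, Combine evaluates both transformations on the same lagged series and joins their outputs elementwise with the operator. The numpy sketch below mimics this with a hypothetical `combine` helper and a naive `rolling_mean`; it is an illustration, not mlforecast's implementation, but it produces the same kind of ratio that the assertion above checks.

```python
import operator

import numpy as np

def rolling_mean(x, window_size):
    """Naive rolling mean; NaN until a full window is available."""
    out = np.full(len(x), np.nan)
    for i in range(window_size - 1, len(x)):
        out[i] = x[i - window_size + 1 : i + 1].mean()
    return out

def combine(f1, f2, op):
    """Hypothetical helper: a transformation computing op(f1(x), f2(x)) elementwise."""
    return lambda x: op(f1(x), f2(x))

x = np.arange(1.0, 21.0)  # stands in for the lagged target
ratio = combine(
    lambda v: rolling_mean(v, 7),
    lambda v: rolling_mean(v, 14),
    operator.truediv,
)(x)
# Position 13 is the first where both windows are full:
# mean(8..14) / mean(1..14) = 11 / 7.5
print(ratio[13])
```

Because the result is only defined where both inputs are, the combined feature starts at the same row as its longest window, which is why the table above begins at index 14.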

If you want one of the transformations in Combine to be applied to a different lag, you can use the Offset class, which applies the offset first and then the transformation.

from mlforecast.lag_transforms import Offset
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            Combine(
                RollingMean(window_size=7),
                Offset(RollingMean(window_size=7), n=1),
                operator.truediv,
            )
        ],
        2: [RollingMean(window_size=7)]
    },
)
prep = fcst.preprocess(data)
prep.head(2)
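|   | unique_id | ds         | y        | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7 | rolling_mean_lag2_window_size7 |
|---|-----------|------------|----------|--------------------------------|-----------------------------------------------------------------------|--------------------------------|
| 8 | id_0      | 2000-01-09 | 1.462798 | 3.326081                       | 0.998331                                                              | 3.331641                       |
| 9 | id_0      | 2000-01-10 | 2.035518 | 3.360938                       | 1.010480                                                              | 3.326081                       |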
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag2_window_size7'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7']
)
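The assertion holds because shifting a series one more step before applying a transformation is the same as applying that transformation to the next lag: Offset(RollingMean(window_size=7), n=1) on lag 1 sees exactly the values that RollingMean(window_size=7) sees on lag 2. The numpy sketch below checks that identity with hypothetical `shift` and `rolling_mean` helpers written for illustration.

```python
import numpy as np

def shift(x, n):
    """Shift forward by n >= 1 steps, NaN-filling the start."""
    out = np.full(len(x), np.nan)
    out[n:] = x[:-n]
    return out

def rolling_mean(x, window_size):
    """Naive rolling mean; NaN if the window is incomplete or contains NaN."""
    out = np.full(len(x), np.nan)
    for i in range(window_size - 1, len(x)):
        out[i] = x[i - window_size + 1 : i + 1].mean()
    return out

rng = np.random.default_rng(0)
y = rng.random(30)

# Offset(RollingMean(7), n=1) on lag 1: shift once more, then transform
offset_version = rolling_mean(shift(shift(y, 1), 1), 7)
# RollingMean(7) on lag 2
lag2_version = rolling_mean(shift(y, 2), 7)

np.testing.assert_allclose(offset_version, lag2_version)  # NaNs compare equal
print(int(np.argmax(~np.isnan(lag2_version))))  # 8, first row in the table above
```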

numba-based transformations

The window-ops package provides transformations defined as numba JIT-compiled functions. We use numba because it makes them very fast and can also bypass Python’s GIL, which allows them to run concurrently across threads.

The main benefit of these transformations is that they’re very easy to implement. However, updating their values in the predict step can be very slow, because the function has to be called again on the complete history and only its last value is kept. If performance is a concern, try to use the built-in transformations, or set keep_last_n in MLForecast.preprocess or MLForecast.fit to the minimum number of samples that your transformations require.
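To see why keep_last_n helps, note that the latest value of a rolling statistic depends only on the last window_size observations, so recomputing it over the full history does extra work for the same result. The numpy sketch below compares both approaches using a hypothetical naive `rolling_mean` written in plain Python rather than numba. It illustrates the reasoning only and is not how mlforecast performs the update.

```python
import numpy as np

def rolling_mean(x, window_size):
    """Naive rolling mean; NaN until a full window is available."""
    out = np.full(len(x), np.nan)
    for i in range(window_size - 1, len(x)):
        out[i] = x[i - window_size + 1 : i + 1].mean()
    return out

rng = np.random.default_rng(42)
history = rng.random(10_000)
window_size = 7

# Update as described above: recompute on the whole history, keep the last value
full = rolling_mean(history, window_size)[-1]
# Same update when only the last `window_size` samples are retained,
# roughly what keep_last_n=7 would keep for a single series
trimmed = rolling_mean(history[-window_size:], window_size)[-1]

print(np.isclose(full, trimmed))  # True
```

The trimmed call processes 7 values instead of 10,000 and returns the same number, which is the saving keep_last_n is meant to capture.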

from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.shift import shift_array
@njit
def ratio_over_previous(x, offset=1):
    """Computes the ratio between the current value and its `offset` lag"""
    return x / shift_array(x, offset=offset)

@njit
def diff_over_previous(x, offset=1):
    """Computes the difference between the current value and its `offset` lag"""
    return x - shift_array(x, offset=offset)

If your function takes arguments besides the input array, you can provide a tuple like (func, arg1, arg2, ...).
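A tuple entry can be read as the function followed by the extra arguments to pass after the input array, so (ratio_over_previous, 2) would compute ratio_over_previous(x, 2). The sketch below shows that dispatch, assuming the extra elements are passed positionally. `apply_spec` is a hypothetical helper, and `shift_array` here is a plain-numpy stand-in for the window_ops function, without numba.

```python
import numpy as np

def shift_array(x, offset=1):
    """Plain-numpy stand-in for window_ops' shift_array: shift forward, NaN-fill the start."""
    out = np.full(len(x), np.nan)
    out[offset:] = x[:-offset]
    return out

def ratio_over_previous(x, offset=1):
    """Ratio between the current value and its `offset` lag."""
    return x / shift_array(x, offset=offset)

def apply_spec(spec, x):
    """Hypothetical dispatcher: a bare function is called as func(x);
    a tuple (func, arg1, arg2, ...) is called as func(x, arg1, arg2, ...)."""
    if isinstance(spec, tuple):
        func, *args = spec
        return func(x, *args)
    return spec(x)

x = np.array([1.0, 2.0, 4.0, 8.0])
print(apply_spec(ratio_over_previous, x))       # nan, 2, 2, 2
print(apply_spec((ratio_over_previous, 2), x))  # nan, nan, 4, 4
```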

fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1, 2, 3],
    lag_transforms={
        1: [expanding_mean, ratio_over_previous, (ratio_over_previous, 2)],  # the second ratio sets offset=2
        2: [diff_over_previous],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
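|   | unique_id | ds         | y        | lag1     | lag2     | lag3     | expanding_mean_lag1 | ratio_over_previous_lag1 | ratio_over_previous_lag1_offset2 | diff_over_previous_lag2 |
|---|-----------|------------|----------|----------|----------|----------|---------------------|--------------------------|----------------------------------|-------------------------|
| 3 | id_0      | 2000-01-04 | 3.481831 | 2.445887 | 1.218794 | 0.322947 | 1.329209            | 2.006809                 | 7.573645                         | 0.895847                |
| 4 | id_0      | 2000-01-05 | 4.191721 | 3.481831 | 2.445887 | 1.218794 | 1.867365            | 1.423546                 | 2.856785                         | 1.227093                |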

As you can see, the function’s name is used as the transformation name, followed by _lag and the lag number (e.g. ratio_over_previous_lag1). If the function has other arguments that aren’t set to their default values, they’re included as well, as with offset=2 here.
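The naming rule can be sketched with the standard library’s inspect module. `feature_name` below is a hypothetical reimplementation for illustration; it reproduces the column names in the table above, but mlforecast’s own logic may cover more cases (for example keyword arguments or the built-in transformation classes).

```python
import inspect

def feature_name(func, lag, *args):
    """Build '<func>_lag<lag>' and append '_<param><value>' for every
    extra positional argument that differs from the parameter's default."""
    name = f"{func.__name__}_lag{lag}"
    # Parameters after the input array, in declaration order
    params = list(inspect.signature(func).parameters.values())[1:]
    for param, value in zip(params, args):
        if value != param.default:
            name += f"_{param.name}{value}"
    return name

def ratio_over_previous(x, offset=1):
    """Signature-only stand-in for the numba function defined earlier."""
    return x

print(feature_name(ratio_over_previous, 1))     # ratio_over_previous_lag1
print(feature_name(ratio_over_previous, 1, 2))  # ratio_over_previous_lag1_offset2
```

Passing the default explicitly, as in (ratio_over_previous, 1), would leave the name unchanged under this rule.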

np.testing.assert_allclose(prep['lag1'] / prep['lag2'], prep['ratio_over_previous_lag1'])
np.testing.assert_allclose(prep['lag1'] / prep['lag3'], prep['ratio_over_previous_lag1_offset2'])
np.testing.assert_allclose(prep['lag2'] - prep['lag3'], prep['diff_over_previous_lag2'])