Lag transformations

mlforecast lets you define transformations on lags and use the results as features. You provide them through the lag_transforms argument, a dict whose keys are the lags and whose values are lists of transformations to apply to that lag.
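
To make the idea concrete, here is a rough sketch in plain pandas of what a single entry such as `{1: [RollingMean(window_size=2)]}` describes: take lag 1 of each series, then apply a rolling mean to it. This is only an approximation for illustration and is not mlforecast's implementation; the toy frame and the column name are assumptions made up for this example.

```python
import pandas as pd

# Toy data with two series (hypothetical, for illustration only)
df = pd.DataFrame({
    "unique_id": ["a"] * 5 + ["b"] * 5,
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

# For each series: shift by the lag (1), then take a rolling mean of size 2
df["rolling_mean_lag1_window_size2"] = (
    df.groupby("unique_id")["y"]
    .transform(lambda s: s.shift(1).rolling(2).mean())
)
print(df)
```

The first two rows of each series are NaN, because the lag consumes one sample and the window needs two more.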

Data setup

import numpy as np

from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series

data = generate_daily_series(10)

Built-in transformations

The built-in lag transformations are in the mlforecast.lag_transforms module.

from mlforecast.lag_transforms import RollingMean, ExpandingStd

fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [ExpandingStd()],
        7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)]
    },
)

Once you define your transformations you can see what they look like with MLForecast.preprocess.

fcst.preprocess(data).head(2)

	unique_id	ds	y	expanding_std_lag1	rolling_mean_lag7_window_size7_min_samples1	rolling_mean_lag7_window_size14
20	id_0	2000-01-21	6.319961	1.956363	3.234486	3.283064
21	id_0	2000-01-22	0.071677	2.028545	3.256055	3.291068

Extending the built-in transformations

You can compose the built-in transformations by using the Combine class, which takes two transformations and an operator.

import operator

from mlforecast.lag_transforms import Combine

fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            Combine(
                RollingMean(window_size=7),
                RollingMean(window_size=14),
                operator.truediv,
            )
        ],
    },
)
prep = fcst.preprocess(data)
prep.head(2)

	unique_id	ds	y	rolling_mean_lag1_window_size7	rolling_mean_lag1_window_size14	rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14
14	id_0	2000-01-15	0.435006	3.234486	3.283064	0.985204
15	id_0	2000-01-16	1.489309	3.256055	3.291068	0.989361

np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag1_window_size14'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14']
)

If you want one of the transformations in Combine to be applied to a different lag you can use the Offset class, which will apply the offset first and then the transformation.

from mlforecast.lag_transforms import Offset

fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            Combine(
                RollingMean(window_size=7),
                Offset(RollingMean(window_size=7), n=1),
                operator.truediv,
            )
        ],
        2: [RollingMean(window_size=7)]
    },
)
prep = fcst.preprocess(data)
prep.head(2)

	unique_id	ds	y	rolling_mean_lag1_window_size7	rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7	rolling_mean_lag2_window_size7
8	id_0	2000-01-09	1.462798	3.326081	0.998331	3.331641
9	id_0	2000-01-10	2.035518	3.360938	1.010480	3.326081

np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag2_window_size7'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7']
)
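
The "offset first, then transformation" order can also be checked by hand. The sketch below emulates it with plain pandas on a single toy series; `rolling_mean_at_lag` is a hypothetical helper written for this example and is not part of mlforecast.

```python
import numpy as np
import pandas as pd

y = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

def rolling_mean_at_lag(series, lag, window_size):
    """Hypothetical helper: shift by `lag`, then take the rolling mean."""
    return series.shift(lag).rolling(window_size).mean()

lag1 = rolling_mean_at_lag(y, lag=1, window_size=3)
# Emulate an offset of 1 on top of lag 1: one extra shift before the transformation
lag1_offset1 = rolling_mean_at_lag(y.shift(1), lag=1, window_size=3)
lag2 = rolling_mean_at_lag(y, lag=2, window_size=3)

# The offset version matches the lag 2 feature (NaNs compare equal by default)
np.testing.assert_allclose(lag1_offset1, lag2)

# The combined feature is the element-wise ratio of the two
ratio = lag1 / lag1_offset1
```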

numba-based transformations

The window-ops package provides transformations defined as numba JIT-compiled functions. We use numba because it makes them really fast and can also bypass Python’s GIL, which allows running them concurrently with multithreading. The main benefit of these transformations is that they’re very easy to implement. However, they can be very slow when their values need to be updated in the predict step, because the function has to be called again on the complete history and only the last value is kept. If performance is a concern, try to use the built-in transformations, or set keep_last_n in MLForecast.preprocess or MLForecast.fit to the minimum number of samples that your transformations require.
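
The reasoning behind keep_last_n can be illustrated with a small NumPy sketch. It uses a naive rolling mean written for this example (not the window-ops implementation) and shows that, for a fixed-size window, recomputing on only the most recent samples yields the same last value as recomputing on the full history. The exact minimum for your own features depends on the lags and window sizes you use.

```python
import numpy as np

def rolling_mean(x, window_size):
    """Naive rolling mean; the first window_size - 1 entries are NaN."""
    out = np.full(x.shape, np.nan)
    for i in range(window_size - 1, x.size):
        out[i] = x[i - window_size + 1 : i + 1].mean()
    return out

rng = np.random.default_rng(0)
history = rng.normal(size=1_000)
window_size = 7

# Last value computed from the complete history
full = rolling_mean(history, window_size)[-1]
# Last value computed from only the samples the window needs
trimmed = rolling_mean(history[-window_size:], window_size)[-1]

np.testing.assert_allclose(full, trimmed)
```

Note that this shortcut only holds for transformations with a bounded window. An expanding statistic depends on every past value, so trimming the history would change its result.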

from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.shift import shift_array

@njit
def ratio_over_previous(x, offset=1):
    """Computes the ratio between the current value and its `offset` lag"""
    return x / shift_array(x, offset=offset)

@njit
def diff_over_previous(x, offset=1):
    """Computes the difference between the current value and its `offset` lag"""
    return x - shift_array(x, offset=offset)

If your function takes more arguments than the input array you can provide a tuple like: (func, arg1, arg2, ...)

fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1, 2, 3],
    lag_transforms={
        1: [expanding_mean, ratio_over_previous, (ratio_over_previous, 2)],  # the second ratio sets offset=2
        2: [diff_over_previous],
    },
)
prep = fcst.preprocess(data)
prep.head(2)

	unique_id	ds	y	lag1	lag2	lag3	expanding_mean_lag1	ratio_over_previous_lag1	ratio_over_previous_lag1_offset2	diff_over_previous_lag2
3	id_0	2000-01-04	3.481831	2.445887	1.218794	0.322947	1.329209	2.006809	7.573645	0.895847
4	id_0	2000-01-05	4.191721	3.481831	2.445887	1.218794	1.867365	1.423546	2.856785	1.227093

As you can see, each feature name is the function name followed by the _lag suffix and the lag number. If the function has other arguments and they’re not set to their default values, they’re included as well, as is done with offset=2 here.

np.testing.assert_allclose(prep['lag1'] / prep['lag2'], prep['ratio_over_previous_lag1'])
np.testing.assert_allclose(prep['lag1'] / prep['lag3'], prep['ratio_over_previous_lag1_offset2'])
np.testing.assert_allclose(prep['lag2'] - prep['lag3'], prep['diff_over_previous_lag2'])
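
The naming rule described above can be mimicked in a few lines. `feature_name` below is a hypothetical helper written for this example; it is not mlforecast's API and only reproduces the names seen in the table, assuming the first parameter of the function is the input array. A plain Python function stands in for the numba-compiled one.

```python
import inspect

def feature_name(func, lag, *args):
    """Hypothetical helper that mimics the naming rule described above."""
    name = f"{func.__name__}_lag{lag}"
    # Skip the first parameter, assumed to be the input array
    params = list(inspect.signature(func).parameters.values())[1:]
    for param, value in zip(params, args):
        # Only arguments that differ from their default are appended
        if value != param.default:
            name += f"_{param.name}{value}"
    return name

def ratio_over_previous(x, offset=1):
    """Stand-in with the same signature as the function defined earlier."""
    return x

default_name = feature_name(ratio_over_previous, 1)
offset_name = feature_name(ratio_over_previous, 1, 2)
print(default_name, offset_name)
```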
