# Lag transformations

> Compute features based on lags

mlforecast lets you define transformations on lags and use the results
as features. You provide them through the `lag_transforms` argument,
a dict that maps each lag to the list of transformations to apply to
that lag.
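
To make the idea concrete, here is a minimal pure-Python sketch of what "a transformation applied to a lag" computes. The helpers `lag` and `rolling_mean` are hypothetical stand-ins written for illustration, not mlforecast functions, and the library's own implementations may handle edge cases differently.

```python
import math

def lag(values, n):
    """Shift a series n steps forward in time, padding the start with NaN."""
    n = min(n, len(values))
    return [math.nan] * n + list(values[:len(values) - n])

def rolling_mean(values, window_size):
    """Mean over the trailing window; NaN until the window is full."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1): i + 1]
        if len(window) < window_size or any(math.isnan(v) for v in window):
            out.append(math.nan)
        else:
            out.append(sum(window) / window_size)
    return out

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Conceptually, a rolling mean (window 3) registered under lag 1 means:
# first take lag 1 of the target, then apply the rolling mean to it.
feature = rolling_mean(lag(y, 1), window_size=3)
```

In this sketch `feature` is NaN for the first three rows and `[2.0, 3.0, 4.0]` afterwards, so each value only uses observations strictly before its own row.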

## Data setup

```python
import numpy as np

from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
```

```python
data = generate_daily_series(10)
```

## Built-in transformations

The built-in lag transformations are in the `mlforecast.lag_transforms`
module.

```python
from mlforecast.lag_transforms import RollingMean, ExpandingStd
```

```python
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [ExpandingStd()],
        7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)]
    },
)
```

Once you define your transformations you can see what they look like
with `MLForecast.preprocess`.

```python
fcst.preprocess(data).head(2)
```

|    | unique\_id | ds         | y        | expanding\_std\_lag1 | rolling\_mean\_lag7\_window\_size7\_min\_samples1 | rolling\_mean\_lag7\_window\_size14 |
| -- | ---------- | ---------- | -------- | -------------------- | ------------------------------------------------- | ----------------------------------- |
| 20 | id\_0      | 2000-01-21 | 6.319961 | 1.956363             | 3.234486                                          | 3.283064                            |
| 21 | id\_0      | 2000-01-22 | 0.071677 | 2.028545             | 3.256055                                          | 3.291068                            |

### Extending the built-in transformations

You can compose the built-in transformations by using the `Combine`
class, which takes two transformations and an operator.

```python
import operator

from mlforecast.lag_transforms import Combine
```

```python
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            Combine(
                RollingMean(window_size=7),
                RollingMean(window_size=14),
                operator.truediv,
            )
        ],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
```

|    | unique\_id | ds         | y        | rolling\_mean\_lag1\_window\_size7 | rolling\_mean\_lag1\_window\_size14 | rolling\_mean\_lag1\_window\_size7\_truediv\_rolling\_mean\_lag1\_window\_size14 |
| -- | ---------- | ---------- | -------- | ---------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------- |
| 14 | id\_0      | 2000-01-15 | 0.435006 | 3.234486                           | 3.283064                            | 0.985204                                                                         |
| 15 | id\_0      | 2000-01-16 | 1.489309 | 3.256055                           | 3.291068                            | 0.989361                                                                         |

```python
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag1_window_size14'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14']
)
```
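
Conceptually, `Combine` evaluates both transformations on the same lagged series and merges the results element-wise with the given binary operator. The sketch below mimics that idea in plain Python. `combine` and `trailing_mean` are hypothetical helpers for illustration and are not part of mlforecast.

```python
import operator

def trailing_mean(window_size):
    """Return a function computing the mean of the last `window_size` values at each step."""
    def apply(values):
        out = []
        for i in range(len(values)):
            if i + 1 < window_size:
                out.append(None)  # not enough history yet
            else:
                out.append(sum(values[i + 1 - window_size: i + 1]) / window_size)
        return out
    return apply

def combine(tfm1, tfm2, op):
    """Apply both transformations to the same input and merge them element-wise with `op`."""
    def apply(values):
        left, right = tfm1(values), tfm2(values)
        return [
            None if a is None or b is None else op(a, b)
            for a, b in zip(left, right)
        ]
    return apply

lagged = [2.0, 4.0, 6.0, 8.0]  # stands in for an already lagged series
ratio = combine(trailing_mean(2), trailing_mean(4), operator.truediv)(lagged)
```

Only the last position has both windows filled here, so `ratio` holds a single defined value (7.0 / 5.0). Any other two-argument callable, such as `operator.sub`, could be passed to this sketch in the same way.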

If you want one of the transformations in `Combine` to be applied to a
different lag, you can use the `Offset` class, which applies the offset
first and then the transformation. In the example below, `n=1` on lag 1
produces the rolling mean of lag 2.

```python
from mlforecast.lag_transforms import Offset
```

```python
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            Combine(
                RollingMean(window_size=7),
                Offset(RollingMean(window_size=7), n=1),
                operator.truediv,
            )
        ],
        2: [RollingMean(window_size=7)]
    },
)
prep = fcst.preprocess(data)
prep.head(2)
```

|   | unique\_id | ds         | y        | rolling\_mean\_lag1\_window\_size7 | rolling\_mean\_lag1\_window\_size7\_truediv\_rolling\_mean\_lag2\_window\_size7 | rolling\_mean\_lag2\_window\_size7 |
| - | ---------- | ---------- | -------- | ---------------------------------- | ------------------------------------------------------------------------------- | ---------------------------------- |
| 8 | id\_0      | 2000-01-09 | 1.462798 | 3.326081                           | 0.998331                                                                        | 3.331641                           |
| 9 | id\_0      | 2000-01-10 | 2.035518 | 3.360938                           | 1.010480                                                                        | 3.326081                           |

```python
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag2_window_size7'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7']
)
```
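
The "offset first, then transformation" order can be illustrated without the library. In the following sketch, `shift`, `trailing_mean` and `offset` are hypothetical helpers (not mlforecast APIs) that show why offsetting a lag-1 series by one more step matches working on lag 2 directly.

```python
def shift(values, n):
    """Shift values n steps forward in time, padding the start with None."""
    n = min(n, len(values))
    return [None] * n + list(values[:len(values) - n])

def trailing_mean(values, window_size):
    """Mean of the trailing window; None while the window is incomplete."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i + 1 - window_size): i + 1]
        if len(window) < window_size or None in window:
            out.append(None)
        else:
            out.append(sum(window) / window_size)
    return out

def offset(transform, n):
    """Shift the input by n extra steps first, then apply the transform."""
    def apply(values, **kwargs):
        return transform(shift(values, n), **kwargs)
    return apply

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lag1 = shift(y, 1)
# Offsetting the lag-1 series by one more step is the same as using lag 2.
via_offset = offset(trailing_mean, n=1)(lag1, window_size=2)
direct_lag2 = trailing_mean(shift(y, 2), window_size=2)
```

Both `via_offset` and `direct_lag2` contain the same values, which mirrors the assertion above where the combined column matches the lag-1 feature divided by the lag-2 feature.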

## numba-based transformations

The [window-ops package](https://github.com/jmoralez/window_ops)
provides transformations defined as [numba](https://numba.pydata.org/)
[JIT compiled](https://en.wikipedia.org/wiki/Just-in-time_compilation)
functions. We use numba because it makes them very fast and can also
bypass [Python’s
GIL](https://wiki.python.org/moin/GlobalInterpreterLock), which allows
running them concurrently with multithreading.

The main benefit of using these transformations is that they’re very
easy to implement. However, updating their values in the predict step
can be very slow, because the function has to be called again on the
complete history and only the last value is kept. If performance is a
concern, try to use the built-in ones or set `keep_last_n` in
`MLForecast.preprocess` or `MLForecast.fit` to the minimum number of
samples that your transformations require.
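
The reasoning behind `keep_last_n` can be sketched in plain Python: for a window-based statistic, the last value depends only on the most recent samples, so recomputing over a truncated history gives the same result with less work. `trailing_mean_last` is a hypothetical helper for illustration, not a library function.

```python
def trailing_mean_last(history, window_size):
    """Recompute a rolling mean over the whole history and keep only the last value."""
    means = [
        sum(history[i + 1 - window_size: i + 1]) / window_size
        for i in range(window_size - 1, len(history))
    ]
    return means[-1]

history = [float(v) for v in range(1, 1001)]  # 1,000 observations
window_size = 7

# Recomputing on the complete history does a lot of work for a single value.
full = trailing_mean_last(history, window_size)
# Keeping only the last `window_size` samples yields the same value.
truncated = trailing_mean_last(history[-window_size:], window_size)
```

Note that this equivalence holds for fixed-size windows. An expanding statistic depends on the entire history, so truncating the history would generally change its value.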

```python
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.shift import shift_array
```

```python
@njit
def ratio_over_previous(x, offset=1):
    """Computes the ratio between the current value and its `offset` lag"""
    return x / shift_array(x, offset=offset)

@njit
def diff_over_previous(x, offset=1):
    """Computes the difference between the current value and its `offset` lag"""
    return x - shift_array(x, offset=offset)
```

If your function takes arguments besides the input array, you can
provide a tuple of the form `(func, arg1, arg2, ...)`.
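
As a rough mental model, a tuple entry can be thought of as "call `func` with the array followed by the extra positional arguments". The sketch below shows that dispatch in plain Python. `apply_transform` is a hypothetical helper, and the `ratio_over_previous` here is a pure-Python stand-in for the numba version defined above, so this is not how the library itself is implemented.

```python
import math

def ratio_over_previous(x, offset=1):
    """Pure-Python stand-in: ratio between each value and the one `offset` steps earlier."""
    return [
        math.nan if i < offset else x[i] / x[i - offset]
        for i in range(len(x))
    ]

def apply_transform(tfm, x):
    """Call a bare function as func(x), or a tuple as func(x, arg1, arg2, ...)."""
    if isinstance(tfm, tuple):
        func, *args = tfm
    else:
        func, args = tfm, []
    return func(x, *args)

x = [1.0, 2.0, 4.0, 8.0]
default_offset = apply_transform(ratio_over_previous, x)      # uses offset=1
custom_offset = apply_transform((ratio_over_previous, 2), x)  # uses offset=2
```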

```python
fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1, 2, 3],
    lag_transforms={
        1: [expanding_mean, ratio_over_previous, (ratio_over_previous, 2)],  # the second ratio sets offset=2
        2: [diff_over_previous],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
```

|   | unique\_id | ds         | y        | lag1     | lag2     | lag3     | expanding\_mean\_lag1 | ratio\_over\_previous\_lag1 | ratio\_over\_previous\_lag1\_offset2 | diff\_over\_previous\_lag2 |
| - | ---------- | ---------- | -------- | -------- | -------- | -------- | --------------------- | --------------------------- | ------------------------------------ | -------------------------- |
| 3 | id\_0      | 2000-01-04 | 3.481831 | 2.445887 | 1.218794 | 0.322947 | 1.329209              | 2.006809                    | 7.573645                             | 0.895847                   |
| 4 | id\_0      | 2000-01-05 | 4.191721 | 3.481831 | 2.445887 | 1.218794 | 1.867365              | 1.423546                    | 2.856785                             | 1.227093                   |

As you can see, each feature is named after the function, followed by
the `_lag` suffix and the lag number. If the function has other
arguments and they’re not set to their default values, they’re included
as well, as is done with `offset=2` here.

```python
np.testing.assert_allclose(prep['lag1'] / prep['lag2'], prep['ratio_over_previous_lag1'])
np.testing.assert_allclose(prep['lag1'] / prep['lag3'], prep['ratio_over_previous_lag1_offset2'])
np.testing.assert_allclose(prep['lag2'] - prep['lag3'], prep['diff_over_previous_lag2'])
```
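
The naming rule described above can be approximated with the standard library's `inspect` module. The following is only a sketch of the convention as it appears in the column names of the output table. `feature_name` is a hypothetical helper, the real implementation may differ, and numba-compiled functions may need different introspection than the plain function used here.

```python
import inspect

def ratio_over_previous(x, offset=1):
    """Placeholder with the same signature as the function defined above."""
    return x

def feature_name(tfm, lag):
    """Build '<func>_lag<lag>' and append '_<param><value>' for each non-default argument."""
    if isinstance(tfm, tuple):
        func, *args = tfm
    else:
        func, args = tfm, []
    name = f"{func.__name__}_lag{lag}"
    # Skip the first parameter, which is the input array.
    params = list(inspect.signature(func).parameters.values())[1:]
    for param, value in zip(params, args):
        if value != param.default:
            name += f"_{param.name}{value}"
    return name

plain = feature_name(ratio_over_previous, 1)
with_offset = feature_name((ratio_over_previous, 2), 1)
```

Under these assumptions, `plain` and `with_offset` reproduce the two ratio column names seen in the table, and passing the default explicitly, as in `(ratio_over_previous, 1)`, adds no extra suffix.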
