Lag transformations
Compute features based on lags
mlforecast allows you to define transformations on the lags to use as
features. These are provided through the lag_transforms argument,
which is a dict where the keys are the lags and each value is a list of
transformations to apply to that lag.
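To make the "lag first, then transform" idea concrete, here is a minimal plain-Python sketch. It is not mlforecast's implementation: the helper `lag`, the toy `expanding_mean` and the list `y` are assumptions made for illustration. Only the dict shape (lags as keys, lists of transformations as values) mirrors the lag_transforms argument.

```python
def lag(values, n):
    """Shift a series forward by `n` steps, padding the start with None."""
    return [None] * n + values[: len(values) - n]


def expanding_mean(values):
    """Mean of all non-missing values seen so far (None while nothing is available)."""
    out, total, count = [], 0.0, 0
    for v in values:
        if v is None:
            out.append(None)
            continue
        total += v
        count += 1
        out.append(total / count)
    return out


# Same shape as `lag_transforms`: keys are lags, values are lists of transformations.
lag_transforms = {1: [expanding_mean], 2: [expanding_mean]}

y = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy target values
features = {
    f"{tfm.__name__}_lag{n}": tfm(lag(y, n))
    for n, tfms in lag_transforms.items()
    for tfm in tfms
}
print(features["expanding_mean_lag1"])  # [None, 1.0, 1.5, 2.0, 2.5]
print(features["expanding_mean_lag2"])  # [None, None, 1.0, 1.5, 2.0]
```

Each feature only ever sees values that are at least `n` steps old, which is what keeps the target of the current row out of its own features.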
Data setup
Built-in transformations
The built-in lag transformations are in the mlforecast.lag_transforms
module.
Once you define your transformations you can see what they look like
with MLForecast.preprocess.
| | unique_id | ds | y | expanding_std_lag1 | rolling_mean_lag7_window_size7_min_samples1 | rolling_mean_lag7_window_size14 |
|---|---|---|---|---|---|---|
| 20 | id_0 | 2000-01-21 | 6.319961 | 1.956363 | 3.234486 | 3.283064 |
| 21 | id_0 | 2000-01-22 | 0.071677 | 2.028545 | 3.256055 | 3.291068 |
Extending the built-in transformations
You can compose the built-in transformations by using the Combine
class, which takes two transformations and an operator.
| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size14 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14 |
|---|---|---|---|---|---|---|
| 14 | id_0 | 2000-01-15 | 0.435006 | 3.234486 | 3.283064 | 0.985204 |
| 15 | id_0 | 2000-01-16 | 1.489309 | 3.256055 | 3.291068 | 0.989361 |
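Conceptually, the combined column is the operator applied element-wise to the outputs of the two transformations on the same lag. The plain-Python sketch below illustrates that idea only; `rolling_mean` and `combine` are hypothetical stand-ins, not the library's Combine class, and the input list is made up.

```python
import operator
from functools import partial


def rolling_mean(values, window_size):
    """Trailing mean; None until `window_size` samples are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window_size:
            out.append(None)
        else:
            window = values[i + 1 - window_size : i + 1]
            out.append(sum(window) / window_size)
    return out


def combine(tfm1, tfm2, op):
    """Toy stand-in: apply both transformations to the same input, then merge with `op`."""

    def combined(values):
        first, second = tfm1(values), tfm2(values)
        return [
            None if a is None or b is None else op(a, b)
            for a, b in zip(first, second)
        ]

    return combined


lag1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # pretend these are already the lag-1 values
short = partial(rolling_mean, window_size=2)
long = partial(rolling_mean, window_size=4)
ratio = combine(short, long, operator.truediv)(lag1)
print(ratio[3])  # 3.5 / 2.5 = 1.4
```

The same relation can be checked by hand in the table above: 3.234486 / 3.283064 is approximately 0.9852, which matches the truediv column for row 14.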
If you want one of the transformations in Combine to be applied to a
different lag you can use the Offset class, which will apply the offset
first and then the transformation.
| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7 | rolling_mean_lag2_window_size7 |
|---|---|---|---|---|---|---|
| 8 | id_0 | 2000-01-09 | 1.462798 | 3.326081 | 0.998331 | 3.331641 |
| 9 | id_0 | 2000-01-10 | 2.035518 | 3.360938 | 1.010480 | 3.326081 |
numba-based transformations
The window-ops package provides transformations defined as numba JIT-compiled functions. We use numba because it makes them really fast and can also bypass Python’s GIL, which allows running them concurrently with multithreading.
The main benefit of using these transformations is that they’re very
easy to implement. However, when we need to update their values on the
predict step they can be very slow, because we have to call the function
again on the complete history and keep only the last value. If
performance is a concern, you should try to use the built-in ones or set
keep_last_n in MLForecast.preprocess or MLForecast.fit to the minimum
number of samples that your transformations require.
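The reasoning behind keep_last_n can be illustrated without the library. In the sketch below (plain Python, with a hypothetical `rolling_mean_last` helper and made-up history), the function is recomputed over the whole input and only the last value is kept, as described above. Trimming the history to the samples the window actually needs gives the same result with far less work.

```python
def rolling_mean_last(history, window_size):
    """Recompute a rolling mean over the whole input and keep only the last value."""
    means = [
        sum(history[i - window_size : i]) / window_size
        for i in range(window_size, len(history) + 1)
    ]
    return means[-1]


history = [float(v) for v in range(1, 1001)]  # 1000 past observations
window_size = 7

full = rolling_mean_last(history, window_size)  # evaluates 994 windows
trimmed = rolling_mean_last(history[-window_size:], window_size)  # evaluates 1 window
print(full, trimmed, full == trimmed)  # 997.0 997.0 True
```

How many samples are "enough" depends on the lags and window sizes in use, so the value passed to keep_last_n has to cover the most demanding transformation.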
If your function takes more arguments than the input array you can
provide a tuple like: (func, arg1, arg2, ...)
| | unique_id | ds | y | lag1 | lag2 | lag3 | expanding_mean_lag1 | ratio_over_previous_lag1 | ratio_over_previous_lag1_offset2 | diff_over_previous_lag2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | id_0 | 2000-01-04 | 3.481831 | 2.445887 | 1.218794 | 0.322947 | 1.329209 | 2.006809 | 7.573645 | 0.895847 |
| 4 | id_0 | 2000-01-05 | 4.191721 | 3.481831 | 2.445887 | 1.218794 | 1.867365 | 1.423546 | 2.856785 | 1.227093 |
As you can see, the name of the function is used as the transformation
name plus the _lag suffix followed by the lag number. If the function
has other arguments and they’re not set to their default values they’re
included as well, as is done with offset=2 here.
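The values and names in the table can be reproduced by hand. The sketch below is plain Python rather than numba, so it runs with the standard library only; in mlforecast the functions would be numba-jitted and operate on arrays. The function bodies are inferred from the column names and the numbers in the table, `shift` and `feature_name` are hypothetical helpers, and the first five `y` values are read back from the y and lag columns.

```python
import inspect


def shift(values, offset):
    """Shift a list forward by `offset` positions, padding with None."""
    return [None] * offset + values[: len(values) - offset]


def ratio_over_previous(x, offset=1):
    """Ratio between each value and the one `offset` steps before it."""
    prev = shift(x, offset)
    return [None if a is None or b is None else a / b for a, b in zip(x, prev)]


def diff_over_previous(x, offset=1):
    """Difference between each value and the one `offset` steps before it."""
    prev = shift(x, offset)
    return [None if a is None or b is None else a - b for a, b in zip(x, prev)]


def feature_name(func, lag, *args):
    """Function name + _lag<n>, plus any extra argument that differs from its default."""
    name = f"{func.__name__}_lag{lag}"
    params = list(inspect.signature(func).parameters.values())[1:]  # skip the input array
    for param, value in zip(params, args):
        if value != param.default:
            name += f"_{param.name}{value}"
    return name


y = [0.322947, 1.218794, 2.445887, 3.481831, 4.191721]  # taken from the table

lag_transforms = {
    1: [ratio_over_previous, (ratio_over_previous, 2)],  # tuple form: (func, arg1, ...)
    2: [diff_over_previous],
}

features = {}
for lag, specs in lag_transforms.items():
    lagged = shift(y, lag)
    for spec in specs:
        func, *args = spec if isinstance(spec, tuple) else (spec,)
        features[feature_name(func, lag, *args)] = func(lagged, *args)

for name, values in features.items():
    print(name, values[3])  # compare with the row with index 3 in the table
```

Small differences in the last decimal places are expected, because the inputs here are the rounded values printed in the table.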