lag_transforms argument, which is a dict where the keys are the lags and the values are a list of transformations to apply to that lag.
Data setup
Built-in transformations
The built-in lag transformations are in the mlforecast.lag_transforms module.
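A minimal sketch of how such a dict might be passed to MLForecast. The class names ExpandingStd and RollingMean and the window_size and min_samples parameters are taken from the column names in the output table on this page; the wrapper function, the empty models list, and the daily frequency are illustrative assumptions. The input is assumed to be a DataFrame with unique_id, ds and y columns.

```python
def preprocess_with_builtin_lags(data):
    """Return `data` with lag-transform features added (illustrative helper)."""
    # Imported here so that mlforecast is only required when the helper is called.
    from mlforecast import MLForecast
    from mlforecast.lag_transforms import ExpandingStd, RollingMean

    fcst = MLForecast(
        models=[],  # no models are needed just to inspect the features
        freq="D",   # assumption: daily data
        lag_transforms={
            # keys are lags, values are lists of transformations for that lag
            1: [ExpandingStd()],
            7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)],
        },
    )
    return fcst.preprocess(data)
```

Calling `preprocess_with_builtin_lags(data).head(2)` should then produce columns named like the ones in the table on this page.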
You can see what the resulting features look like with MLForecast.preprocess.
| | unique_id | ds | y | expanding_std_lag1 | rolling_mean_lag7_window_size7_min_samples1 | rolling_mean_lag7_window_size14 |
|---|---|---|---|---|---|---|
| 20 | id_0 | 2000-01-21 | 6.319961 | 1.956363 | 3.234486 | 3.283064 |
| 21 | id_0 | 2000-01-22 | 0.071677 | 2.028545 | 3.256055 | 3.291068 |
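To make the meaning of a column such as rolling_mean_lag7_window_size14 concrete, here is a pure-Python sketch of the idea: shift the series by the lag, then take the rolling mean over the shifted values. The helper names are hypothetical and this is not the library's implementation. It assumes a full window is required, and that rows with missing features are dropped, which would be consistent with the table above starting at row 20 (lag 7 plus window 14 minus 1).

```python
import math

def lag(series, n):
    """Shift the series forward by n steps, padding the start with NaN."""
    return [math.nan] * n + series[: len(series) - n]

def rolling_mean(series, window_size):
    """Mean over the trailing window; NaN until the window holds only valid values."""
    out = []
    for i in range(len(series)):
        window = series[max(0, i - window_size + 1): i + 1]
        complete = len(window) == window_size and not any(math.isnan(v) for v in window)
        out.append(sum(window) / window_size if complete else math.nan)
    return out

# Toy series 0.0, 1.0, ..., 29.0 (not the data used on this page).
y = [float(v) for v in range(30)]
feature = rolling_mean(lag(y, 7), window_size=14)

# Index of the first row where the feature is defined.
first_valid = next(i for i, v in enumerate(feature) if not math.isnan(v))
print(first_valid, feature[first_valid])
```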
Extending the built-in transformations
You can compose the built-in transformations by using the Combine class, which takes two transformations and an operator.
| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size14 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14 |
|---|---|---|---|---|---|---|
| 14 | id_0 | 2000-01-15 | 0.435006 | 3.234486 | 3.283064 | 0.985204 |
| 15 | id_0 | 2000-01-16 | 1.489309 | 3.256055 | 3.291068 | 0.989361 |
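The combined column in the table above can be checked by hand. The sketch below applies an operator element-wise to the two rolling-mean columns; the combine helper is a hypothetical stand-in for illustration, and operator.truediv is assumed from the "truediv" part of the column name.

```python
import operator

def combine(left, right, op):
    """Apply a binary operator element-wise to two feature columns."""
    return [op(a, b) for a, b in zip(left, right)]

# Values copied from rows 14 and 15 of the table above.
rolling_mean_7 = [3.234486, 3.256055]
rolling_mean_14 = [3.283064, 3.291068]

ratio = combine(rolling_mean_7, rolling_mean_14, operator.truediv)
print([round(v, 6) for v in ratio])
```

Up to rounding of the displayed inputs, the results agree with the last column of the table.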
If you want one of the transformations in Combine to be applied to a different lag you can use the Offset class, which will apply the offset first and then the transformation.
| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7 | rolling_mean_lag2_window_size7 |
|---|---|---|---|---|---|---|
| 8 | id_0 | 2000-01-09 | 1.462798 | 3.326081 | 0.998331 | 3.331641 |
| 9 | id_0 | 2000-01-10 | 2.035518 | 3.360938 | 1.010480 | 3.326081 |
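The "offset first, then the transformation" behaviour can be sketched in pure Python. All helpers below are hypothetical stand-ins rather than the library's classes, and the series is toy data. The point is that an extra offset of 1 on top of lag 1 gives the same feature as applying the transformation directly at lag 2, which matches the table above, where the lag-2 value on row 9 equals the lag-1 value on row 8.

```python
import math

def shift(series, n):
    """Shift the series forward by n steps, padding the start with NaN."""
    return [math.nan] * n + series[: len(series) - n]

def rolling_mean(series, window_size):
    """Mean over the trailing window; NaN until the window holds only valid values."""
    out = []
    for i in range(len(series)):
        window = series[max(0, i - window_size + 1): i + 1]
        complete = len(window) == window_size and not any(math.isnan(v) for v in window)
        out.append(sum(window) / window_size if complete else math.nan)
    return out

def offset(transform, n):
    """Wrap `transform` so that its input is shifted by n steps first."""
    def shifted_transform(series):
        return transform(shift(series, n))
    return shifted_transform

def mean3(series):
    return rolling_mean(series, window_size=3)

def same(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )

y = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
via_offset = offset(mean3, n=1)(shift(y, 1))  # offset of 1 on top of lag 1
direct = mean3(shift(y, 2))                   # the same transformation at lag 2
print(same(via_offset, direct))
```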
numba-based transformations
The window-ops package provides transformations defined as numba JIT compiled functions. We use numba because it makes them really fast and can also bypass Python’s GIL, which allows running them concurrently with multithreading. The main benefit of using these transformations is that they’re very easy to implement. However, when we need to update their values on the predict step they can be very slow, because we have to call the function again on the complete history and just keep the last value. If performance is a concern, you should try to use the built-in ones or set keep_last_n in MLForecast.preprocess or MLForecast.fit to the minimum number of samples that your transformations require.
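As a rough illustration of why a short tail of the history can be enough, the sketch below computes the next value of a rolling mean at a given lag from the full history and from only the last lag + window_size - 1 samples. The helper is hypothetical and the data is a toy series; it only shows the arithmetic behind choosing a minimum, not how the library performs the update. Transformations that depend on the whole history, such as expanding ones, would change if the history were truncated.

```python
def next_rolling_mean(history, lag, window_size):
    """Rolling mean feature for the next (not yet observed) time step."""
    needed = lag + window_size - 1
    if len(history) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(history)}")
    # The next step has index len(history); its lagged value sits `lag` steps back,
    # so the window ends (exclusively) right after that position.
    end = len(history) - lag + 1
    window = history[end - window_size: end]
    return sum(window) / window_size

history = [float(v) for v in range(100)]  # toy series 0.0 .. 99.0
lag, window_size = 7, 14
needed = lag + window_size - 1            # 20 samples are enough here

full = next_rolling_mean(history, lag, window_size)
truncated = next_rolling_mean(history[-needed:], lag, window_size)
print(full, truncated)
```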
To pass extra arguments to a function, provide a tuple of the form (func, arg1, arg2, ...).
| | unique_id | ds | y | lag1 | lag2 | lag3 | expanding_mean_lag1 | ratio_over_previous_lag1 | ratio_over_previous_lag1_offset2 | diff_over_previous_lag2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | id_0 | 2000-01-04 | 3.481831 | 2.445887 | 1.218794 | 0.322947 | 1.329209 | 2.006809 | 7.573645 | 0.895847 |
| 4 | id_0 | 2000-01-05 | 4.191721 | 3.481831 | 2.445887 | 1.218794 | 1.867365 | 1.423546 | 2.856785 | 1.227093 |
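The values in the table above can be reproduced with plain Python stand-ins. The function names are taken from the column names, but their bodies are assumptions inferred from those names and from the numbers shown; the real ones would be numba-compiled functions built on window-ops. The first five y values are recovered from the y and lag columns of the table.

```python
import math

def shift(series, n):
    """Shift the series forward by n steps, padding the start with NaN."""
    return [math.nan] * n + series[: len(series) - n]

def ratio_over_previous(x, offset=1):
    """Ratio between each value and the value `offset` steps earlier."""
    return [a / b for a, b in zip(x, shift(x, offset))]

def diff_over_previous(x, offset=1):
    """Difference between each value and the value `offset` steps earlier."""
    return [a - b for a, b in zip(x, shift(x, offset))]

def expanding_mean(x):
    """Running mean over all valid values seen so far (NaN where the input is NaN)."""
    out, total, count = [], 0.0, 0
    for v in x:
        if math.isnan(v):
            out.append(math.nan)
            continue
        total += v
        count += 1
        out.append(total / count)
    return out

# First five y values of id_0, recovered from the table above.
y = [0.322947, 1.218794, 2.445887, 3.481831, 4.191721]
lag1, lag2 = shift(y, 1), shift(y, 2)

features = {
    "expanding_mean_lag1": expanding_mean(lag1),
    "ratio_over_previous_lag1": ratio_over_previous(lag1),
    "ratio_over_previous_lag1_offset2": ratio_over_previous(lag1, 2),
    "diff_over_previous_lag2": diff_over_previous(lag2),
}
for name, values in features.items():
    print(name, [round(v, 6) for v in values[3:]])  # rows 3 and 4
```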
The name of the function is used as the transformation name, followed by the _lag suffix. If the function has other arguments and they’re not set to their default values, they’re included as well, as is done with offset=2 here.
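The naming rule can be sketched with a small helper. The feature_name function below is hypothetical and only mirrors the rule as described here; it is not the library's code. Extra arguments are matched positionally to the function's parameters after the input array.

```python
import inspect

def feature_name(func, lag, *args):
    """Build a column name following the naming rule described above (hypothetical)."""
    name = f"{func.__name__}_lag{lag}"
    # Skip the first parameter, which is the input array.
    params = list(inspect.signature(func).parameters.values())[1:]
    for param, value in zip(params, args):
        if value != param.default:  # only non-default arguments appear in the name
            name += f"_{param.name}{value}"
    return name

def ratio_over_previous(x, offset=1):
    """Placeholder body; only the signature matters for the naming."""
    return x

print(feature_name(ratio_over_previous, 1))     # default offset, so no extra suffix
print(feature_name(ratio_over_previous, 1, 2))  # offset=2 is added to the name
```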