Time series cross validation with LightGBM.
Create LightGBM CV object.
| | Type | Default | Details |
|---|---|---|---|
| freq | Union | | Pandas offset alias, e.g. ‘D’, ‘W-THU’, or integer denoting the frequency of the series. |
| lags | Optional | None | Lags of the target to use as features. |
| lag_transforms | Optional | None | Mapping of target lags to their transformations. |
| date_features | Optional | None | Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input. |
| num_threads | int | 1 | Number of threads to use when computing the features. |
| target_transforms | Optional | None | Transformations that will be applied to the target before computing the features and restored after the forecasting step. |
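To make `lags` and `dropna` concrete, here is a minimal pure-Python sketch of how lag features could be built from long-format data. It is not mlforecast's implementation: `make_lag_features` is a hypothetical helper, each series is a plain list indexed by integer timestep, and rows without enough history are skipped, mirroring `dropna=True`.

```python
from typing import Dict, List, Optional, Tuple

# (unique_id, ds, y, lag values); ds is the 1-based position within the series
Row = Tuple[str, int, float, List[Optional[float]]]


def make_lag_features(
    series: Dict[str, List[float]],
    lags: List[int],
    dropna: bool = True,
) -> List[Row]:
    """Hypothetical helper: build one feature row per observation."""
    rows: List[Row] = []
    for uid, values in series.items():
        for t, y in enumerate(values):
            # lag k is the target observed k steps earlier in the SAME series,
            # so features never leak across series boundaries
            feats = [values[t - k] if t >= k else None for k in lags]
            if dropna and any(f is None for f in feats):
                continue  # not enough history yet
            rows.append((uid, t + 1, y, feats))
    return rows


rows = make_lag_features({"A": [1.0, 2.0, 3.0, 4.0]}, lags=[1, 2])
print(rows)  # [('A', 3, 3.0, [2.0, 1.0]), ('A', 4, 4.0, [3.0, 2.0])]
```

Computing lags inside each series is the key design point: a lag of the first rows of one series must never pick up values from the previous series in the long-format frame.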
| | unique_id | ds | y |
|---|---|---|---|
| 86796 | H196 | 1 | 11.8 |
| 86797 | H196 | 2 | 11.4 |
| 86798 | H196 | 3 | 11.1 |
| 86799 | H196 | 4 | 10.8 |
| 86800 | H196 | 5 | 10.6 |
| … | … | … | … |
| 325235 | H413 | 1004 | 99.0 |
| 325236 | H413 | 1005 | 88.0 |
| 325237 | H413 | 1006 | 47.0 |
| 325238 | H413 | 1007 | 41.0 |
| 325239 | H413 | 1008 | 34.0 |
The `eval_every` parameter controls how often the complete forecast window is evaluated: with `eval_every=10` (the default), every 10 boosting iterations we compute forecasts for the complete window and report the error.
We also have early stopping parameters:

- `early_stopping_evals`: how many evaluations of the full window should we go without improving before we stop training?
- `early_stopping_pct`: what's the minimum percentage improvement we want in these `early_stopping_evals` evaluations in order to keep training?

Train boosters simultaneously and assess their performance on the complete forecasting window.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Series data in long format. |
| n_windows | int | | Number of windows to evaluate. |
| h | int | | Forecast horizon. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| step_size | Optional | None | Step size between each cross validation window. If None, it will be equal to h. |
| num_iterations | int | 100 | Maximum number of boosting iterations to run. |
| params | Optional | None | Parameters to be passed to the LightGBM Boosters. |
| static_features | Optional | None | Names of the features that are static and will be repeated when forecasting. |
| dropna | bool | True | Drop rows with missing values produced by the transformations. |
| keep_last_n | Optional | None | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. |
| eval_every | int | 10 | Number of boosting iterations to train before evaluating on the whole forecast window. |
| weights | Optional | None | Weights to multiply the metric of each window. If None, all windows have the same weight. |
| metric | Union | mape | Metric used to assess the performance of the models and perform early stopping. |
| verbose_eval | bool | True | Print the metrics of each evaluation. |
| early_stopping_evals | int | 2 | Maximum number of evaluations to run without improvement. |
| early_stopping_pct | float | 0.01 | Minimum percentage improvement in metric value in early_stopping_evals evaluations. |
| compute_cv_preds | bool | False | Compute predictions for each window after finding the best iteration. |
| before_predict_callback | Optional | None | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. |
| after_predict_callback | Optional | None | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. |
| input_size | Optional | None | Maximum training samples per series in each window. If None, an expanding window is used. |
| Returns | List | | List of (boosting rounds, metric value) tuples. |
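The way `num_iterations`, `eval_every`, `early_stopping_evals` and `early_stopping_pct` interact can be sketched as a simple loop that records the (boosting rounds, metric value) history described under Returns. This is an illustration under assumptions, not the library's code: `train_and_score` is a hypothetical callable that trains the requested number of extra rounds and returns the metric on the full forecast window (lower is better), `early_stopping_pct` is read as a fraction (0.01 meaning 1%), and improvement is measured against the value recorded `early_stopping_evals` evaluations earlier.

```python
from typing import Callable, List, Tuple


def fit_with_early_stopping(
    train_and_score: Callable[[int], float],
    num_iterations: int = 100,
    eval_every: int = 10,
    early_stopping_evals: int = 2,
    early_stopping_pct: float = 0.01,
) -> List[Tuple[int, float]]:
    """Return a history of (boosting rounds, metric value) tuples."""
    hist: List[Tuple[int, float]] = []
    for start in range(0, num_iterations, eval_every):
        # train the next chunk of rounds, then score the complete window
        n = min(eval_every, num_iterations - start)
        metric = train_and_score(n)
        hist.append((start + n, metric))
        if len(hist) > early_stopping_evals:
            # assumed rule: compare with the value early_stopping_evals evals ago
            reference = hist[-(early_stopping_evals + 1)][1]
            improvement = 1 - metric / reference
            if improvement < early_stopping_pct:
                break  # not enough relative improvement: stop training
    return hist


scores = iter([10.0, 8.0, 7.95, 7.94, 7.0])  # toy metric values
print(fit_with_early_stopping(lambda n: next(scores)))
# [(10, 10.0), (20, 8.0), (30, 7.95), (40, 7.94)]
```

In the usage, going from 8.0 to 7.94 over two evaluations is only a 0.75% improvement, so training stops before the fifth evaluation is ever requested.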
By setting `compute_cv_preds=True`, we get the predictions from each model on its corresponding validation fold.
| | unique_id | ds | y | Booster | window |
|---|---|---|---|---|---|
| 0 | H196 | 865 | 15.5 | 15.522924 | 0 |
| 1 | H196 | 866 | 15.1 | 14.985832 | 0 |
| 2 | H196 | 867 | 14.8 | 14.667901 | 0 |
| 3 | H196 | 868 | 14.4 | 14.514592 | 0 |
| 4 | H196 | 869 | 14.2 | 14.035793 | 0 |
| … | … | … | … | … | … |
| 187 | H413 | 956 | 59.0 | 77.227905 | 1 |
| 188 | H413 | 957 | 58.0 | 80.589641 | 1 |
| 189 | H413 | 958 | 53.0 | 53.986834 | 1 |
| 190 | H413 | 959 | 38.0 | 36.749786 | 1 |
| 191 | H413 | 960 | 46.0 | 36.281225 | 1 |
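Given these per-window predictions, the metric can be checked by hand. The sketch below assumes the default `mape` is the mean of |y - prediction| / |y| within each window, and that the overall score multiplies each window's metric by its weight and sums the results, with weights of 1/n_windows when `weights` is None. Both function names are hypothetical, and the rows in the usage are toy values, not taken from the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def mape_by_window(rows: Iterable[Tuple[float, float, int]]) -> Dict[int, float]:
    """rows are (y, prediction, window) triples; returns MAPE per window."""
    errors: Dict[int, List[float]] = defaultdict(list)
    for y, pred, window in rows:
        errors[window].append(abs(y - pred) / abs(y))  # assumes y != 0
    return {w: sum(e) / len(e) for w, e in sorted(errors.items())}


def weighted_score(
    window_metrics: Dict[int, float], weights: Optional[List[float]] = None
) -> float:
    """Multiply each window's metric by its weight and sum the results."""
    values = [window_metrics[w] for w in sorted(window_metrics)]
    if weights is None:
        weights = [1 / len(values)] * len(values)  # equal weight per window
    return sum(v * wt for v, wt in zip(values, weights))


toy = [(10.0, 9.0, 0), (20.0, 22.0, 0), (50.0, 40.0, 1)]  # toy values
per_window = mape_by_window(toy)  # window 0 -> 0.1, window 1 -> 0.2
print(weighted_score(per_window))  # approximately 0.15
```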
Calling `predict` returns the predictions from every model trained.
Compute predictions with each of the trained boosters.
| | Type | Default | Details |
|---|---|---|---|
| h | int | | Forecast horizon. |
| before_predict_callback | Optional | None | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. |
| after_predict_callback | Optional | None | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. |
| X_df | Optional | None | Dataframe with the future exogenous features. Should have the id column and the time column. |
| Returns | DataFrame | | Predictions for each series and timestep, with one column per window. |
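The callbacks hook into recursive forecasting: to predict `h` steps ahead, each prediction is appended to the series so that later steps can use it as a lag, which is most likely what "updating the targets" refers to. The pure-Python sketch below shows where an `after_predict_callback`-style hook could sit in that loop for a single series. It is an illustration under assumptions rather than mlforecast's code: `model` and `toy_model` are hypothetical callables from lag values to a forecast, and the real callback receives a pandas Series covering all series at once rather than a single float.

```python
from typing import Callable, List, Optional


def recursive_forecast(
    history: List[float],
    model: Callable[[List[float]], float],
    lags: List[int],
    h: int,
    after_predict: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """Forecast h steps for one series; needs len(history) >= max(lags)."""
    values = list(history)  # copy so the caller's list is not modified
    preds: List[float] = []
    for _ in range(h):
        features = [values[-k] for k in lags]  # lag k = value k steps back
        pred = model(features)
        if after_predict is not None:
            pred = after_predict(pred)  # adjust before it becomes a target
        values.append(pred)  # later steps see this prediction as a lag
        preds.append(pred)
    return preds


def toy_model(features: List[float]) -> float:
    return features[0] - 3.0  # hypothetical model: previous value minus 3


print(recursive_forecast([5.0], toy_model, lags=[1], h=3))
# [2.0, -1.0, -4.0]
print(recursive_forecast([5.0], toy_model, lags=[1], h=3,
                         after_predict=lambda p: max(p, 0.0)))
# [2.0, 0.0, 0.0]
```

Because the clipped value is fed back as a lag, the hook changes every later step too, which is why it has to run before the targets are updated rather than on the final output only.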
| | unique_id | ds | Booster0 | Booster1 |
|---|---|---|---|---|
| 0 | H196 | 961 | 15.670252 | 15.848888 |
| 1 | H196 | 962 | 15.522924 | 15.697399 |
| 2 | H196 | 963 | 14.985832 | 15.166213 |
| 3 | H196 | 964 | 14.985832 | 14.723238 |
| 4 | H196 | 965 | 14.562152 | 14.451092 |
| … | … | … | … | … |
| 187 | H413 | 1004 | 70.695242 | 65.917620 |
| 188 | H413 | 1005 | 66.216580 | 62.615788 |
| 189 | H413 | 1006 | 63.896573 | 67.848598 |
| 190 | H413 | 1007 | 46.922797 | 50.981950 |
| 191 | H413 | 1008 | 45.006541 | 42.752819 |
For example, in a seasonal rolling window over hourly data with `season_length=24` and `window_size=7`, we'll average the value at the same hour for every day of the week.
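The seasonal window described above can be written out directly: at each timestep it averages `window_size` values spaced `season_length` apart, ending at that timestep. The sketch below is a plain-Python illustration with a hypothetical name, not the implementation behind `lag_transforms`. Since `lag_transforms` maps target lags to transformations, in practice such a function would be applied to an already lagged target, so a value never feeds into its own feature.

```python
from typing import List, Optional


def seasonal_rolling_mean_sketch(
    x: List[float], season_length: int, window_size: int
) -> List[Optional[float]]:
    """Mean of x[t], x[t - s], ..., x[t - (window_size - 1) * s] for each t."""
    out: List[Optional[float]] = []
    for t in range(len(x)):
        idx = [t - k * season_length for k in range(window_size)]
        if idx[-1] < 0:
            out.append(None)  # fewer than window_size seasons of history
        else:
            out.append(sum(x[i] for i in idx) / window_size)
    return out


# one week of hourly toy data where each value equals its hour of the day
hourly = [float(i % 24) for i in range(24 * 7)]
means = seasonal_rolling_mean_sketch(hourly, season_length=24, window_size=7)
print(means[143], means[144], means[167])  # None 0.0 23.0
```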
To train the boosters iteratively, use the `setup` method.
Initialize internal data structures to iteratively train the boosters. Use this before calling partial_fit.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Series data in long format. |
| n_windows | int | | Number of windows to evaluate. |
| h | int | | Forecast horizon. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| step_size | Optional | None | Step size between each cross validation window. If None, it will be equal to h. |
| params | Optional | None | Parameters to be passed to the LightGBM Boosters. |
| static_features | Optional | None | Names of the features that are static and will be repeated when forecasting. |
| dropna | bool | True | Drop rows with missing values produced by the transformations. |
| keep_last_n | Optional | None | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. |
| weights | Optional | None | Weights to multiply the metric of each window. If None, all windows have the same weight. |
| metric | Union | mape | Metric used to assess the performance of the models and perform early stopping. |
| input_size | Optional | None | Maximum training samples per series in each window. If None, an expanding window is used. |
| Returns | LightGBMCV | | CV object with internal data structures for partial_fit. |
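The `n_windows`, `h`, `step_size` and `input_size` parameters jointly determine which rows train and validate each window. The sketch below computes 0-based positions for a single series of length `n`, assuming the last window ends at the final observation and each earlier window is shifted back by `step_size`. `cv_windows` is a hypothetical helper, not part of the library.

```python
from typing import List, Optional, Tuple


def cv_windows(
    n: int,
    n_windows: int,
    h: int,
    step_size: Optional[int] = None,
    input_size: Optional[int] = None,
) -> List[Tuple[range, range]]:
    """(train positions, validation positions) per window, oldest first."""
    if step_size is None:
        step_size = h  # same default as documented above
    windows = []
    for i in range(n_windows):
        # the newest window validates on the last h observations
        valid_end = n - (n_windows - 1 - i) * step_size
        valid_start = valid_end - h
        # expanding window unless input_size caps the training length
        train_start = 0 if input_size is None else max(0, valid_start - input_size)
        windows.append((range(train_start, valid_start), range(valid_start, valid_end)))
    return windows


for train, valid in cv_windows(100, n_windows=3, h=10, step_size=5, input_size=20):
    print(train, valid)
# range(60, 80) range(80, 90)
# range(65, 85) range(85, 95)
# range(70, 90) range(90, 100)
```

With `step_size` smaller than `h`, as in the usage, consecutive validation windows overlap; leaving `input_size` as None makes every training range start at position 0.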
After calling `setup`, use `partial_fit` to train for only some iterations and return the score on the forecast window.
Train the boosters for some iterations.
| | Type | Default | Details |
|---|---|---|---|
| num_iterations | int | | Number of boosting iterations to run. |
| before_predict_callback | Optional | None | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. |
| after_predict_callback | Optional | None | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. |
| Returns | float | | Weighted metric after training for num_iterations. |