LightGBMCV
| Name | Type | Description | Default |
|---|---|---|---|
| freq | str or int | Pandas offset alias, e.g. ‘D’, ‘W-THU’ or integer denoting the frequency of the series. | required |
| lags | list of int | Lags of the target to use as features. Defaults to None. | None |
| lag_transforms | dict of int to list of functions | Mapping of target lags to their transformations. Defaults to None. | None |
| date_features | list of str or callable | Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input. Defaults to None. | None |
| num_threads | int | Number of threads to use when computing the features. Defaults to 1. | 1 |
| target_transforms | list of transformers | Transformations that will be applied to the target before computing the features and restored after the forecasting step. Defaults to None. | None |
LightGBMCV.fit
| Name | Type | Description | Default |
|---|---|---|---|
| df | pandas DataFrame | Series data in long format. | required |
| n_windows | int | Number of windows to evaluate. | required |
| h | int | Forecast horizon. | required |
| id_col | str | Column that identifies each series. Defaults to ‘unique_id’. | ‘unique_id’ |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Defaults to ‘ds’. | ‘ds’ |
| target_col | str | Column that contains the target. Defaults to ‘y’. | ‘y’ |
| step_size | int | Step size between each cross validation window. If None it will be equal to h. Defaults to None. | None |
| num_iterations | int | Maximum number of boosting iterations to run. Defaults to 100. | 100 |
| params | dict | Parameters to be passed to the LightGBM Boosters. Defaults to None. | None |
| static_features | list of str | Names of the features that are static and will be repeated when forecasting. Defaults to None. | None |
| dropna | bool | Drop rows with missing values produced by the transformations. Defaults to True. | True |
| keep_last_n | int | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None. | None |
| eval_every | int | Number of boosting iterations to train before evaluating on the whole forecast window. Defaults to 10. | 10 |
| weights | sequence of float | Weights to multiply the metric of each window. If None, all windows have the same weight. Defaults to None. | None |
| metric | str or callable | Metric used to assess the performance of the models and perform early stopping. Defaults to ‘mape’. | ‘mape’ |
| verbose_eval | bool | Print the metrics of each evaluation. Defaults to True. | True |
| early_stopping_evals | int | Maximum number of evaluations to run without improvement. Defaults to 2. | 2 |
| early_stopping_pct | float | Minimum percentage improvement in metric value in early_stopping_evals evaluations. Defaults to 0.01. | 0.01 |
| compute_cv_preds | bool | Compute predictions for each window after finding the best iteration. Defaults to False. | False |
| before_predict_callback | callable | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None. | None |
| after_predict_callback | callable | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None. | None |
| input_size | int | Maximum training samples per series in each window. If None, will use an expanding window. Defaults to None. | None |
| Type | Description |
|---|---|
| list of tuple | List of (boosting rounds, metric value) tuples. |
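
To make the interaction between `n_windows`, `h`, `step_size` and `input_size` concrete, the following is a minimal sketch of one way the windows could be laid out for a single series with integer timesteps. It assumes the windows are anchored at the end of the series and that timesteps run from 1 to `n_obs`; `cv_windows` is a hypothetical helper written for illustration, not a function of the library.

```python
def cv_windows(n_obs, n_windows, h, step_size=None, input_size=None):
    """Return [((train_start, train_end), (valid_start, valid_end)), ...].

    Assumption: timesteps are the integers 1..n_obs and the last
    validation window ends at the last observation.
    """
    if step_size is None:
        step_size = h  # documented behavior: step_size defaults to h
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n_obs:
        raise ValueError("series too short for the requested windows")
    windows = []
    for i in range(n_windows):
        train_end = n_obs - test_size + i * step_size
        # input_size=None keeps the full history (expanding window)
        if input_size is None:
            train_start = 1
        else:
            train_start = max(1, train_end - input_size + 1)
        windows.append(((train_start, train_end), (train_end + 1, train_end + h)))
    return windows


# Expanding window, then a window capped at 240 training samples.
print(cv_windows(n_obs=960, n_windows=2, h=48))
print(cv_windows(n_obs=960, n_windows=2, h=48, input_size=240))
```

Under these assumptions, 960 observations with two windows and `h=48` give validation windows covering timesteps 865 to 912 and 913 to 960.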
LightGBMCV.predict
| Name | Type | Description | Default |
|---|---|---|---|
| h | int | Forecast horizon. | required |
| before_predict_callback | callable | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None. | None |
| after_predict_callback | callable | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None. | None |
| X_df | DataFrame | Dataframe with the future exogenous features. Should have the id column and the time column. Defaults to None. | None |
| Type | Description |
|---|---|
| DataFrame | Predictions for each series and timestep, with one column per window. |
LightGBMCV.setup
| Name | Type | Description | Default |
|---|---|---|---|
| df | pandas DataFrame | Series data in long format. | required |
| n_windows | int | Number of windows to evaluate. | required |
| h | int | Forecast horizon. | required |
| id_col | str | Column that identifies each series. Defaults to ‘unique_id’. | ‘unique_id’ |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Defaults to ‘ds’. | ‘ds’ |
| target_col | str | Column that contains the target. Defaults to ‘y’. | ‘y’ |
| step_size | int | Step size between each cross validation window. If None it will be equal to h. Defaults to None. | None |
| params | dict | Parameters to be passed to the LightGBM Boosters. Defaults to None. | None |
| static_features | list of str | Names of the features that are static and will be repeated when forecasting. Defaults to None. | None |
| dropna | bool | Drop rows with missing values produced by the transformations. Defaults to True. | True |
| keep_last_n | int | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None. | None |
| weights | sequence of float | Weights to multiply the metric of each window. If None, all windows have the same weight. Defaults to None. | None |
| metric | str or callable | Metric used to assess the performance of the models and perform early stopping. Defaults to ‘mape’. | ‘mape’ |
| input_size | int | Maximum training samples per series in each window. If None, will use an expanding window. Defaults to None. | None |
| Type | Description |
|---|---|
| LightGBMCV | CV object with internal data structures for partial_fit. |
LightGBMCV.partial_fit
| Name | Type | Description | Default |
|---|---|---|---|
| num_iterations | int | Number of boosting iterations to run. | required |
| before_predict_callback | callable | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None. | None |
| after_predict_callback | callable | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None. | None |
| Type | Description |
|---|---|
| float | Weighted metric after training for num_iterations. |
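
The weighted metric can be pictured as one score per window combined with the `weights` argument. The sketch below is an illustration under two assumptions: the per-window metric is a plain MAPE over one window, and "all windows have the same weight" means each window gets weight `1 / n_windows`. Both `mape` and `weighted_metric` are hypothetical helpers, not the library's implementation.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error for one window (assumes no zeros in y_true)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_metric(window_metrics, weights=None):
    """Combine one metric value per window into a single score."""
    n = len(window_metrics)
    if weights is None:
        weights = [1 / n] * n  # assumption: equal weights that sum to 1
    if len(weights) != n:
        raise ValueError("weights must have one entry per window")
    return sum(m * w for m, w in zip(window_metrics, weights))


# Two toy validation windows with made-up values.
window_scores = [
    mape([100, 200], [110, 180]),
    mape([50, 50], [60, 45]),
]
print(weighted_metric(window_scores))                # equal weights
print(weighted_metric(window_scores, [0.25, 0.75]))  # favor the later window
```

Giving a larger weight to the most recent window is one way to make the score reflect recent behavior more strongly.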
Example
This shows an example with just 4 series of the M4 dataset. If you want to run it yourself on all of them, you can refer to this notebook.

| | unique_id | ds | y |
|---|---|---|---|
| 86796 | H196 | 1 | 11.8 |
| 86797 | H196 | 2 | 11.4 |
| 86798 | H196 | 3 | 11.1 |
| 86799 | H196 | 4 | 10.8 |
| 86800 | H196 | 5 | 10.6 |
| … | … | … | … |
| 325235 | H413 | 1004 | 99.0 |
| 325236 | H413 | 1005 | 88.0 |
| 325237 | H413 | 1006 | 47.0 |
| 325238 | H413 | 1007 | 41.0 |
| 325239 | H413 | 1008 | 34.0 |
The `eval_every` parameter controls how often the full forecast window is
evaluated: if `eval_every=10` (the default), every 10 boosting iterations we’re going
to compute forecasts for the complete window and report the error.
We also have early stopping parameters:

- `early_stopping_evals`: how many evaluations of the full window should we go without improving to stop training?
- `early_stopping_pct`: what’s the minimum percentage improvement we want in these `early_stopping_evals` in order to keep training?

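
The two early stopping parameters can be read as a single check on the evaluation history. The following is a minimal sketch of such a check, assuming the history is a list of (boosting rounds, metric value) tuples, that lower metric values are better, and that improvement is measured relative to the score from `early_stopping_evals` evaluations ago. `should_stop` is a hypothetical helper and may differ from the library's internal logic.

```python
def should_stop(hist, early_stopping_evals=2, early_stopping_pct=0.01):
    """Decide whether training should stop.

    hist: list of (boosting rounds, metric value); lower is better.
    """
    if len(hist) < early_stopping_evals + 1:
        return False  # not enough evaluations to compare yet
    previous = hist[-(early_stopping_evals + 1)][1]
    current = hist[-1][1]
    improvement_pct = 1 - current / previous
    return improvement_pct < early_stopping_pct


improving = [(10, 0.20), (20, 0.15), (30, 0.12)]
stalled = [(10, 0.1000), (20, 0.0995), (30, 0.0993)]
print(should_stop(improving))  # still improving by far more than 1%
print(should_stop(stalled))    # improved by less than 1% over 2 evaluations
```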
By setting `compute_cv_preds=True` we get the predictions from each model on
their corresponding validation fold.
| | unique_id | ds | y | Booster | window |
|---|---|---|---|---|---|
| 0 | H196 | 865 | 15.5 | 15.522924 | 0 |
| 1 | H196 | 866 | 15.1 | 14.985832 | 0 |
| 2 | H196 | 867 | 14.8 | 14.667901 | 0 |
| 3 | H196 | 868 | 14.4 | 14.514592 | 0 |
| 4 | H196 | 869 | 14.2 | 14.035793 | 0 |
| … | … | … | … | … | … |
| 187 | H413 | 956 | 59.0 | 77.227905 | 1 |
| 188 | H413 | 957 | 58.0 | 80.589641 | 1 |
| 189 | H413 | 958 | 53.0 | 53.986834 | 1 |
| 190 | H413 | 959 | 38.0 | 36.749786 | 1 |
| 191 | H413 | 960 | 46.0 | 36.281225 | 1 |
`LightGBMCV.predict` returns the predictions from every model trained, with one column per window.
| | unique_id | ds | Booster0 | Booster1 |
|---|---|---|---|---|
| 0 | H196 | 961 | 15.670252 | 15.848888 |
| 1 | H196 | 962 | 15.522924 | 15.697399 |
| 2 | H196 | 963 | 14.985832 | 15.166213 |
| 3 | H196 | 964 | 14.985832 | 14.723238 |
| 4 | H196 | 965 | 14.562152 | 14.451092 |
| … | … | … | … | … |
| 187 | H413 | 1004 | 70.695242 | 65.917620 |
| 188 | H413 | 1005 | 66.216580 | 62.615788 |
| 189 | H413 | 1006 | 63.896573 | 67.848598 |
| 190 | H413 | 1007 | 46.922797 | 50.981950 |
| 191 | H413 | 1008 | 45.006541 | 42.752819 |
With a seasonal rolling mean, if we set `season_length=24` and `window_size=7` then we’ll average the value at
the same hour for every day of the week.
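
The averaging described above can be sketched in plain Python. This is an illustration only: `seasonal_rolling_mean` is a hypothetical helper, it is applied directly to the raw values rather than to a lag of the target, and returning `None` for incomplete windows is an assumption.

```python
def seasonal_rolling_mean(values, season_length, window_size):
    """Average each value with the ones 1, 2, ... seasons before it."""
    out = []
    for i in range(len(values)):
        # Positions of the same "hour" in the current and previous seasons.
        idxs = [i - k * season_length for k in range(window_size)]
        if idxs[-1] < 0:
            out.append(None)  # not enough history for a full window
        else:
            out.append(sum(values[j] for j in idxs) / window_size)
    return out


# Eight days of hourly toy data: value = hour of day + day index.
hourly = [hour + day for day in range(8) for hour in range(24)]
means = seasonal_rolling_mean(hourly, season_length=24, window_size=7)

# Day 7, hour 5 averages hour 5 of days 1 through 7.
print(means[7 * 24 + 5])
```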
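To use the individual components, first call the `setup` method, which prepares the internal data structures that `partial_fit` needs.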
After that, call `partial_fit` to train for only some
iterations and return the score of the forecast window.
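
Training in chunks like this makes it possible to write a custom loop around the score. The sketch below shows the general shape of such a loop, using a stand-in callable in place of a real CV object's `partial_fit`; `train_in_chunks` and `fake_partial_fit` are hypothetical names and the scores are made up for illustration.

```python
def train_in_chunks(partial_fit, num_iterations, eval_every):
    """Call partial_fit repeatedly and record (boosting rounds, metric value)."""
    hist = []
    for start in range(0, num_iterations, eval_every):
        metric_value = partial_fit(eval_every)  # train eval_every more rounds
        hist.append((start + eval_every, metric_value))
    return hist


# Stand-in for partial_fit: returns scripted scores instead of training.
scores = iter([0.30, 0.22, 0.19, 0.21])


def fake_partial_fit(num_iterations):
    return next(scores)


hist = train_in_chunks(fake_partial_fit, num_iterations=40, eval_every=10)
best_rounds, best_score = min(hist, key=lambda pair: pair[1])
print(hist)
print(best_rounds, best_score)
```

With a real CV object, the stand-in would be replaced by its `partial_fit` method, and the loop could stop early or adjust its behavior based on the recorded history.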

