LightGBMCV

 LightGBMCV (freq:Union[int,str], lags:Optional[Iterable[int]]=None,
             lag_transforms:Optional[Dict[int,List[Union[Callable,Tuple[Callable,Any]]]]]=None,
             date_features:Optional[Iterable[Union[str,Callable]]]=None,
             num_threads:int=1,
             target_transforms:Optional[List[Union[mlforecast.target_transforms.BaseTargetTransform,mlforecast.target_transforms._BaseGroupedArrayTargetTransform]]]=None)

Create LightGBM CV object.

	Type	Default	Details
freq	Union		Pandas offset alias, e.g. ‘D’, ‘W-THU’ or integer denoting the frequency of the series.
lags	Optional	None	Lags of the target to use as features.
lag_transforms	Optional	None	Mapping of target lags to their transformations.
date_features	Optional	None	Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input.
num_threads	int	1	Number of threads to use when computing the features.
target_transforms	Optional	None	Transformations that will be applied to the target before computing the features and restored after the forecasting step.
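To make the `lags` parameter more concrete: a lag feature is the target shifted within each series. The sketch below builds such columns with plain pandas. `add_lag_features` is a hypothetical helper that only illustrates the idea; it is not how mlforecast computes its features internally. The `lag{k}` naming matches the feature names shown in the `setup` output further down.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, lags, id_col="unique_id", target_col="y"):
    """Hypothetical helper: return a copy of `df` with one `lag{k}` column per lag.

    Assumes `df` is sorted by time within each series.
    """
    out = df.copy()
    grouped = out.groupby(id_col)[target_col]
    for lag in lags:
        # shift within each series so values never leak from one series into another
        out[f"lag{lag}"] = grouped.shift(lag)
    return out

toy = pd.DataFrame({
    "unique_id": ["a"] * 4 + ["b"] * 4,
    "ds": [1, 2, 3, 4] * 2,
    "y": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})
feats = add_lag_features(toy, lags=[1, 2])
print(feats)
```

The first rows of each series get missing values, which is what `dropna=True` in `fit` removes.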

Example

This shows an example with just 4 series of the M4 dataset. If you want to run it yourself on all of them, you can refer to this notebook.

import random

from datasetsforecast.m4 import M4, M4Info
from mlforecast.lag_transforms import SeasonalRollingMean
from mlforecast.lightgbm_cv import LightGBMCV
from mlforecast.target_transforms import Differences

group = 'Hourly'
# top-level `await` works in a notebook; in a script use asyncio.run(M4.async_download(...))
await M4.async_download('data', group=group)
df, *_ = M4.load(directory='data', group=group)
df['ds'] = df['ds'].astype('int')
ids = df['unique_id'].unique()
random.seed(0)
sample_ids = random.choices(ids, k=4)  # draw 4 series ids
sample_df = df[df['unique_id'].isin(sample_ids)]
sample_df

	unique_id	ds	y
86796	H196	1	11.8
86797	H196	2	11.4
86798	H196	3	11.1
86799	H196	4	10.8
86800	H196	5	10.6
…	…	…	…
325235	H413	1004	99.0
325236	H413	1005	88.0
325237	H413	1006	47.0
325238	H413	1007	41.0
325239	H413	1008	34.0

info = M4Info[group]
horizon = info.horizon
valid = sample_df.groupby('unique_id').tail(horizon)
train = sample_df.drop(valid.index)
train.shape, valid.shape

((3840, 3), (192, 3))

What LightGBMCV does is emulate LightGBM’s cv function, where several Boosters are trained simultaneously on different partitions of the data; that is, each boosting iteration is performed on all of them at the same time. This gives us an estimate of the error at each iteration, so by combining it with early stopping we can find the best iteration to train a final model on all the data, or even use the individual models’ predictions to build an ensemble.

To get a good estimate of the forecasting performance of our model, we compute predictions for the complete forecast window of each fold and compute a metric on them. Since this step can slow down training, the eval_every parameter controls how often it happens: with eval_every=10 (the default), every 10 boosting iterations we compute forecasts for the complete window and report the error. There are also early stopping parameters:

early_stopping_evals: how many evaluations of the full window we can go without improving before training stops.
early_stopping_pct: the minimum percentage improvement we require over these early_stopping_evals evaluations in order to keep training.
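The early stopping rule can be sketched as follows. This is our assumption about the logic, not mlforecast's code: after each evaluation we compare the latest score with the one `early_stopping_evals` evaluations earlier, and stop when the relative improvement falls below `early_stopping_pct`; the best iteration is the one with the lowest score. `should_stop` and `simulate` are hypothetical helpers. This rule does reproduce the stopping rounds and best iterations printed in the training logs on this page.

```python
from typing import List, Sequence, Tuple

def should_stop(hist: List[Tuple[int, float]], early_stopping_evals: int = 2,
                early_stopping_pct: float = 0.01) -> bool:
    """Hypothetical rule: stop if the metric improved by less than
    `early_stopping_pct` (relative) over the last `early_stopping_evals` evaluations."""
    if len(hist) < early_stopping_evals + 1:
        return False  # not enough evaluations to compare yet
    current = hist[-1][1]
    reference = hist[-(early_stopping_evals + 1)][1]
    improvement = 1 - current / reference
    return improvement < early_stopping_pct

def simulate(metrics: Sequence[float], eval_every: int = 10, **kwargs) -> Tuple[int, int]:
    """Replay evaluation scores; return (round where training stops, best round)."""
    hist: List[Tuple[int, float]] = []
    for i, value in enumerate(metrics, start=1):
        hist.append((i * eval_every, value))
        if should_stop(hist, **kwargs):
            break
    best_round, _ = min(hist, key=lambda pair: pair[1])
    return hist[-1][0], best_round

# MAPE values printed by the first fit on this page
first_fit = [0.590690, 0.251093, 0.143643, 0.109723, 0.102099,
             0.099448, 0.098349, 0.098006, 0.098718]
print(simulate(first_fit))  # (90, 80)
```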

This makes LightGBMCV a good tool for quickly testing different configurations of the model. Consider the following example, where we try to find out which features improve the performance of our model. We start by using only lags.

static_fit_config = dict(
    n_windows=2,
    h=horizon,
    params={'verbose': -1},
    compute_cv_preds=True,
)
cv = LightGBMCV(
    freq=1,
    lags=[24 * (i+1) for i in range(7)],  # one week of lags
)


LightGBMCV.fit

 LightGBMCV.fit (df:pandas.core.frame.DataFrame, n_windows:int, h:int,
                 id_col:str='unique_id', time_col:str='ds',
                 target_col:str='y', step_size:Optional[int]=None,
                 num_iterations:int=100,
                 params:Optional[Dict[str,Any]]=None,
                 static_features:Optional[List[str]]=None,
                 dropna:bool=True, keep_last_n:Optional[int]=None,
                 eval_every:int=10,
                 weights:Optional[Sequence[float]]=None,
                 metric:Union[str,Callable]='mape',
                 verbose_eval:bool=True, early_stopping_evals:int=2,
                 early_stopping_pct:float=0.01,
                 compute_cv_preds:bool=False,
                 before_predict_callback:Optional[Callable]=None,
                 after_predict_callback:Optional[Callable]=None,
                 input_size:Optional[int]=None)

Train boosters simultaneously and assess their performance on the complete forecasting window.

	Type	Default	Details
df	DataFrame		Series data in long format.
n_windows	int		Number of windows to evaluate.
h	int		Forecast horizon.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
step_size	Optional	None	Step size between each cross validation window. If None it will be equal to `h`.
num_iterations	int	100	Maximum number of boosting iterations to run.
params	Optional	None	Parameters to be passed to the LightGBM Boosters.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it.
eval_every	int	10	Number of boosting iterations to train before evaluating on the whole forecast window.
weights	Optional	None	Weights to multiply the metric of each window. If None, all windows have the same weight.
metric	Union	mape	Metric used to assess the performance of the models and perform early stopping.
verbose_eval	bool	True	Print the metrics of each evaluation.
early_stopping_evals	int	2	Maximum number of evaluations to run without improvement.
early_stopping_pct	float	0.01	Minimum percentage improvement in metric value in `early_stopping_evals` evaluations.
compute_cv_preds	bool	False	Compute predictions for each window after finding the best iteration.
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
input_size	Optional	None	Maximum training samples per series in each window. If None, an expanding window is used.
Returns	List		List of (boosting rounds, metric value) tuples.
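A rough sketch of how `n_windows`, `h`, `step_size` and `input_size` could carve a single series into training and validation blocks. `cv_windows` is a hypothetical helper and mlforecast's exact slicing may differ, but with 960 training points per series, `n_windows=2` and `h=48` it gives the same validation boundaries that appear in `cv.cv_preds_` for this example (window 0 starts at ds 865, window 1 ends at ds 960).

```python
from typing import List, Optional, Tuple

def cv_windows(n_obs: int, n_windows: int, h: int,
               step_size: Optional[int] = None,
               input_size: Optional[int] = None) -> List[Tuple[range, range]]:
    """Hypothetical helper: (train_positions, valid_positions) per window, oldest first.

    Positions are 0-based indices into a single series of length `n_obs`.
    """
    step_size = h if step_size is None else step_size
    windows = []
    for i in range(n_windows):
        # distance between the end of this window's validation block and the series end
        offset = (n_windows - 1 - i) * step_size
        valid_end = n_obs - offset
        valid_start = valid_end - h
        # expanding window by default, otherwise keep at most `input_size` samples
        train_start = 0 if input_size is None else max(0, valid_start - input_size)
        windows.append((range(train_start, valid_start), range(valid_start, valid_end)))
    return windows

# each training series here has 960 hourly points with ds = 1..960
for w, (tr, va) in enumerate(cv_windows(960, n_windows=2, h=48)):
    print(w, "train ds", tr.start + 1, "-", tr.stop, "| valid ds", va.start + 1, "-", va.stop)
```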

hist = cv.fit(train, **static_fit_config)

[LightGBM] [Info] Start training from score 51.745632
[10] mape: 0.590690
[20] mape: 0.251093
[30] mape: 0.143643
[40] mape: 0.109723
[50] mape: 0.102099
[60] mape: 0.099448
[70] mape: 0.098349
[80] mape: 0.098006
[90] mape: 0.098718
Early stopping at round 90
Using best iteration: 80

By setting compute_cv_preds=True we get the predictions of each model on its corresponding validation fold.

cv.cv_preds_

	unique_id	ds	y	Booster	window
0	H196	865	15.5	15.522924	0
1	H196	866	15.1	14.985832	0
2	H196	867	14.8	14.667901	0
3	H196	868	14.4	14.514592	0
4	H196	869	14.2	14.035793	0
…	…	…	…	…	…
187	H413	956	59.0	77.227905	1
188	H413	957	58.0	80.589641	1
189	H413	958	53.0	53.986834	1
190	H413	959	38.0	36.749786	1
191	H413	960	46.0	36.281225	1
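With these out-of-fold predictions you can also check how each fold performs on its own. The sketch below uses a hypothetical helper, `mape_per_window`, that computes the MAPE per series within each window and then averages across series, following the "per series, then averaged" description of the built-in metrics. With the real object it would be called as `mape_per_window(cv.cv_preds_)`; whether it matches the internally reported metric exactly is not guaranteed. Here it runs on a tiny made-up frame with the same columns.

```python
import pandas as pd

def mape_per_window(cv_preds: pd.DataFrame, pred_col: str = "Booster") -> pd.Series:
    """Hypothetical helper: MAPE of each CV window (per series, then averaged)."""
    ape = (cv_preds["y"] - cv_preds[pred_col]).abs() / cv_preds["y"].abs()
    by_series = cv_preds.assign(ape=ape).groupby(["window", "unique_id"])["ape"].mean()
    return by_series.groupby(level="window").mean()

# made-up data with the same columns as cv.cv_preds_
toy = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"] * 2,
    "ds": [1, 2, 1, 2, 3, 4, 3, 4],
    "y": [10.0] * 8,
    "Booster": [11.0, 9.0, 10.0, 10.0, 12.0, 12.0, 10.0, 10.0],
    "window": [0, 0, 0, 0, 1, 1, 1, 1],
})
print(mape_per_window(toy))
```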

The individual models we trained are saved, so calling predict returns the predictions from every model trained.


LightGBMCV.predict

 LightGBMCV.predict (h:int,
                     before_predict_callback:Optional[Callable]=None,
                     after_predict_callback:Optional[Callable]=None,
                     X_df:Optional[pandas.core.frame.DataFrame]=None)

Compute predictions with each of the trained boosters.

	Type	Default	Details
h	int		Forecast horizon.
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
X_df	Optional	None	Dataframe with the future exogenous features. Should have the id column and the time column.
Returns	DataFrame		Predictions for each series and timestep, with one column per window.

preds = cv.predict(horizon)
preds

	unique_id	ds	Booster0	Booster1
0	H196	961	15.670252	15.848888
1	H196	962	15.522924	15.697399
2	H196	963	14.985832	15.166213
3	H196	964	14.985832	14.723238
4	H196	965	14.562152	14.451092
…	…	…	…	…
187	H413	1004	70.695242	65.917620
188	H413	1005	66.216580	62.615788
189	H413	1006	63.896573	67.848598
190	H413	1007	46.922797	50.981950
191	H413	1008	45.006541	42.752819

We can average these predictions and evaluate them.

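def evaluate_on_valid(preds):
    preds = preds.copy()
    # ensemble: average the predictions of all boosters
    preds['final_prediction'] = preds.drop(columns=['unique_id', 'ds']).mean(axis=1)
    merged = preds.merge(valid, on=['unique_id', 'ds'])
    # absolute percentage error of each prediction
    merged['abs_pct_err'] = abs(merged['final_prediction'] - merged['y']) / merged['y']
    # MAPE per series, then averaged across series
    return merged.groupby('unique_id')['abs_pct_err'].mean().mean()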

eval1 = evaluate_on_valid(preds)
eval1

0.11036194712311806

Now, since these series are hourly, maybe we can remove the weekly seasonality (which also covers the daily pattern) by taking the 168th (24 * 7) difference, that is, subtracting the value at the same hour one week earlier. Our target thus becomes

z_t = y_t - y_{t-168}

The features are computed from this differenced target, and when we predict, the transformation is automatically reverted so that the forecasts are returned on the original scale.
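The transformation and its inverse can be sketched with NumPy. The helpers below are hypothetical and only illustrate the formula above; mlforecast's `Differences` has its own implementation. Restoring a forecast uses y_t = z_t + y_{t-168}, and because the horizon here (48) is shorter than the lag (168), every y_{t-168} needed is an observed value; for longer horizons the sketch reuses already-restored forecasts.

```python
import numpy as np

def seasonal_diff(y, lag):
    """Hypothetical helper: z_t = y_t - y_{t-lag}; the first `lag` entries are NaN."""
    y = np.asarray(y, dtype=float)
    z = np.full_like(y, np.nan)
    z[lag:] = y[lag:] - y[:-lag]
    return z

def invert_seasonal_diff(y_hist, z_future, lag):
    """Hypothetical helper: map forecasts of z back to y via y_t = z_t + y_{t-lag}."""
    y = [float(v) for v in y_hist]
    if len(y) < lag:
        raise ValueError("need at least `lag` observed values")
    for z in z_future:
        # y[-lag] is either an observed value or an already restored forecast
        y.append(z + y[-lag])
    return np.array(y[len(y_hist):])

t = np.arange(1008)
y = 10 + np.sin(2 * np.pi * t / 24) + 0.01 * t  # synthetic hourly series
z = seasonal_diff(y, 168)
# pretend we forecast z perfectly for the last 48 hours and restore y
restored = invert_seasonal_diff(y[:960], z[960:], 168)
print(np.allclose(restored, y[960:]))  # True
```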

cv2 = LightGBMCV(
    freq=1,
    target_transforms=[Differences([24 * 7])],
    lags=[24 * (i+1) for i in range(7)],
)
hist2 = cv2.fit(train, **static_fit_config)

[LightGBM] [Info] Start training from score 0.519010
[10] mape: 0.089024
[20] mape: 0.090683
[30] mape: 0.092316
Early stopping at round 30
Using best iteration: 10

assert hist2[-1][1] < hist[-1][1]

Nice! We achieve a better score in fewer iterations. Let’s see if this improvement translates to the validation set as well.

preds2 = cv2.predict(horizon)
eval2 = evaluate_on_valid(preds2)
eval2

0.08956665504570135

assert eval2 < eval1

Great! Maybe we can try some lag transforms now. We’ll try the seasonal rolling mean that averages the values “every season”, that is, if we set season_length=24 and window_size=7 then we’ll average the value at the same hour for every day of the week.
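To see what this feature contains, here is a NumPy sketch that first takes lag 48 of a series and then averages the values at the same hour over 7 consecutive days. The helpers are hypothetical and only mirror the description above; they are not mlforecast's `SeasonalRollingMean`, and in the actual fit the lag is taken on the differenced target. In this sketch, positions without a full window are left as NaN.

```python
import numpy as np

def seasonal_rolling_mean(y, season_length, window_size):
    """Hypothetical helper: mean of y_t, y_{t-s}, ..., y_{t-(window_size-1)s}, s = season_length."""
    y = np.asarray(y, dtype=float)
    out = np.full_like(y, np.nan)
    offsets = season_length * np.arange(window_size)
    for t in range(offsets[-1], len(y)):
        out[t] = y[t - offsets].mean()
    return out

def lagged(y, lag):
    """Hypothetical helper: shift a series forward by `lag` (> 0) steps, padding with NaN."""
    y = np.asarray(y, dtype=float)
    return np.concatenate([np.full(lag, np.nan), y[:-lag]])

y = np.arange(300, dtype=float)
# analogous to lag_transforms={48: [SeasonalRollingMean(season_length=24, window_size=7)]}
feature = seasonal_rolling_mean(lagged(y, 48), season_length=24, window_size=7)
print(feature[192])  # 72.0, the mean of y at positions 0, 24, ..., 144
```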

cv3 = LightGBMCV(
    freq=1,
    target_transforms=[Differences([24 * 7])],
    lags=[24 * (i+1) for i in range(7)],
    lag_transforms={
        48: [SeasonalRollingMean(season_length=24, window_size=7)],
    },
)
hist3 = cv3.fit(train, **static_fit_config)

[LightGBM] [Info] Start training from score 0.273641
[10] mape: 0.086724
[20] mape: 0.088466
[30] mape: 0.090536
Early stopping at round 30
Using best iteration: 10

Seems like this is helping as well!

assert hist3[-1][1] < hist2[-1][1]

Is this reflected in the validation set?

preds3 = cv3.predict(horizon)
eval3 = evaluate_on_valid(preds3)
eval3

0.08961279023129345

The validation error is essentially unchanged (0.08961 vs 0.08957 without the lag transform), so here the improvement in the CV score doesn't carry over to the validation set. mlforecast also supports date features, but in this case our time column is made of integers, so there aren’t many possibilities.

As you can see, this lets you iterate faster and get better estimates of the forecasting performance you can expect from your model. If you’re doing hyperparameter tuning, it’s useful to be able to run a couple of iterations, assess the performance, and determine whether a particular configuration isn’t promising and should be discarded. For example, optuna has pruners that you can call with your current score to decide whether the trial should be discarded. We’ll now show how to do that. Since the CV requires a bit of setup, like the LightGBM datasets and the internal features, there is a setup method for this.


LightGBMCV.setup

 LightGBMCV.setup (df:pandas.core.frame.DataFrame, n_windows:int, h:int,
                   id_col:str='unique_id', time_col:str='ds',
                   target_col:str='y', step_size:Optional[int]=None,
                   params:Optional[Dict[str,Any]]=None,
                   static_features:Optional[List[str]]=None,
                   dropna:bool=True, keep_last_n:Optional[int]=None,
                   weights:Optional[Sequence[float]]=None,
                   metric:Union[str,Callable]='mape',
                   input_size:Optional[int]=None)

Initialize internal data structures to iteratively train the boosters. Use this before calling partial_fit.

	Type	Default	Details
df	DataFrame		Series data in long format.
n_windows	int		Number of windows to evaluate.
h	int		Forecast horizon.
id_col	str	unique_id	Column that identifies each series.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
step_size	Optional	None	Step size between each cross validation window. If None it will be equal to `h`.
params	Optional	None	Parameters to be passed to the LightGBM Boosters.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it.
weights	Optional	None	Weights to multiply the metric of each window. If None, all windows have the same weight.
metric	Union	mape	Metric used to assess the performance of the models and perform early stopping.
input_size	Optional	None	Maximum training samples per series in each window. If None, an expanding window is used.
Returns	LightGBMCV		CV object with internal data structures for partial_fit.

cv4 = LightGBMCV(
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
)
cv4.setup(
    train,
    n_windows=2,
    h=horizon,
    params={'verbose': -1},
)

LightGBMCV(freq=1, lag_features=['lag24', 'lag48', 'lag72', 'lag96', 'lag120', 'lag144', 'lag168'], date_features=[], num_threads=1, bst_threads=8)

Once the setup is done, we can call partial_fit to train for a given number of iterations and return the score on the forecast window.


LightGBMCV.partial_fit

 LightGBMCV.partial_fit (num_iterations:int,
                         before_predict_callback:Optional[Callable]=None,
                         after_predict_callback:Optional[Callable]=None)

Train the boosters for some iterations.

	Type	Default	Details
num_iterations	int		Number of boosting iterations to run
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
Returns	float		Weighted metric after training for num_iterations.

score = cv4.partial_fit(10)
score

[LightGBM] [Info] Start training from score 51.745632

0.5906900462828166

This is equal to the first evaluation from our first example.

assert hist[0][1] == score

We can now use this score to decide whether this configuration is promising. If it is, we can train for more iterations.

score2 = cv4.partial_fit(20)

The boosters had already been trained for 10 iterations, so after 20 more they have 30 in total, and the score now equals the third evaluation (round 30) from the first example.

assert hist[2][1] == score2
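The pruning pattern described earlier can be sketched as a loop that alternates `partial_fit` calls with a pruning check. `fit_with_pruning` is a hypothetical helper; in practice `partial_fit` would be `cv4.partial_fit` after `cv4.setup(...)`, but here it is replaced by a stub that replays the scores of the first fit so the sketch runs on its own. The pruning rule in the example is made up for illustration.

```python
from typing import Callable, List, Tuple

def fit_with_pruning(
    partial_fit: Callable[[int], float],
    should_prune: Callable[[int, float], bool],
    num_iterations: int = 100,
    eval_every: int = 10,
) -> Tuple[List[Tuple[int, float]], bool]:
    """Hypothetical helper: train in chunks of `eval_every` rounds, asking
    `should_prune` after each chunk. Returns the (total rounds, score) history
    and whether the run was pruned."""
    hist: List[Tuple[int, float]] = []
    done = 0
    while done < num_iterations:
        step = min(eval_every, num_iterations - done)
        score = partial_fit(step)  # e.g. cv4.partial_fit; rounds accumulate across calls
        done += step
        hist.append((done, score))
        if should_prune(done, score):
            return hist, True
    return hist, False

# stand-in for partial_fit that replays the scores of the first fit on this page
scores = iter([0.590690, 0.251093, 0.143643, 0.109723])
hist, pruned = fit_with_pruning(
    partial_fit=lambda n: next(scores),
    should_prune=lambda step, score: step >= 20 and score > 0.2,  # made-up rule
    num_iterations=40,
)
print(hist, pruned)
```

With optuna, `should_prune` could call `trial.report(score, step)` and then return `trial.should_prune()`, raising `optuna.TrialPruned` when it is true.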

Using a custom metric

The built-in metrics are MAPE and RMSE, which are computed per series and then averaged across all series. If you want to do something different or use another metric entirely, you can define your own, like the following:

import pandas as pd

def weighted_mape(
    y_true: pd.Series,
    y_pred: pd.Series,
    ids: pd.Series,
    dates: pd.Series,
):
    """Weights each series' MAPE by the magnitude of its predicted values"""
    abs_pct_err = abs(y_true - y_pred) / abs(y_true)
    mape_by_serie = abs_pct_err.groupby(ids).mean()
    # share of each series in the total sum of predictions
    totals_per_serie = y_pred.groupby(ids).sum()
    series_weights = totals_per_serie / totals_per_serie.sum()
    return (mape_by_serie * series_weights).sum()

_ = LightGBMCV(
    freq=1,
    lags=[24 * (i+1) for i in range(7)],
).fit(
    train,
    n_windows=2,
    h=horizon,
    params={'verbose': -1},
    metric=weighted_mape,
)

[LightGBM] [Info] Start training from score 51.745632
[10] weighted_mape: 0.480353
[20] weighted_mape: 0.218670
[30] weighted_mape: 0.161706
[40] weighted_mape: 0.149992
[50] weighted_mape: 0.149024
[60] weighted_mape: 0.148496
Early stopping at round 60
Using best iteration: 60
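For comparison, the built-in metrics could be written against the same `(y_true, y_pred, ids, dates)` signature as `weighted_mape`. These are hypothetical re-implementations based only on the "per series, then averaged" description above; the library's own code may differ in details such as the handling of zeros in the target. They are mainly a starting point if you want to tweak one of the built-ins and pass it as `metric=`.

```python
import numpy as np
import pandas as pd

def mape_by_series(y_true, y_pred, ids, dates):
    """Hypothetical: MAPE per series, then the unweighted mean across series."""
    ape = (y_true - y_pred).abs() / y_true.abs()
    return ape.groupby(ids).mean().mean()

def rmse_by_series(y_true, y_pred, ids, dates):
    """Hypothetical: RMSE per series, then the unweighted mean across series."""
    sq_err = (y_true - y_pred) ** 2
    return np.sqrt(sq_err.groupby(ids).mean()).mean()

# made-up data: two series with two timestamps each
ids = pd.Series(["a", "a", "b", "b"])
dates = pd.Series([1, 2, 1, 2])
y_true = pd.Series([10.0, 10.0, 100.0, 100.0])
y_pred = pd.Series([11.0, 9.0, 100.0, 120.0])
print(mape_by_series(y_true, y_pred, ids, dates))
print(rmse_by_series(y_true, y_pred, ids, dates))
```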
