module mlforecast.lgb_cv


class LightGBMCV

method __init__

__init__(
    freq: Union[int, str],
    lags: Optional[Iterable[int]] = None,
    lag_transforms: Optional[Dict[int, List[Union[Callable, Tuple[Callable, Any]]]]] = None,
    date_features: Optional[Iterable[Union[str, Callable]]] = None,
    num_threads: int = 1,
    target_transforms: Optional[List[Union[BaseTargetTransform, _BaseGroupedArrayTargetTransform]]] = None
)
Create LightGBM CV object. Args:
  • freq (str or int): Pandas offset alias, e.g. ‘D’, ‘W-THU’ or integer denoting the frequency of the series.
  • lags (list of int, optional): Lags of the target to use as features. Defaults to None.
  • lag_transforms (dict of int to list of functions, optional): Mapping of target lags to their transformations. Defaults to None.
  • date_features (list of str or callable, optional): Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input. Defaults to None.
  • num_threads (int): Number of threads to use when computing the features. Defaults to 1.
  • target_transforms (list of transformers, optional): Transformations that will be applied to the target before computing the features and restored after the forecasting step. Defaults to None.
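
To make the lags and lag_transforms arguments concrete, the following plain-Python sketch builds the same kind of features by hand. It is not the library's implementation: the helpers `lag`, `rolling_mean` and `build_features`, and the feature names they produce, are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

Series = List[Optional[float]]
Transform = Union[Callable, Tuple[Callable, Any]]


def lag(values: Series, k: int) -> Series:
    """Shift the series k steps so position t holds the value from t - k."""
    k = min(k, len(values))
    return [None] * k + values[: len(values) - k]


def rolling_mean(values: Series, window: int) -> Series:
    """Mean of the last `window` values; None until a full window exists."""
    out: Series = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


def build_features(
    y: Series, lags: List[int], lag_transforms: Dict[int, List[Transform]]
) -> Dict[str, Series]:
    """Build one column per lag and one per (lag, transformation) pair."""
    features = {f"lag{k}": lag(y, k) for k in lags}
    for k, transforms in lag_transforms.items():
        for tfm in transforms:
            # A transformation is either a bare function or (function, *args).
            func, args = (tfm[0], tfm[1:]) if isinstance(tfm, tuple) else (tfm, ())
            features[f"{func.__name__}_lag{k}"] = func(lag(y, k), *args)
    return features


y = [1.0, 2.0, 3.0, 4.0, 5.0]
features = build_features(y, lags=[1, 2], lag_transforms={1: [(rolling_mean, 2)]})
```

The leading None entries are the missing values that the dropna argument of fit and setup refers to.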

method find_best_iter

find_best_iter(hist, early_stopping_evals) → int

method fit

fit(
    df: DataFrame,
    n_windows: int,
    h: int,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y',
    step_size: Optional[int] = None,
    num_iterations: int = 100,
    params: Optional[Dict[str, Any]] = None,
    static_features: Optional[List[str]] = None,
    dropna: bool = True,
    keep_last_n: Optional[int] = None,
    eval_every: int = 10,
    weights: Optional[Sequence[float]] = None,
    metric: Union[str, Callable] = 'mape',
    verbose_eval: bool = True,
    early_stopping_evals: int = 2,
    early_stopping_pct: float = 0.01,
    compute_cv_preds: bool = False,
    before_predict_callback: Optional[Callable] = None,
    after_predict_callback: Optional[Callable] = None,
    input_size: Optional[int] = None
) → List[Tuple[int, float]]
Train boosters simultaneously and assess their performance on the complete forecasting window. Args:
  • df (pandas DataFrame): Series data in long format.
  • n_windows (int): Number of windows to evaluate.
  • h (int): Forecast horizon.
  • id_col (str): Column that identifies each series. Defaults to ‘unique_id’.
  • time_col (str): Column that identifies each timestep, its values can be timestamps or integers. Defaults to ‘ds’.
  • target_col (str): Column that contains the target. Defaults to ‘y’.
  • step_size (int, optional): Step size between each cross validation window. If None it will be equal to h. Defaults to None.
  • num_iterations (int): Maximum number of boosting iterations to run. Defaults to 100.
  • params (dict, optional): Parameters to be passed to the LightGBM Boosters. Defaults to None.
  • static_features (list of str, optional): Names of the features that are static and will be repeated when forecasting. Defaults to None.
  • dropna (bool): Drop rows with missing values produced by the transformations. Defaults to True.
  • keep_last_n (int, optional): Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None.
  • eval_every (int): Number of boosting iterations to train before evaluating on the whole forecast window. Defaults to 10.
  • weights (sequence of float, optional): Weights to multiply the metric of each window. If None, all windows have the same weight. Defaults to None.
  • metric (str or callable): Metric used to assess the performance of the models and perform early stopping. Defaults to ‘mape’.
  • verbose_eval (bool): Print the metrics of each evaluation. Defaults to True.
  • early_stopping_evals (int): Maximum number of evaluations to run without improvement. Defaults to 2.
  • early_stopping_pct (float): Minimum percentage improvement in the metric value required over the last early_stopping_evals evaluations. Defaults to 0.01.
  • compute_cv_preds (bool): Compute predictions for each window after finding the best iteration. Defaults to False.
  • before_predict_callback (callable, optional): Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None.
  • after_predict_callback (callable, optional): Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None.
  • input_size (int, optional): Maximum training samples per series in each window. If None, will use an expanding window. Defaults to None.
Returns:
  • (list of tuple): List of (boosting rounds, metric value) tuples.
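
The interaction between n_windows, h, step_size and input_size can be sketched with integer positions. This is an illustration under one assumption that the text above does not state: windows are anchored at the end of each series, so the last window validates on the final h observations. The function name `cv_windows` is hypothetical.

```python
from typing import List, Optional, Tuple


def cv_windows(
    n_obs: int,
    n_windows: int,
    h: int,
    step_size: Optional[int] = None,
    input_size: Optional[int] = None,
) -> List[Tuple[int, int, int]]:
    """Return (train_start, train_end, valid_end) for each window.

    Training rows are [train_start, train_end) and validation rows are
    [train_end, valid_end), as positions within a single series.
    """
    if step_size is None:
        step_size = h  # documented default: step equals the horizon
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n_obs:
        raise ValueError("series is too short for the requested windows")
    windows = []
    for i in range(n_windows):
        train_end = n_obs - test_size + i * step_size
        # input_size=None means an expanding window that starts at 0.
        train_start = 0 if input_size is None else max(0, train_end - input_size)
        windows.append((train_start, train_end, train_end + h))
    return windows
```

For a series of 20 observations, `cv_windows(20, n_windows=2, h=3)` gives an expanding layout, while adding `input_size=5` caps every training slice at five rows.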

method partial_fit

partial_fit(
    num_iterations: int,
    before_predict_callback: Optional[Callable] = None,
    after_predict_callback: Optional[Callable] = None
) → float
Train the boosters for some iterations. Args:
  • num_iterations (int): Number of boosting iterations to run.
  • before_predict_callback (callable, optional): Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None.
  • after_predict_callback (callable, optional): Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None.
Returns:
  • (float): Weighted metric after training for num_iterations.
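
As a rough illustration of what a weighted metric across windows means, the sketch below computes MAPE per window and combines the values with per-window weights. It assumes that "all windows have the same weight" means 1 / n_windows each; the helper names `mape` and `weighted_metric` are hypothetical and this is not the library's code.

```python
from typing import List, Optional, Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes no zero in y_true."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_metric(
    window_metrics: Sequence[float], weights: Optional[Sequence[float]] = None
) -> float:
    """Combine one metric value per window into a single number."""
    n = len(window_metrics)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: equal weights that sum to 1
    if len(weights) != n:
        raise ValueError("weights must have one entry per window")
    return sum(w * m for w, m in zip(weights, window_metrics))


window_metrics: List[float] = [
    mape([100.0, 200.0], [110.0, 180.0]),  # errors of 10% and 10%
    mape([50.0, 80.0], [40.0, 64.0]),      # errors of 20% and 20%
]
equal = weighted_metric(window_metrics)
recent_heavy = weighted_metric(window_metrics, weights=[0.25, 0.75])
```

Giving the later window more weight, as in `recent_heavy`, makes the combined score track recent performance more closely.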

method predict

predict(
    h: int,
    before_predict_callback: Optional[Callable] = None,
    after_predict_callback: Optional[Callable] = None,
    X_df: Optional[DataFrame] = None
) → DataFrame
Compute predictions with each of the trained boosters. Args:
  • h (int): Forecast horizon.
  • before_predict_callback (callable, optional): Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None.
  • after_predict_callback (callable, optional): Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None.
  • X_df (pandas DataFrame, optional): Dataframe with the future exogenous features. Should have the id column and the time column. Defaults to None.
Returns:
  • (pandas DataFrame): Predictions for each series and timestep, with one column per window.
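
The two callbacks might look like the sketch below. The function names and the choices of filling with 0 and clipping at 0 are illustrative assumptions; `fillna` and `clip` are standard pandas methods, and `cv` is assumed to be a LightGBMCV object that has already been fitted.

```python
def fill_missing_features(features_df):
    """before_predict_callback: receives the feature dataframe and returns
    one with the same structure, here with missing values replaced by 0."""
    return features_df.fillna(0)


def clip_negative(preds):
    """after_predict_callback: receives a pandas Series of predictions and
    returns one with the same structure, here floored at zero."""
    return preds.clip(lower=0)


def predict_with_callbacks(cv, h):
    """Forecast h steps ahead, applying both callbacks at every step."""
    return cv.predict(
        h,
        before_predict_callback=fill_missing_features,
        after_predict_callback=clip_negative,
    )
```

Both callbacks return a new object instead of mutating their input, which keeps the structure (and the series identifier on the index) intact.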

method setup

setup(
    df: DataFrame,
    n_windows: int,
    h: int,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y',
    step_size: Optional[int] = None,
    params: Optional[Dict[str, Any]] = None,
    static_features: Optional[List[str]] = None,
    dropna: bool = True,
    keep_last_n: Optional[int] = None,
    weights: Optional[Sequence[float]] = None,
    metric: Union[str, Callable] = 'mape',
    input_size: Optional[int] = None
)
Initialize internal data structures to iteratively train the boosters. Use this before calling partial_fit. Args:
  • df (pandas DataFrame): Series data in long format.
  • n_windows (int): Number of windows to evaluate.
  • h (int): Forecast horizon.
  • id_col (str): Column that identifies each series. Defaults to ‘unique_id’.
  • time_col (str): Column that identifies each timestep, its values can be timestamps or integers. Defaults to ‘ds’.
  • target_col (str): Column that contains the target. Defaults to ‘y’.
  • step_size (int, optional): Step size between each cross validation window. If None it will be equal to h. Defaults to None.
  • params (dict, optional): Parameters to be passed to the LightGBM Boosters. Defaults to None.
  • static_features (list of str, optional): Names of the features that are static and will be repeated when forecasting. Defaults to None.
  • dropna (bool): Drop rows with missing values produced by the transformations. Defaults to True.
  • keep_last_n (int, optional): Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None.
  • weights (sequence of float, optional): Weights to multiply the metric of each window. If None, all windows have the same weight. Defaults to None.
  • metric (str or callable): Metric used to assess the performance of the models and perform early stopping. Defaults to ‘mape’.
  • input_size (int, optional): Maximum training samples per series in each window. If None, will use an expanding window. Defaults to None.
Returns:
  • (LightGBMCV): CV object with internal data structures for partial_fit.
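
A manual training loop built on setup and partial_fit could be sketched as follows. The helper `train_in_steps` and its `stop_when` argument are hypothetical; `cv` is assumed to be a LightGBMCV object, and only the documented setup and partial_fit calls are used.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, float]]


def train_in_steps(
    cv,
    df,
    n_windows: int,
    h: int,
    eval_every: int,
    max_iterations: int,
    stop_when: Callable[[History], bool],
) -> History:
    """Train in chunks of eval_every iterations until stop_when says so."""
    cv.setup(df, n_windows=n_windows, h=h)  # must run before partial_fit
    hist: History = []
    for trained in range(eval_every, max_iterations + 1, eval_every):
        metric = cv.partial_fit(eval_every)  # weighted metric so far
        hist.append((trained, metric))
        if stop_when(hist):
            break
    return hist
```

This gives full control over the stopping criterion, for example halting on a time budget or on a custom rule over the recorded (iterations, metric) history.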

method should_stop

should_stop(hist, early_stopping_evals, early_stopping_pct) → bool
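
The signatures of should_stop and find_best_iter carry no description here, so the sketch below is only one plausible reading of them, derived from the early_stopping_evals and early_stopping_pct descriptions in fit. It is written as standalone functions, assumes `hist` is a list of (boosting rounds, metric value) tuples like the one fit returns, and assumes that lower metric values are better and never zero.

```python
from typing import List, Tuple

History = List[Tuple[int, float]]


def should_stop(
    hist: History, early_stopping_evals: int, early_stopping_pct: float
) -> bool:
    """Stop when the relative improvement over the last
    early_stopping_evals evaluations falls below early_stopping_pct."""
    if len(hist) < early_stopping_evals + 1:
        return False  # not enough evaluations to compare yet
    reference = hist[-(early_stopping_evals + 1)][1]
    improvement_pct = 1 - hist[-1][1] / reference
    return improvement_pct < early_stopping_pct


def find_best_iter(hist: History, early_stopping_evals: int) -> int:
    """Return the iteration count with the lowest metric among the
    evaluations inspected by the stopping rule."""
    recent = hist[-(early_stopping_evals + 1) :]
    best_iter, _ = min(recent, key=lambda pair: pair[1])
    return best_iter


hist = [(10, 0.50), (20, 0.40), (30, 0.399), (40, 0.398)]
stop = should_stop(hist, early_stopping_evals=2, early_stopping_pct=0.01)
best = find_best_iter(hist, early_stopping_evals=2)
```

With this history the metric improves by only 0.5% between iterations 20 and 40, which is under the 1% threshold, so `stop` is True and `best` is 40.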