MLForecast

MLForecast(
    models,
    freq,
    lags=None,
    lag_transforms=None,
    date_features=None,
    num_threads=1,
    target_transforms=None,
    lag_transforms_namer=None,
)
Forecasting pipeline.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| models | regressor or list of regressors | Models that will be trained and used to compute the forecasts. | required |
| freq | str or int or BaseOffset | Pandas offset, pandas offset alias (e.g. 'D', 'W-THU') or integer denoting the frequency of the series. | required |
| lags | list of int | Lags of the target to use as features. Defaults to None. | None |
| lag_transforms | dict of int to list of functions | Mapping of target lags to their transformations. Defaults to None. | None |
| date_features | list of str or callable | Features computed from the dates. Can be pandas date attributes or functions that will take the dates as input. Defaults to None. | None |
| num_threads | int | Number of threads to use when computing the features. Defaults to 1. | 1 |
| target_transforms | list of transformers | Transformations that will be applied to the target before computing the features and restored after the forecasting step. Defaults to None. | None |
| lag_transforms_namer | callable | Function that takes a transformation (either function or class), a lag and extra arguments and produces a name. Defaults to None. | None |
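
To make the `lags` and `lag_transforms` parameters concrete, the following is a minimal pure-Python sketch of what a lag feature and a transformation applied to a lag look like for a single series. The helpers `lag` and `rolling_mean` are hypothetical names written for illustration; they are not part of the library and do not reflect its implementation.

```python
from typing import List, Optional


def lag(values: List[float], k: int) -> List[Optional[float]]:
    """Shift the series k steps so position t holds the value from t - k.

    The first k positions have no history and are filled with None.
    """
    n = len(values)
    return [None] * min(k, n) + values[: max(n - k, 0)]


def rolling_mean(values: List[Optional[float]], window: int) -> List[Optional[float]]:
    """Mean over the current and previous window - 1 entries.

    Returns None wherever the window is incomplete or contains a None.
    """
    out: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[max(i - window + 1, 0): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


y = [10.0, 12.0, 11.0, 15.0, 14.0]

# Roughly what lags=[1, 2] asks for: the target shifted by 1 and 2 steps.
lag1 = lag(y, 1)
lag2 = lag(y, 2)

# Roughly what a rolling mean of size 2 on lag 1 asks for:
# the transformation is computed on the lagged series, not on y itself.
rolling_mean_lag1 = rolling_mean(lag1, 2)
```

Because the transformation operates on the lagged values, a feature at time t never uses the target at time t, which is what keeps these features usable at prediction time.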

MLForecast.fit

fit(
    df,
    id_col="unique_id",
    time_col="ds",
    target_col="y",
    static_features=None,
    dropna=True,
    keep_last_n=None,
    max_horizon=None,
    prediction_intervals=None,
    fitted=False,
    as_numpy=False,
    weight_col=None,
    models_fit_kwargs=None,
)
Apply the feature engineering and train the models.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas or polars DataFrame | Series data in long format. | required |
| id_col | str | Column that identifies each series. Defaults to "unique_id". | "unique_id" |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Defaults to "ds". | "ds" |
| target_col | str | Column that contains the target. Defaults to "y". | "y" |
| static_features | list of str | Names of the features that are static and will be repeated when forecasting. If None, all columns (except id_col and time_col) are considered static. Defaults to None. | None |
| dropna | bool | Drop rows with missing values produced by the transformations. Defaults to True. | True |
| keep_last_n | int | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None. | None |
| max_horizon | int | Train this many models, where each model will predict a specific horizon. Defaults to None. | None |
| prediction_intervals | PredictionIntervals | Configuration to calibrate prediction intervals (Conformal Prediction). Defaults to None. | None |
| fitted | bool | Save in-sample predictions. Defaults to False. | False |
| as_numpy | bool | Cast features to numpy array. Defaults to False. | False |
| weight_col | str | Column that contains the sample weights. Defaults to None. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| MLForecast | MLForecast | Forecast object with series values and trained models. |
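
The `max_horizon` parameter trains one model per forecast step. One plausible way to picture this, stated here as an assumption rather than as the library's implementation, is that every model sees the same features at time t but is trained against the target a different number of steps ahead. The helper `direct_targets` below is a hypothetical name used only for this sketch.

```python
from typing import List, Optional


def direct_targets(y: List[float], max_horizon: int) -> List[List[Optional[float]]]:
    """Build one target list per horizon.

    Entry t of list i is y[t + i], or None when t + i runs past the end
    of the series (those rows cannot be used to train model i).
    """
    n = len(y)
    return [
        [y[t + i] if t + i < n else None for t in range(n)]
        for i in range(max_horizon)
    ]


y = [1.0, 2.0, 3.0, 4.0]
targets = direct_targets(y, max_horizon=2)

# targets[0] is the usual one-step target; targets[1] looks one step further ahead.
```

Under this reading, a larger `max_horizon` means more models to train and fewer usable rows for the models that target the furthest steps.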

MLForecast.save

save(path)
Save forecast object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str or Path | Directory where artifacts will be stored. | required |

MLForecast.load

load(path)
Load forecast object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str or Path | Directory with saved artifacts. | required |

MLForecast.update

update(df)
Update the values of the stored series.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas or polars DataFrame | Dataframe with new observations. | required |

MLForecast.make_future_dataframe

make_future_dataframe(h)
Create a dataframe with all ids and future times in the forecasting horizon.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| h | int | Number of periods to predict. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | pandas or polars DataFrame with the expected ids and future times. |
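
The shape of the result can be sketched without the library: for every series id, take the last observed time and step forward h periods. The sketch below assumes integer timestamps and an integer frequency; `future_grid` and its arguments are hypothetical names for illustration, not the library's code.

```python
from typing import Dict, List, Tuple


def future_grid(last_times: Dict[str, int], h: int, freq: int = 1) -> List[Tuple[str, int]]:
    """Return every (id, future_time) pair for the next h periods.

    last_times maps each series id to its last observed integer timestamp.
    """
    rows: List[Tuple[str, int]] = []
    for uid in sorted(last_times):
        for step in range(1, h + 1):
            rows.append((uid, last_times[uid] + step * freq))
    return rows


# Series "a" ends at time 10 and series "b" ends at time 7.
grid = future_grid({"a": 10, "b": 7}, h=2)
```

Each series gets its own future times, so series that end at different points produce different rows. A grid like this is a natural starting point for filling in future exogenous features.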

MLForecast.get_missing_future

get_missing_future(h, X_df)
Get the missing id and time combinations in X_df.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| h | int | Number of periods to predict. | required |
| X_df | pandas or polars DataFrame | Dataframe with the future exogenous features. Should have the id column and the time column. | required |

Returns:

| Type | Description |
| --- | --- |
| DFType | pandas or polars DataFrame with the expected ids and future times that are missing in X_df. |
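
Conceptually, the check is a difference between the expected (id, time) pairs and the pairs actually present in X_df. A minimal sketch with plain tuples follows; `missing_future` is a hypothetical helper and the data is made up for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, int]


def missing_future(expected: List[Pair], provided: List[Pair]) -> List[Pair]:
    """Return the expected (id, time) pairs that do not appear in provided."""
    provided_set = set(provided)
    return [pair for pair in expected if pair not in provided_set]


# Expected combinations for two series and a horizon of 2.
expected = [("a", 11), ("a", 12), ("b", 8), ("b", 9)]

# Rows available in the exogenous features; ("b", 9) was left out.
x_rows = [("a", 11), ("a", 12), ("b", 8)]

missing = missing_future(expected, x_rows)
```

An empty result means the future features cover every id and timestep in the horizon.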

MLForecast.predict

predict(
    h,
    before_predict_callback=None,
    after_predict_callback=None,
    new_df=None,
    level=None,
    X_df=None,
    ids=None,
)
Compute the predictions for the next h steps.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| h | int | Number of periods to predict. | required |
| before_predict_callback | callable | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None. | None |
| after_predict_callback | callable | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None. | None |
| new_df | pandas or polars DataFrame | Series data of new observations for which forecasts are to be generated. This dataframe should have the same structure as the one used to fit the model, including any features and time series data. If new_df is not None, the method will generate forecasts for the new observations. Defaults to None. | None |
| level | list of ints or floats | Confidence levels between 0 and 100 for prediction intervals. Defaults to None. | None |
| X_df | pandas or polars DataFrame | Dataframe with the future exogenous features. Should have the id column and the time column. Defaults to None. | None |
| ids | list of str | List with subset of ids seen during training for which the forecasts should be computed. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| DFType | pandas or polars DataFrame with the predictions for each series and timestep, with one column per model. |
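
The description of `after_predict_callback` ("before updating the targets") suggests that predictions are fed back in as targets for later steps. The sketch below illustrates that recursive idea on a single series, under simplifying assumptions: the model is a toy function of two lags, and the callback receives a single float rather than a pandas Series. All names here are hypothetical.

```python
from typing import Callable, List, Optional


def toy_model(features: List[float]) -> float:
    """Stand-in for a trained regressor: predicts the mean of its lag features."""
    return sum(features) / len(features)


def recursive_forecast(
    history: List[float],
    h: int,
    model: Callable[[List[float]], float],
    after_predict: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """Forecast h steps, appending each prediction to the series before the next step."""
    series = list(history)
    preds: List[float] = []
    for _ in range(h):
        features = [series[-1], series[-2]]  # lag 1 and lag 2
        pred = model(features)
        if after_predict is not None:
            # Adjust the prediction before it becomes a target for later steps.
            pred = after_predict(pred)
        preds.append(pred)
        series.append(pred)
    return preds


history = [2.0, 4.0]
plain = recursive_forecast(history, 3, toy_model)

# A callback that caps predictions at 3.25 changes later steps too,
# because the capped value is what gets fed back as a lag.
capped = recursive_forecast(history, 3, toy_model, after_predict=lambda p: min(p, 3.25))
```

Comparing `plain` and `capped` shows why the callback's position matters: an adjustment made at one step propagates into the features of every subsequent step.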

MLForecast.preprocess

preprocess(
    df,
    id_col="unique_id",
    time_col="ds",
    target_col="y",
    static_features=None,
    dropna=True,
    keep_last_n=None,
    max_horizon=None,
    return_X_y=False,
    as_numpy=False,
    weight_col=None,
)
Add the features to data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas DataFrame | Series data in long format. | required |
| id_col | str | Column that identifies each series. Defaults to "unique_id". | "unique_id" |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Defaults to "ds". | "ds" |
| target_col | str | Column that contains the target. Defaults to "y". | "y" |
| static_features | list of str | Names of the features that are static and will be repeated when forecasting. Defaults to None. | None |
| dropna | bool | Drop rows with missing values produced by the transformations. Defaults to True. | True |
| keep_last_n | int | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None. | None |
| max_horizon | int | Train this many models, where each model will predict a specific horizon. Defaults to None. | None |
| return_X_y | bool | Return a tuple with the features and the target. If False, a single dataframe is returned. Defaults to False. | False |
| as_numpy | bool | Cast features to numpy array. Only works for return_X_y=True. Defaults to False. | False |
| weight_col | str | Column that contains the sample weights. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Union[DFType, Tuple[DFType, ndarray]] | DataFrame, or a tuple of a pandas DataFrame and a numpy array: df plus the added features and target(s). |
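
The effect of `dropna` and `return_X_y` can be pictured on a single series with plain lists. This is a minimal sketch under the assumption that the only features are lags; `build_rows` and `drop_incomplete` are hypothetical helpers, not library functions.

```python
from typing import List, Optional, Tuple

Row = Tuple[List[Optional[float]], float]


def build_rows(y: List[float], lags: List[int]) -> List[Row]:
    """Pair each target value with its lag features (None where history is missing)."""
    rows: List[Row] = []
    for t, target in enumerate(y):
        feats = [y[t - k] if t - k >= 0 else None for k in lags]
        rows.append((feats, target))
    return rows


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Mimic dropna=True: discard rows whose features contain a missing value."""
    return [(f, t) for f, t in rows if all(v is not None for v in f)]


y = [5.0, 6.0, 7.0, 8.0]
all_rows = build_rows(y, lags=[1, 2])
kept = drop_incomplete(all_rows)

# Mimic return_X_y=True: features and target as separate objects.
X = [feats for feats, _ in kept]
targets = [target for _, target in kept]
```

With a maximum lag of 2, the first two rows have no complete history and are dropped, leaving two training rows out of four.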

MLForecast.fit_models

fit_models(X, y, models_fit_kwargs=None)
Manually train models. Use this if you called MLForecast.preprocess beforehand.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | pandas or polars DataFrame or numpy array | Features. | required |
| y | numpy array | Target. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| MLForecast | MLForecast | Forecast object with trained models. |

MLForecast.cross_validation

cross_validation(
    df,
    n_windows,
    h,
    id_col="unique_id",
    time_col="ds",
    target_col="y",
    step_size=None,
    static_features=None,
    dropna=True,
    keep_last_n=None,
    refit=True,
    max_horizon=None,
    before_predict_callback=None,
    after_predict_callback=None,
    prediction_intervals=None,
    level=None,
    input_size=None,
    fitted=False,
    as_numpy=False,
    weight_col=None,
)
Perform time series cross validation. Creates n_windows splits where each window has h test periods, trains the models, computes the predictions and merges the actuals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas or polars DataFrame | Series data in long format. | required |
| n_windows | int | Number of windows to evaluate. | required |
| h | int | Forecast horizon. | required |
| id_col | str | Column that identifies each series. Defaults to "unique_id". | "unique_id" |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Defaults to "ds". | "ds" |
| target_col | str | Column that contains the target. Defaults to "y". | "y" |
| step_size | int | Step size between each cross validation window. If None it will be equal to h. Defaults to None. | None |
| static_features | list of str | Names of the features that are static and will be repeated when forecasting. Defaults to None. | None |
| dropna | bool | Drop rows with missing values produced by the transformations. Defaults to True. | True |
| keep_last_n | int | Keep only this many records from each series for the forecasting step. Can save time and memory if your features allow it. Defaults to None. | None |
| max_horizon | int | Train this many models, where each model will predict a specific horizon. Defaults to None. | None |
| refit | bool or int | Retrain model for each cross validation window. If False, the models are trained at the beginning and then used to predict each window. If a positive int, the models are retrained every refit windows. Defaults to True. | True |
| before_predict_callback | callable | Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index. Defaults to None. | None |
| after_predict_callback | callable | Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index. Defaults to None. | None |
| prediction_intervals | PredictionIntervals | Configuration to calibrate prediction intervals (Conformal Prediction). Defaults to None. | None |
| level | list of ints or floats | Confidence levels between 0 and 100 for prediction intervals. Defaults to None. | None |
| input_size | int | Maximum training samples per series in each window. If None, will use an expanding window. Defaults to None. | None |
| fitted | bool | Store the in-sample predictions. Defaults to False. | False |
| as_numpy | bool | Cast features to numpy array. Defaults to False. | False |
| weight_col | str | Column that contains the sample weights. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| DFType | pandas or polars DataFrame with the predictions for each window, including the series id, timestamp, last train date, target value and predictions from each model. |
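
How `n_windows`, `h`, `step_size`, `input_size` and `refit` interact is easier to see with index arithmetic. The sketch below assumes integer timestamps 0..n_times-1 and assumes the windows are anchored at the end of the series, with the last window testing the final h periods; that layout is a common convention for time series cross validation and is an assumption here, not something this page states. `cv_windows` and `should_refit` are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple, Union


def cv_windows(
    n_times: int,
    n_windows: int,
    h: int,
    step_size: Optional[int] = None,
    input_size: Optional[int] = None,
) -> List[Dict[str, Tuple[int, int]]]:
    """Return inclusive (start, end) index ranges for the train and test part of each window."""
    step = h if step_size is None else step_size
    last = n_times - 1
    windows = []
    for i in range(n_windows):
        # Earlier windows sit further back from the end of the series.
        offset = h + (n_windows - 1 - i) * step
        cutoff = last - offset  # last training index for this window
        if input_size is None:
            train_start = 0  # expanding window
        else:
            train_start = max(cutoff - input_size + 1, 0)  # capped training size
        windows.append({"train": (train_start, cutoff), "test": (cutoff + 1, cutoff + h)})
    return windows


def should_refit(window_index: int, refit: Union[bool, int]) -> bool:
    """Decide whether to retrain at this window.

    True behaves like 1 (every window), False like 0 (first window only),
    and a positive int n retrains every n windows.
    """
    every = int(refit)
    return window_index == 0 or (every > 0 and window_index % every == 0)


expanding = cv_windows(n_times=20, n_windows=2, h=3)
capped = cv_windows(n_times=20, n_windows=2, h=3, input_size=5)
schedule = [should_refit(i, refit=2) for i in range(4)]
```

In the expanding case the training range grows from window to window, while `input_size=5` keeps every training range at five samples. With `refit=2`, the models are retrained on windows 0 and 2 and reused on windows 1 and 3.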

MLForecast.from_cv

from_cv(cv)