The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The module's aggregate function builds a hierarchy from categorical variables that represent the structure levels, and it also returns the aggregation constraints matrix $\mathbf{S}$. In addition, HierarchicalForecast keeps its reconciliation methods compatible with other popular machine-learning libraries through external forecast adapters, which transform the base forecasts produced by those libraries into a compatible data frame format.
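For intuition, take a hypothetical two-level hierarchy (the names are illustrative only) with a total T and two bottom-level series A and B. Writing $\mathbf{y}_{[b],\tau}$ for the bottom-level values at time $\tau$ and $\mathbf{y}_{[a,b],\tau}$ for all series stacked together, $\mathbf{S}$ maps the former to the latter: the all-ones row produces the total, and the identity block below it reproduces the bottom series.

```latex
\mathbf{y}_{[a,b],\tau} = \mathbf{S}\,\mathbf{y}_{[b],\tau},
\qquad
\begin{bmatrix} y_{T,\tau} \\ y_{A,\tau} \\ y_{B,\tau} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} y_{A,\tau} \\ y_{B,\tau} \end{bmatrix}
```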

Aggregate Function

aggregate

aggregate(df, spec, exog_vars=None, sparse_s=False, id_col='unique_id', time_col='ds', id_time_col=None, target_cols=('y',))
Utils aggregation function. Aggregates the bottom-level series contained in the DataFrame df according to the levels defined in the spec list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | Frame | DataFrame with columns [time_col, *target_cols], the columns to aggregate and, optionally, the exog_vars columns. | required |
| spec | list[list[str]] | List of levels. Each element of the list is a list of columns of df to aggregate. | required |
| exog_vars | Optional[dict[str, Union[str, list[str]]]] | Dictionary whose keys are column names and whose values, either a single string or a list of strings, are the aggregation(s) applied to each column. Accepted values are the Pandas or Polars aggregation functions; check the respective docs for guidance. | None |
| sparse_s | bool | Return S_df as a sparse Pandas DataFrame. | False |
| id_col | str | Column that will identify each series after aggregation. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds' |
| id_time_col | Optional[str] | Column that will identify each timestep after temporal aggregation. If provided, aggregate operates temporally. | None |
| target_cols | Sequence[str] | List of columns that contain the targets to aggregate. | ('y',) |

Returns:

| Type | Description |
| --- | --- |
| tuple[FrameT, FrameT, dict] | Y_df, S_df, tags. Y_df: hierarchically structured series. S_df: summing DataFrame. tags: aggregation indices. |
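To make the three return values concrete, here is a minimal stdlib-only sketch of what cross-sectional aggregation computes. It is not the library's implementation: the name aggregate_sketch, the list-of-dicts input, and the '/'-joined ids and level names (chosen to resemble ids such as Country/Region) are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_sketch(rows, spec, target="y"):
    # Hypothetical helper, NOT the hierarchicalforecast implementation.
    # rows: list of dicts holding the level columns, "ds" and the target column.
    # spec: list of levels, each a list of column names, top level first.
    bottom_cols = spec[-1]  # the most detailed level defines the bottom series

    def make_id(row, cols):
        # Assumption: ids are the level values joined with "/".
        return "/".join(str(row[c]) for c in cols)

    bottom_ids = sorted({make_id(r, bottom_cols) for r in rows})
    Y = defaultdict(float)  # (unique_id, ds) -> aggregated target
    S = {}                  # unique_id -> 0/1 row over bottom_ids
    tags = {}               # level name -> unique_ids at that level
    for level in spec:
        ids = set()
        for r in rows:
            uid = make_id(r, level)
            ids.add(uid)
            Y[(uid, r["ds"])] += r[target]
            # Mark which bottom series contributes to this aggregate.
            row = S.setdefault(uid, [0] * len(bottom_ids))
            row[bottom_ids.index(make_id(r, bottom_cols))] = 1
        tags["/".join(level)] = sorted(ids)
    return dict(Y), S, tags


rows = [
    {"country": "AU", "state": "NSW", "ds": 1, "y": 10.0},
    {"country": "AU", "state": "VIC", "ds": 1, "y": 5.0},
    {"country": "AU", "state": "NSW", "ds": 2, "y": 12.0},
    {"country": "AU", "state": "VIC", "ds": 2, "y": 7.0},
]
Y, S, tags = aggregate_sketch(rows, [["country"], ["country", "state"]])
```

Each row of S lists the bottom series that sum to that aggregate, which is the same information the S_df returned by aggregate encodes.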

aggregate_temporal

aggregate_temporal(df, spec, exog_vars=None, sparse_s=False, id_col='unique_id', time_col='ds', id_time_col='temporal_id', target_cols=('y',), aggregation_type='local')
Utils aggregation function for temporal aggregations. Aggregates the bottom-level timesteps contained in the DataFrame df according to the temporal levels defined in the spec dictionary.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | Frame | DataFrame with columns [time_col, *target_cols] and the columns to aggregate. | required |
| spec | dict[str, int] | Dictionary of temporal levels. Each key is a string naming a level, and its value is the number of bottom-level timesteps contained in one aggregate of that level. | required |
| exog_vars | Optional[dict[str, Union[str, list[str]]]] | Dictionary whose keys are column names and whose values, either a single string or a list of strings, are the aggregation(s) applied to each column. Accepted values are the Pandas or Polars aggregation functions; check the respective docs for guidance. | None |
| sparse_s | bool | Return S_df as a sparse Pandas DataFrame. | False |
| id_col | str | Column that will identify each series after aggregation. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds' |
| id_time_col | str | Column that will identify each timestep after aggregation. | 'temporal_id' |
| target_cols | Sequence[str] | List of columns that contain the targets to aggregate. | ('y',) |
| aggregation_type | str | If 'local', the aggregation is performed on the timestamps of each time series independently. If 'global', it is performed on the unique timestamps across all time series. | 'local' |

Returns:

| Type | Description |
| --- | --- |
| tuple[FrameT, FrameT, dict] | Y_df, S_df, tags. Y_df: temporally hierarchically structured series. S_df: temporal summing DataFrame. tags: temporal aggregation indices. |
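A minimal stdlib sketch of the temporal counterpart, assuming a single series whose length splits into complete, non-overlapping blocks: each level in spec sums k consecutive bottom-level timesteps, so spec={'year': 12, 'quarter': 3, 'month': 1} turns 12 monthly values into 1 yearly, 4 quarterly and 12 monthly values. The function name and the '<level>-<i>' temporal ids are hypothetical, and how the library handles incomplete blocks is not covered here.

```python
def aggregate_temporal_sketch(values, spec):
    # Hypothetical helper, NOT the library implementation.
    # values: bottom-level observations of ONE series, oldest first.
    # spec: level name -> number of bottom-level timesteps per aggregate.
    out = {}
    for level, k in spec.items():
        if len(values) % k != 0:
            # Assumption of this sketch: only complete blocks are allowed.
            raise ValueError(f"{len(values)} timesteps do not split into blocks of {k}")
        # Sum each block of k consecutive timesteps; ids are illustrative.
        out[level] = [
            (f"{level}-{i + 1}", sum(values[i * k:(i + 1) * k]))
            for i in range(len(values) // k)
        ]
    return out


monthly = list(range(1, 13))  # twelve monthly observations: 1, 2, ..., 12
agg = aggregate_temporal_sketch(monthly, {"year": 12, "quarter": 3, "month": 1})
```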

make_future_dataframe

make_future_dataframe(df, freq, h, id_col='unique_id', time_col='ds')
Creates a future DataFrame for forecasting.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | Frame | DataFrame with ids, times and values for the exogenous regressors. | required |
| freq | Union[str, int] | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. | required |
| h | int | Forecast horizon. | required |
| id_col | str | Column that identifies each series. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds' |

Returns:

| Type | Description |
| --- | --- |
| FrameT | DataFrame with future values. |
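The idea behind make_future_dataframe can be sketched in plain Python: take each series' last timestamp and step forward h times by freq. This hypothetical version (make_future_sketch is not a library function) only handles integer steps and fixed timedelta steps; calendar offset aliases such as 'MS', which the real function accepts, need pandas or polars.

```python
from datetime import date, timedelta


def make_future_sketch(rows, freq, h, id_col="unique_id", time_col="ds"):
    # Hypothetical helper: freq is an int or a timedelta (fixed-width step only).
    last = {}
    for r in rows:
        uid, t = r[id_col], r[time_col]
        # Keep the latest timestamp observed for each series.
        if uid not in last or t > last[uid]:
            last[uid] = t
    # Emit h future timestamps per series, ordered by series id.
    return [
        {id_col: uid, time_col: t_last + freq * i}
        for uid, t_last in sorted(last.items())
        for i in range(1, h + 1)
    ]


history = [
    {"unique_id": "A", "ds": date(2024, 1, 1)},
    {"unique_id": "A", "ds": date(2024, 1, 2)},
    {"unique_id": "B", "ds": date(2024, 1, 1)},
]
future = make_future_sketch(history, timedelta(days=1), h=2)
```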

get_cross_temporal_tags

get_cross_temporal_tags(df, tags_cs, tags_te, sep='//', id_col='unique_id', id_time_col='temporal_id', cross_temporal_id_col='cross_temporal_id')
Gets cross-temporal tags.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | Frame | DataFrame with temporal ids. | required |
| tags_cs | dict[str, ndarray] | Tags for the cross-sectional hierarchies. | required |
| tags_te | dict[str, ndarray] | Tags for the temporal hierarchies. | required |
| sep | str | Separator for the cross-temporal tags. | '//' |
| id_col | str | Column that identifies each series. | 'unique_id' |
| id_time_col | str | Column that identifies each (aggregated) timestep. | 'temporal_id' |
| cross_temporal_id_col | str | Column that will identify each cross-temporal aggregation. | 'cross_temporal_id' |

Returns:

| Type | Description |
| --- | --- |
| tuple[FrameT, dict[str, ndarray]] | df, tags_ct. df: DataFrame with cross-temporal ids. tags_ct: tags for the cross-temporal hierarchies. |
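One plausible reading of cross-temporal tags is that every cross-sectional level is paired with every temporal level, with ids joined by sep. The sketch below encodes that reading in plain Python; the key format, id format and ordering are assumptions and may differ from what get_cross_temporal_tags returns, and unlike the real function it does not touch df.

```python
def cross_temporal_tags_sketch(tags_cs, tags_te, sep="//"):
    # Hypothetical helper: pair each cross-sectional level with each temporal
    # level and join level names and ids with `sep`.
    tags_ct = {}
    for cs_level, cs_ids in tags_cs.items():
        for te_level, te_ids in tags_te.items():
            tags_ct[f"{cs_level}{sep}{te_level}"] = [
                f"{cs_id}{sep}{te_id}" for cs_id in cs_ids for te_id in te_ids
            ]
    return tags_ct


tags_cs = {"Country": ["AU"], "Country/State": ["AU/NSW", "AU/VIC"]}
tags_te = {"year": ["year-1"], "quarter": ["quarter-1", "quarter-2"]}
tags_ct = cross_temporal_tags_sketch(tags_cs, tags_te)
```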

Hierarchical Visualization

HierarchicalPlot

HierarchicalPlot(S, tags, S_id_col='unique_id')
Hierarchical Plot. This class contains a collection of matplotlib visualization methods suited to small and medium-sized hierarchical series.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| S | Frame | DataFrame with the summing matrix of size (base, bottom); see the aggregate function. | required |
| tags | dict[str, ndarray] | Hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level. | required |
| S_id_col | str | Column that identifies each aggregation. | 'unique_id' |
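Because the plotting methods work from S and tags together, the ids in tags should presumably all have a row in S. A small hypothetical pre-check, written in plain Python rather than against the library (check_plot_inputs is not a library function), lists any tag ids missing from S:

```python
def check_plot_inputs(s_ids, tags):
    # Hypothetical helper: return {level: [ids absent from S]} for each level
    # that references an id with no row in S; empty dict means consistent.
    # s_ids: values of the S_id_col column of S.
    known = set(s_ids)
    return {
        level: [uid for uid in ids if uid not in known]
        for level, ids in tags.items()
        if any(uid not in known for uid in ids)
    }


s_ids = ["AU", "AU/NSW", "AU/VIC"]
problems = check_plot_inputs(s_ids, {"Country": ["AU"], "Country/State": ["AU/NSW", "AU/VIC"]})
```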

HierarchicalPlot.plot_summing_matrix

plot_summing_matrix()
Summation constraints plot. This method plots the hierarchical aggregation constraints matrix $\mathbf{S}$.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| fig | Figure | Figure object containing the plot of the summing matrix. |
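The summing-matrix plot is essentially a picture of where $\mathbf{S}$ holds ones. As a rough, hypothetical text-mode analogue (not the method's output, which is a matplotlib figure), each row below is one series and each '#' marks a bottom series it sums:

```python
def render_summing_matrix(S, row_ids):
    # Hypothetical text analogue of a summing-matrix plot:
    # '#' marks a 1 (bottom series included), '.' marks a 0.
    width = max(len(r) for r in row_ids)
    return "\n".join(
        f"{uid:<{width}} " + "".join("#" if v else "." for v in row)
        for uid, row in zip(row_ids, S)
    )


S = [[1, 1], [1, 0], [0, 1]]
text = render_summing_matrix(S, ["AU", "AU/NSW", "AU/VIC"])
print(text)
```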

HierarchicalPlot.plot_series

plot_series(series, Y_df, models=None, level=None, id_col='unique_id', time_col='ds', target_col='y')
Single series plot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| series | str | String identifying the 'unique_id' of the series to plot, at any level. | required |
| Y_df | Frame | Hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and may contain model columns. | required |
| models | Optional[list[str]] | Strings identifying the model columns to plot. | None |
| level | Optional[list[int]] | Confidence levels of the prediction intervals available in Y_df. | None |
| id_col | str | Column that identifies each series. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds' |
| target_col | str | Column that contains the target. | 'y' |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| fig | Figure | Figure object containing the plot of the single series. |
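The level argument refers to prediction intervals already present in Y_df. Assuming the column naming convention used across Nixtla libraries, '<model>-lo-<level>' and '<model>-hi-<level>' (for example AutoETS-lo-80), a small hypothetical helper can report which requested interval columns are missing before plotting:

```python
def interval_columns(models, levels):
    # Assumed naming convention: '<model>-lo-<level>' / '<model>-hi-<level>'.
    return [
        f"{model}-{side}-{lv}"
        for model in models
        for lv in levels
        for side in ("lo", "hi")
    ]


def missing_interval_columns(columns, models, levels):
    # Hypothetical helper: interval columns expected but absent from `columns`.
    present = set(columns)
    return [c for c in interval_columns(models, levels) if c not in present]


cols = ["unique_id", "ds", "y", "AutoETS", "AutoETS-lo-80", "AutoETS-hi-80"]
missing = missing_interval_columns(cols, ["AutoETS"], [80, 95])
```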

HierarchicalPlot.plot_hierarchically_linked_series

plot_hierarchically_linked_series(bottom_series, Y_df, models=None, level=None, id_col='unique_id', time_col='ds', target_col='y')
Hierarchically linked series plot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| bottom_series | str | String identifying the 'unique_id' of the bottom-level series to plot. | required |
| Y_df | Frame | Hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns. | required |
| models | Optional[list[str]] | Strings identifying the model columns to plot. | None |
| level | Optional[list[int]] | Confidence levels of the prediction intervals available in Y_df. | None |
| id_col | str | Column that identifies each series. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds' |
| target_col | str | Column that contains the target. | 'y' |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| fig | Figure | Figure object containing the plots of the hierarchically linked series. |
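Which series are "linked" to a bottom series can be read directly off $\mathbf{S}$: they are the rows whose entry in that bottom series' column is 1. A plain-Python sketch of that lookup (linked_series is a hypothetical helper, and S is a list of rows here rather than a DataFrame):

```python
def linked_series(S, row_ids, bottom_ids, bottom_series):
    # Hypothetical helper: every series whose aggregate includes bottom_series.
    # S has one row per entry of row_ids and one column per entry of bottom_ids.
    j = bottom_ids.index(bottom_series)
    return [uid for uid, row in zip(row_ids, S) if row[j] == 1]


S = [[1, 1], [1, 0], [0, 1]]
row_ids = ["AU", "AU/NSW", "AU/VIC"]
bottom_ids = ["AU/NSW", "AU/VIC"]
print(linked_series(S, row_ids, bottom_ids, "AU/VIC"))  # ['AU', 'AU/VIC']
```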

HierarchicalPlot.plot_hierarchical_predictions_gap

plot_hierarchical_predictions_gap(Y_df, models=None, xlabel=None, ylabel=None, id_col='unique_id', time_col='ds', target_col='y')
Hierarchical predictions gap plot.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| Y_df | Frame | Hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns. | required |
| models | Optional[list[str]] | Strings identifying the model columns to plot. | None |
| xlabel | Optional[str] | Label for the plot's x-axis. | None |
| ylabel | Optional[str] | Label for the plot's y-axis. | None |
| id_col | str | Column that identifies each series. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds' |
| target_col | str | Column that contains the target. | 'y' |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| fig | Figure | Figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure. |
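The gap plot compares predictions aggregated at different levels of the hierarchy. Our reading, which is an assumption about what the figure summarizes, is that summing each level's forecasts per timestamp shows how far base forecasts are from coherent: coherent forecasts give the same total at every level. A plain-Python sketch with a hypothetical helper:

```python
from collections import defaultdict


def level_totals(preds, tags):
    # Hypothetical helper: sum one model's forecasts over the series of each level.
    # preds: (unique_id, ds) -> forecast; tags: level -> ids at that level.
    totals = {}
    for level, ids in tags.items():
        by_ds = defaultdict(float)
        for (uid, ds), value in preds.items():
            if uid in ids:
                by_ds[ds] += value
        totals[level] = dict(by_ds)
    return totals


tags = {"Country": ["AU"], "Country/State": ["AU/NSW", "AU/VIC"]}
preds = {("AU", 1): 16.0, ("AU/NSW", 1): 10.0, ("AU/VIC", 1): 5.0}
gaps = level_totals(preds, tags)
```

Here the country-level forecast (16.0) and the sum of the state-level forecasts (15.0) disagree by 1.0, the kind of incoherence reconciliation removes.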

Example

import pandas as pd
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)

fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df, h=24).reset_index()

# Plot the prediction differences between aggregation levels:
# Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],
    xlabel='Month', ylabel='Predictions',
)