The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The module's aggregate function lets you create a hierarchy from categorical variables that represent the structure levels, and it also returns the aggregation constraints matrix $\mathbf{S}$.

In addition, HierarchicalForecast makes its reconciliation methods compatible with other popular machine-learning libraries through external forecast adapters, which transform the base forecasts produced by external libraries into a compatible data frame format.

Aggregate Function



aggregate

 aggregate (df:Union[DataFrame[Any],LazyFrame[Any]], spec:list[list[str]],
            exog_vars:Optional[dict[str,Union[str,list[str]]]]=None,
            sparse_s:bool=False, id_col:str='unique_id',
            time_col:str='ds', target_cols:list[str]=['y'])

Utils aggregation function. Aggregates the bottom-level series contained in the DataFrame df according to the levels defined in the spec list.

| Parameter | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with columns [time_col, *target_cols], the columns to aggregate and, optionally, exog_vars. |
| spec | list | | List of levels. Each element of the list should contain a list of columns of df to aggregate. |
| exog_vars | Optional | None | Dictionary mapping exogenous column names to the aggregation (a string) or aggregations (a list of strings) applied to each column. |
| sparse_s | bool | False | Return S_df as a sparse pandas DataFrame. |
| id_col | str | 'unique_id' | Column that will identify each series after aggregation. |
| time_col | str | 'ds' | Column that identifies each timestep; its values can be timestamps or integers. |
| target_cols | list | ['y'] | List of columns that contain the targets to aggregate. |
| **Returns** | **tuple** | | **Hierarchically structured series.** |
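The spec argument is easiest to read from a small example. Below is a minimal pure-Python sketch of the idea behind aggregate, not the library's implementation: at each level it sums the target over the rows that share the same category values. It assumes that aggregated series ids join a row's category values with '/' and that each level's tag name joins its column names the same way; the helper aggregate_sketch and the toy columns country and state are hypothetical.

```python
from collections import defaultdict


def aggregate_sketch(rows, spec, target_col="y", time_col="ds"):
    """Hypothetical re-implementation of the idea behind `aggregate`.

    rows: list of dicts, one per bottom-level observation.
    spec: list of levels; each level is a list of categorical columns.
    Returns (totals, tags): totals maps (unique_id, ds) -> summed target,
    tags maps each level name to the sorted unique_ids in that level.
    """
    totals = defaultdict(float)
    tags = {}
    for level in spec:
        level_name = "/".join(level)  # assumed tag naming convention
        ids = set()
        for row in rows:
            # Series id at this level: the row's category values joined by '/'
            uid = "/".join(str(row[col]) for col in level)
            ids.add(uid)
            totals[(uid, row[time_col])] += row[target_col]
        tags[level_name] = sorted(ids)
    return dict(totals), tags


# Toy bottom-level data: one country, two states, two periods
rows = [
    {"country": "AU", "state": "NSW", "ds": 1, "y": 10.0},
    {"country": "AU", "state": "VIC", "ds": 1, "y": 5.0},
    {"country": "AU", "state": "NSW", "ds": 2, "y": 12.0},
    {"country": "AU", "state": "VIC", "ds": 2, "y": 6.0},
]
spec = [["country"], ["country", "state"]]
totals, tags = aggregate_sketch(rows, spec)
print(tags)  # {'country': ['AU'], 'country/state': ['AU/NSW', 'AU/VIC']}
print(totals[("AU", 1)])  # 15.0
```

Each inner list of spec adds one level; the more columns a level lists, the finer the aggregation, so the bottom level in this toy example is country/state.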

Hierarchical Visualization



HierarchicalPlot

 HierarchicalPlot (S:Union[DataFrame[Any],LazyFrame[Any]],
                   tags:dict[str,numpy.ndarray], S_id_col:str='unique_id')

Hierarchical Plot

This class contains a collection of matplotlib visualization methods, suited for small- to medium-sized hierarchical series.

Parameters:
S: DataFrame with the summing matrix of size (base, bottom); see the aggregate function.
tags: dict[str, np.ndarray], hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
S_id_col: str='unique_id', column that identifies each aggregation.



plot_summing_matrix

 plot_summing_matrix ()

Summation Constraints plot

This method simply plots the hierarchical aggregation constraints matrix $\mathbf{S}$.
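Each row of $\mathbf{S}$ corresponds to one series at some level of the hierarchy and each column to one bottom-level series, so multiplying $\mathbf{S}$ by a vector of bottom-level values yields the values of every series. The sketch below builds $\mathbf{S}$ for a toy hierarchy with the hypothetical helpers summing_matrix and matvec. As a simplifying assumption for this sketch only, ids are '/'-joined paths in which an aggregate id is a path prefix of the bottom ids it covers, which holds for nested levels like these.

```python
def summing_matrix(all_ids, bottom_ids):
    """Hypothetical helper: build S (rows = all series, cols = bottom series).

    Assumes ids are '/'-joined paths, so a bottom series belongs to an
    aggregate when the aggregate id is a path prefix of the bottom id.
    """
    def is_ancestor(agg, bottom):
        return bottom == agg or bottom.startswith(agg + "/")

    return [[1 if is_ancestor(a, b) else 0 for b in bottom_ids] for a in all_ids]


def matvec(S, b):
    # y = S b: each row of S sums the bottom-level values it covers
    return [sum(s * x for s, x in zip(row, b)) for row in S]


bottom_ids = ["AU/NSW", "AU/VIC"]
all_ids = ["AU"] + bottom_ids
S = summing_matrix(all_ids, bottom_ids)
print(S)  # [[1, 1], [1, 0], [0, 1]]
print(matvec(S, [10.0, 5.0]))  # [15.0, 10.0, 5.0]
```

The first row of S aggregates both states into the country total, while the identity block underneath simply reproduces the bottom-level series.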



plot_series

 plot_series (series:str, Y_df:Union[DataFrame[Any],LazyFrame[Any]],
              models:Optional[list[str]]=None,
              level:Optional[list[int]]=None, id_col:str='unique_id',
              time_col:str='ds', target_col:str='y')

Single Series plot

Parameters:
series: str, 'unique_id' of the series to plot; it can belong to any level of the hierarchy.
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns ['unique_id', 'ds', 'y'] and may contain model columns.
models: list[str], names of the model columns to plot.
level: list of confidence levels in [0, 100] for the prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
Single series plot with the filtered models and prediction interval levels.
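The models and level arguments decide which columns of Y_df end up in the plot. The hypothetical helper columns_to_plot below sketches that selection without drawing anything. It assumes that prediction-interval columns follow the '{model}-lo-{level}' / '{model}-hi-{level}' naming used by StatsForecast; the actual method may select columns differently.

```python
def columns_to_plot(columns, models=None, level=None,
                    id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical helper mirroring what a single-series plot needs.

    Assumes prediction-interval columns are named '{model}-lo-{level}'
    and '{model}-hi-{level}' (StatsForecast convention).
    """
    reserved = {id_col, time_col, target_col}
    if models is None:
        # Treat every non-reserved column without an interval suffix as a model
        models = [c for c in columns
                  if c not in reserved and "-lo-" not in c and "-hi-" not in c]
    selected = [time_col, target_col] + list(models)
    for model in models:
        for lv in level or []:
            for side in ("lo", "hi"):
                name = f"{model}-{side}-{lv}"
                if name in columns:  # skip intervals that were not computed
                    selected.append(name)
    return selected


cols = ["unique_id", "ds", "y", "AutoETS", "AutoETS-lo-80", "AutoETS-hi-80", "Naive"]
print(columns_to_plot(cols, models=["AutoETS"], level=[80]))
# ['ds', 'y', 'AutoETS', 'AutoETS-lo-80', 'AutoETS-hi-80']
```

Assuming a Y_df with those columns, a toy series id 'AU' and a HierarchicalPlot instance hplots, the corresponding call would be hplots.plot_series(series='AU', Y_df=Y_df, models=['AutoETS'], level=[80]).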



plot_hierarchically_linked_series

 plot_hierarchically_linked_series (bottom_series:str,
                                    Y_df:Union[DataFrame[Any],LazyFrame[Any]],
                                    models:Optional[list[str]]=None,
                                    level:Optional[list[int]]=None,
                                    id_col:str='unique_id',
                                    time_col:str='ds', target_col:str='y')

Hierarchically Linked Series plot

Parameters:
bottom_series: str, 'unique_id' of the bottom-level series to plot.
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns ['unique_id', 'ds', 'y'] and the model columns.
models: list[str], names of the model columns to plot.
level: list of confidence levels in [0, 100] for the prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
Collection of plots of the series hierarchically linked to bottom_series, with the filtered models and prediction interval levels.
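A series is hierarchically linked to a bottom-level series when its row of the summing matrix has a 1 in that bottom series' column. The minimal sketch below, with a hypothetical helper linked_series and a made-up three-level hierarchy, recovers those series from the rows of $\mathbf{S}$; it illustrates the relationship rather than the plotting method's internals.

```python
def linked_series(S_rows, bottom_ids, bottom_series):
    """Hypothetical helper: ids of every series whose aggregate includes bottom_series.

    S_rows maps each unique_id to its row of the summing matrix S,
    with columns ordered as in bottom_ids.
    """
    j = bottom_ids.index(bottom_series)  # column of S for this bottom series
    return [uid for uid, row in S_rows.items() if row[j] == 1]


bottom_ids = ["AU/NSW/Sydney", "AU/NSW/Other", "AU/VIC/Melbourne"]
S_rows = {
    "AU": [1, 1, 1],
    "AU/NSW": [1, 1, 0],
    "AU/VIC": [0, 0, 1],
    "AU/NSW/Sydney": [1, 0, 0],
    "AU/NSW/Other": [0, 1, 0],
    "AU/VIC/Melbourne": [0, 0, 1],
}
print(linked_series(S_rows, bottom_ids, "AU/NSW/Other"))
# ['AU', 'AU/NSW', 'AU/NSW/Other']
```

The result contains one series per level, from the top-level total down to the bottom series itself.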



plot_hierarchical_predictions_gap

 plot_hierarchical_predictions_gap
                                    (Y_df:Union[DataFrame[Any],LazyFrame[Any]],
                                    models:Optional[list[str]]=None,
                                    xlabel:Optional[str]=None,
                                    ylabel:Optional[str]=None,
                                    id_col:str='unique_id',
                                    time_col:str='ds', target_col:str='y')

Hierarchical Predictions Gap plot

Parameters:
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns ['unique_id', 'ds', 'y'] and the model columns.
models: list[str], names of the model columns to plot.
xlabel: str, label for the plot's x axis.
ylabel: str, label for the plot's y axis.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
Plots of the aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels; see the aggregate function.
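The gap plot sums the predictions of all series within each tag level at every timestep. If each level partitions the bottom-level series, coherent forecasts give identical totals at every level, so any gap between the curves reflects incoherent base forecasts. The sketch below computes those per-level totals with a hypothetical helper level_totals on made-up forecasts.

```python
def level_totals(preds, tags):
    """Hypothetical helper: sum the predictions of all series in each tag level.

    preds maps unique_id -> list of predictions over the horizon;
    tags maps level name -> list of unique_ids in that level.
    """
    totals = {}
    for level_name, ids in tags.items():
        horizon = len(preds[ids[0]])
        totals[level_name] = [sum(preds[uid][h] for uid in ids) for h in range(horizon)]
    return totals


tags = {"Country": ["AU"], "Country/State": ["AU/NSW", "AU/VIC"]}
# Incoherent base forecasts: the two states do not add up to the country
preds = {"AU": [100.0, 110.0], "AU/NSW": [60.0, 62.0], "AU/VIC": [35.0, 40.0]}
totals = level_totals(preds, tags)
print(totals)  # {'Country': [100.0, 110.0], 'Country/State': [95.0, 102.0]}

# Gap between the top level and the state level at each step
gap = [a - b for a, b in zip(totals["Country"], totals["Country/State"])]
print(gap)  # [5.0, 8.0]
```

After reconciliation the gap would shrink to zero, which is why this plot is a quick visual check of coherence.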

import pandas as pd
from datasetsforecast.hierarchical import HierarchicalData
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS

from hierarchicalforecast.utils import HierarchicalPlot

# Load the Labour dataset: series, summing matrix and level tags
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

# Hold out the last 24 observations of every series as a test set
Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)

# Monthly (month-start) base forecasts with AutoETS
fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df, h=24).reset_index()

# Plot the prediction differences across aggregation levels:
# Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],
    xlabel='Month', ylabel='Predictions',
)
# Same example using Polars DataFrames
import pandas as pd
import polars as pl
from datasetsforecast.hierarchical import HierarchicalData
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS

from hierarchicalforecast.utils import HierarchicalPlot

Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
# Convert the pandas splits to Polars DataFrames
Y_test_df_pl  = pl.from_pandas(Y_test_df)
Y_train_df_pl = pl.from_pandas(Y_train_df)

# Polars frequency strings: '1mo' is one calendar month ('1m' would be one minute)
fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='1mo',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df_pl, h=24)

# Plot the prediction differences across aggregation levels:
# Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],
    xlabel='Month', ylabel='Predictions',
)

External Forecast Adapters



samples_to_quantiles_df

 samples_to_quantiles_df (samples:numpy.ndarray, unique_ids:Sequence[str],
                          dates:list[str],
                          quantiles:Optional[list[float]]=None,
                          level:Optional[list[int]]=None,
                          model_name:str='model', id_col:str='unique_id',
                          time_col:str='ds', backend:str='pandas')

Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a Y_hat_df input dataframe compatible with HierarchicalForecast.

Parameters:
samples: numpy array. Samples from the forecast distribution, of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level; quantiles to estimate from the y distribution.
level: int list in [0, 100]. Probability levels for the prediction intervals.
model_name: string. Name of the forecasting model.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
backend: str='pandas', backend to use for the output dataframe, either 'pandas' or 'polars'.

Returns:
quantiles: float list in [0., 1.]. Quantiles estimated from the y distribution.
Y_hat_df: DataFrame. Base quantile forecasts to reconcile, with the columns ds and the model columns, indexed by unique_id.
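The core of this conversion is estimating quantiles of the samples along the sample axis, separately for each series and horizon step. The pure-Python sketch below shows that step with hypothetical helpers (level_to_quantiles, quantile, samples_to_quantiles). It assumes the common symmetric convention in which a level lv maps to the (1 - lv/100)/2 and 1 - (1 - lv/100)/2 quantiles plus the median, and it uses linear interpolation between order statistics (the default method of numpy.quantile). It returns nested lists rather than the library's data frame.

```python
import math


def level_to_quantiles(level):
    """Hypothetical helper: map interval levels (e.g. 80) to sorted quantiles.

    Each level lv contributes the (1 - lv/100)/2 and 1 - (1 - lv/100)/2
    quantiles; the median is always included.
    """
    qs = {0.5}
    for lv in level:
        alpha = 1 - lv / 100
        qs.add(round(alpha / 2, 6))      # lower quantile, rounded to absorb float noise
        qs.add(round(1 - alpha / 2, 6))  # upper quantile
    return sorted(qs)


def quantile(values, q):
    """Linear-interpolation quantile (same rule as numpy's default method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def samples_to_quantiles(samples, quantiles):
    """samples: [n_series][n_samples][horizon] -> [n_series][horizon][n_quantiles]."""
    out = []
    for series in samples:
        horizon = len(series[0])
        out.append([[quantile([s[h] for s in series], q) for q in quantiles]
                    for h in range(horizon)])
    return out


qs = level_to_quantiles([80])
print(qs)  # [0.1, 0.5, 0.9]
# One series, eleven samples (0.0 to 10.0), horizon of one step
samples = [[[float(v)] for v in range(11)]]
print(samples_to_quantiles(samples, qs))  # [[[1.0, 5.0, 9.0]]]
```

Converting levels to quantiles first lets a single quantile routine serve both the level and quantiles arguments.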