The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The aggregate function of the module allows you to create a hierarchy from categorical variables representing the structure levels, and it also returns the aggregation constraints matrix $\mathbf{S}$.

In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters that transform output base forecasts from external libraries into a compatible data frame format.

Aggregate Function



aggregate

 aggregate (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
            spec:list[list[str]],
            exog_vars:Optional[dict[str,Union[str,list[str]]]]=None,
            sparse_s:bool=False, id_col:str='unique_id',
            time_col:str='ds', id_time_col:Optional[str]=None,
            target_cols:collections.abc.Sequence[str]=('y',))

Utils Aggregation Function. Aggregates bottom level series contained in the DataFrame df according to levels defined in the spec list.

| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with columns [time_col, *target_cols], columns to aggregate and optionally exog_vars. |
| spec | list | | List of levels. Each element of the list should contain a list of columns of df to aggregate. |
| exog_vars | Optional | None | |
| sparse_s | bool | False | Return S_df as a sparse Pandas dataframe. |
| id_col | str | unique_id | Column that will identify each series after aggregation. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| id_time_col | Optional | None | Column that will identify each timestep after temporal aggregation. If provided, aggregate will operate temporally. |
| target_cols | Sequence | ('y',) | List of columns that contain the targets to aggregate. |
| Returns | tuple | | Hierarchically structured series. |
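
To make the roles of spec and the matrix $\mathbf{S}$ concrete, here is a minimal pure-Python sketch of the idea behind aggregate. It is not the library's implementation: the helper build_hierarchy, the "/" separator used to join column values into ids, the assumption that the last entry of spec is the bottom level, and the toy Country/Region rows are all chosen for illustration.

```python
# Illustrative only: a hand-rolled version of the idea behind `aggregate`.
# `build_hierarchy` and the toy data are hypothetical, not library code.

def build_hierarchy(rows, spec, sep="/"):
    """Return (tags, S) for bottom-level `rows` and a list of levels `spec`.

    tags maps each level name to the ordered ids found at that level.
    S has one row per aggregated id and one column per bottom-level id;
    S[i][j] is 1 when bottom series j contributes to aggregate i.
    """
    bottom_cols = spec[-1]  # assumption: the last level is the bottom level
    bottom_ids = []
    for row in rows:
        uid = sep.join(row[c] for c in bottom_cols)
        if uid not in bottom_ids:
            bottom_ids.append(uid)

    tags, S = {}, []
    for level in spec:
        level_ids = []
        members = {}  # aggregated id -> set of bottom ids it contains
        for row in rows:
            agg_id = sep.join(row[c] for c in level)
            bottom_id = sep.join(row[c] for c in bottom_cols)
            if agg_id not in members:
                members[agg_id] = set()
                level_ids.append(agg_id)
            members[agg_id].add(bottom_id)
        tags[sep.join(level)] = level_ids
        for agg_id in level_ids:
            S.append([1 if b in members[agg_id] else 0 for b in bottom_ids])
    return tags, S


rows = [
    {"Country": "Australia", "Region": "NSW"},
    {"Country": "Australia", "Region": "VIC"},
    {"Country": "Australia", "Region": "QLD"},
]
spec = [["Country"], ["Country", "Region"]]
tags, S = build_hierarchy(rows, spec)
```

In this toy case the first row of S is all ones (the country total sums every region) and the remaining rows form an identity block, one row per bottom-level series.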


aggregate_temporal

 aggregate_temporal (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                     spec:dict[str,int],
                     exog_vars:Optional[dict[str,Union[str,list[str]]]]=None,
                     sparse_s:bool=False, id_col:str='unique_id',
                     time_col:str='ds', id_time_col:str='temporal_id',
                     target_cols:collections.abc.Sequence[str]=('y',),
                     aggregation_type:str='local')

Utils Aggregation Function for Temporal aggregations. Aggregates bottom-level timesteps contained in the DataFrame df according to the temporal levels defined in the spec dictionary.

| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with columns [time_col, target_cols] and columns to aggregate. |
| spec | dict | | Dictionary of temporal levels. Each key should be a string, with the value representing the number of bottom-level timesteps contained in the aggregation. |
| exog_vars | Optional | None | |
| sparse_s | bool | False | Return S_df as a sparse Pandas dataframe. |
| id_col | str | unique_id | Column that will identify each series after aggregation. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| id_time_col | str | temporal_id | Column that will identify each timestep after aggregation. |
| target_cols | Sequence | ('y',) | List of columns that contain the targets to aggregate. |
| aggregation_type | str | local | If 'local', the aggregation is performed on the timestamps of each time series independently. If 'global', the aggregation is performed on the unique timestamps of all time series. |
| Returns | tuple | | Temporally hierarchically structured series. |
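
The meaning of the temporal spec can be illustrated with a small pure-Python sketch. It assumes that targets are summed over consecutive, non-overlapping blocks and that the series length is divisible by every block size; the helper aggregate_blocks, the level names, and the data are hypothetical and do not reproduce how aggregate_temporal handles ids, timestamps, or incomplete blocks.

```python
# Illustrative only: non-overlapping temporal aggregation of one series.
# `aggregate_blocks` is a hypothetical helper, not part of the library.

def aggregate_blocks(values, spec):
    """Sum `values` in consecutive blocks of the sizes given in `spec`.

    spec maps a level name to the number of bottom-level timesteps per block.
    Assumption: len(values) is divisible by every block size.
    """
    out = {}
    for name, size in spec.items():
        if len(values) % size != 0:
            raise ValueError(f"series length not divisible by {size}")
        out[name] = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return out


quarterly = [10, 20, 30, 40, 50, 60, 70, 80]  # two years of quarterly data
spec = {"year": 4, "semiannual": 2, "quarter": 1}
levels = aggregate_blocks(quarterly, spec)
```

Here the "year" level holds two values, each the sum of four quarters, while the "quarter" level is the original series.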


make_future_dataframe

 make_future_dataframe
                        (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef(
                        'LazyFrame[Any]')], freq:Union[str,int], h:int,
                        id_col:str='unique_id', time_col:str='ds')

Create future dataframe for forecasting.

| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with ids, times and values for the exogenous regressors. |
| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
| h | int | | Forecast horizon. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| Returns | FrameT | | DataFrame with future values. |
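
For integer-valued time columns, the expected shape of the result can be sketched in a few lines of pure Python. This assumes each series is extended from its own last observed timestep in steps of freq; the helper future_steps and the toy history are hypothetical, and offset aliases for timestamp columns are not covered.

```python
# Illustrative only: future timesteps for integer-valued time columns.
# `future_steps` is a hypothetical helper, not the library function.

def future_steps(rows, freq, h, id_col="unique_id", time_col="ds"):
    """Return h future rows per series, spaced `freq` apart after its last timestep."""
    last = {}
    for row in rows:
        uid, t = row[id_col], row[time_col]
        last[uid] = max(t, last.get(uid, t))
    return [
        {id_col: uid, time_col: t_last + freq * step}
        for uid, t_last in last.items()
        for step in range(1, h + 1)
    ]


history = [
    {"unique_id": "A", "ds": 1},
    {"unique_id": "A", "ds": 2},
    {"unique_id": "B", "ds": 5},
]
future = future_steps(history, freq=1, h=2)
```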


get_cross_temporal_tags

 get_cross_temporal_tags (df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                          tags_cs:dict[str,numpy.ndarray],
                          tags_te:dict[str,numpy.ndarray], sep:str='//',
                          id_col:str='unique_id',
                          id_time_col:str='temporal_id',
                          cross_temporal_id_col:str='cross_temporal_id')

Get cross-temporal tags.

| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with temporal ids. |
| tags_cs | dict | | Tags for the cross-sectional hierarchies. |
| tags_te | dict | | Tags for the temporal hierarchies. |
| sep | str | // | Separator for the cross-temporal tags. |
| id_col | str | unique_id | Column that identifies each series. |
| id_time_col | str | temporal_id | Column that identifies each (aggregated) timestep. |
| cross_temporal_id_col | str | cross_temporal_id | Column that will identify each cross-temporal aggregation. |
| Returns | tuple | | DataFrame with cross-temporal ids. |
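
One plausible way to picture cross-temporal tags is as the pairing of every cross-sectional level with every temporal level, with names joined by sep. The sketch below is written under that assumption only; combine_tags, the naming scheme, and the toy ids such as "year-1" are hypothetical and may differ from what get_cross_temporal_tags actually produces.

```python
# Illustrative only: pairing cross-sectional and temporal tags.
# `combine_tags` is hypothetical; the naming scheme is an assumption.

def combine_tags(tags_cs, tags_te, sep="//"):
    """Build one cross-temporal level per (cross-sectional, temporal) level pair."""
    tags_ct = {}
    for level_cs, ids_cs in tags_cs.items():
        for level_te, ids_te in tags_te.items():
            tags_ct[f"{level_cs}{sep}{level_te}"] = [
                f"{uid}{sep}{tid}" for uid in ids_cs for tid in ids_te
            ]
    return tags_ct


tags_cs = {"Country": ["Australia"], "Country/Region": ["Australia/NSW", "Australia/VIC"]}
tags_te = {"year": ["year-1"], "quarter": ["quarter-1", "quarter-2"]}
tags_ct = combine_tags(tags_cs, tags_te)
```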

Hierarchical Visualization



HierarchicalPlot

 HierarchicalPlot (S:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                   tags:dict[str,numpy.ndarray],
                   S_id_col:str='unique_id')

Hierarchical Plot

This class contains a collection of matplotlib visualization methods suited for small to medium-sized hierarchical series.

Parameters:
S: DataFrame with summing matrix of size (base, bottom), see aggregate function.
tags: dict[str, np.ndarray], hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
S_id_col: str='unique_id', column that identifies each aggregation.



plot_summing_matrix

 plot_summing_matrix ()

Summation Constraints plot

This method simply plots the hierarchical aggregation constraints matrix $\mathbf{S}$.



plot_series

 plot_series (series:str,
              Y_df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
              models:Optional[list[str]]=None,
              level:Optional[list[int]]=None, id_col:str='unique_id',
              time_col:str='ds', target_col:str='y')

Single Series plot

Parameters:
series: str, the 'unique_id' of the series to plot, at any level of the hierarchy.
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and may also contain model columns.
models: list[str], names of the model columns to plot.
level: int list 0-100, confidence levels for prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
Single series plot with filtered models and prediction interval level.



plot_hierarchically_linked_series

 plot_hierarchically_linked_series (bottom_series:str,
                                    Y_df:Union[ForwardRef('DataFrame[Any]'
                                    ),ForwardRef('LazyFrame[Any]')],
                                    models:Optional[list[str]]=None,
                                    level:Optional[list[int]]=None,
                                    id_col:str='unique_id',
                                    time_col:str='ds', target_col:str='y')

Hierarchically Linked Series plot

Parameters:
bottom_series: str, the 'unique_id' of the bottom-level series to plot.
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns.
models: list[str], names of the model columns to plot.
level: int list 0-100, confidence levels for prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
Collection of hierarchically linked series plots associated with the bottom_series, with filtered models and prediction interval level.



plot_hierarchical_predictions_gap

 plot_hierarchical_predictions_gap (Y_df:Union[ForwardRef('DataFrame[Any]'),ForwardRef('LazyFrame[Any]')],
                                    models:Optional[list[str]]=None,
                                    xlabel:Optional[str]=None,
                                    ylabel:Optional[str]=None,
                                    id_col:str='unique_id',
                                    time_col:str='ds', target_col:str='y')

Hierarchical Predictions Gap plot

Parameters:
Y_df: DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns.
models: list[str], names of the model columns to plot.
xlabel: str, label for the plot's x axis.
ylabel: str, label for the plot's y axis.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.

Returns:
Plots of aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels (see the aggregate function).

# pandas
import pandas as pd
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

# Load the Labour dataset and expose the summing matrix ids as a column
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

# Hold out the last 24 months of every series
Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)

fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df, h=24).reset_index()

# Plot prediction difference of different aggregation
# Levels Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],  # models expects a list of column names
    xlabel='Month', ylabel='Predictions',
)
# polars
import pandas as pd
import polars as pl
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

# Load the Labour dataset and expose the summing matrix ids as a column
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

# Hold out the last 24 months of every series, then convert to polars
Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df_pl  = pl.from_pandas(Y_test_df)
Y_train_df_pl = pl.from_pandas(Y_train_df)

fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='1mo',  # polars alias for one calendar month ('1m' means one minute)
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df_pl, h=24)

# Plot prediction difference of different aggregation
# Levels Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],  # models expects a list of column names
    xlabel='Month', ylabel='Predictions',
)
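
The quantity behind a predictions gap plot can be sketched without any plotting. The pure-Python example below sums predictions over the series of each tag level, timestep by timestep; when base forecasts are not coherent, the level totals differ. The helper level_totals and the toy forecasts are hypothetical and are not taken from the library or from the Labour dataset.

```python
# Illustrative only: total prediction per hierarchy level at each timestep.
# `level_totals` and the toy forecasts are hypothetical.

def level_totals(preds, tags):
    """Sum predictions over the series of each level, timestep by timestep.

    preds maps a series id to its list of predictions (same horizon for all).
    tags maps a level name to the series ids that belong to that level.
    """
    return {
        level: [sum(step) for step in zip(*(preds[uid] for uid in ids))]
        for level, ids in tags.items()
    }


tags = {"Country": ["Australia"], "Country/Region": ["Australia/NSW", "Australia/VIC"]}
preds = {
    "Australia": [100.0, 110.0],      # independent base forecast for the total
    "Australia/NSW": [55.0, 60.0],
    "Australia/VIC": [40.0, 45.0],
}
totals = level_totals(preds, tags)
gap = [top - bottom for top, bottom in zip(totals["Country"], totals["Country/Region"])]
```

A nonzero gap at any timestep indicates that the forecasts at the two levels do not add up, which is the kind of discrepancy reconciliation methods are meant to remove.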

External Forecast Adapters



samples_to_quantiles_df

 samples_to_quantiles_df (samples:numpy.ndarray,
                          unique_ids:collections.abc.Sequence[str],
                          dates:list[str],
                          quantiles:Optional[list[float]]=None,
                          level:Optional[list[int]]=None,
                          model_name:str='model', id_col:str='unique_id',
                          time_col:str='ds', backend:str='pandas')

Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input Y_hat_df dataframe.

Parameters:
samples: numpy array. Samples from forecast distribution of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level, quantiles to estimate from y distribution.
level: int list in [0,100]. Probability levels for prediction intervals.
model_name: string. Name of forecasting model.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
backend: str='pandas', backend to use for the output dataframe, either 'pandas' or 'polars'.

Returns:
quantiles: float list in [0., 1.]. Quantiles to estimate from y distribution.
Y_hat_df: DataFrame. Base quantile forecasts with columns ds and models to reconcile, indexed by unique_id.
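
The relationship between level and quantiles, and the estimation of a quantile from samples, can be sketched in pure Python. The example assumes the usual convention that a level of l percent corresponds to the quantiles 0.5 - l/200 and 0.5 + l/200, and it uses linear interpolation between order statistics; both helpers are hypothetical and are not the library's code.

```python
# Illustrative only: turning prediction-interval levels into quantiles and
# estimating them from samples. Helper names are hypothetical.

def level_to_quantiles(level):
    """Map levels in [0, 100] to the sorted quantiles bounding each interval."""
    qs = set()
    for lv in level:
        alpha = 1 - lv / 100
        qs.add(round(alpha / 2, 6))
        qs.add(round(1 - alpha / 2, 6))
    return sorted(qs)


def empirical_quantile(samples, q):
    """Quantile of `samples` by linear interpolation between order statistics."""
    s = sorted(samples)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


quantiles = level_to_quantiles([80, 90])
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
median = empirical_quantile(samples, 0.5)
```

In samples_to_quantiles_df the same kind of estimate would be taken across the n_samples axis for every series and horizon step.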