The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The aggregate function of the module builds a hierarchy from categorical variables that represent the levels of the structure, and also returns the aggregation constraints matrix $\mathbf{S}$.

In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters that transform output base forecasts from external libraries into a compatible data frame format.

Aggregate Function



aggregate

 aggregate (df:pandas.core.frame.DataFrame, spec:List[List[str]],
            is_balanced:bool=False, sparse_s:bool=False)

Utils Aggregation Function. Aggregates bottom level series contained in the pandas DataFrame df according to levels defined in the spec list.

Parameters:
df: DataFrame. Dataframe with columns ['ds', 'y'] and columns to aggregate.
spec: List. List of levels. Each element of the list should contain a list of columns of df to aggregate.
is_balanced: bool, default False. Deprecated.
sparse_s: bool, default False. Return S_df as a sparse dataframe.

Returns:
pandas DataFrame. Hierarchically structured series.
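
To make the role of spec concrete, the following is a minimal pandas sketch of the idea behind the aggregation. It is not the library's implementation: the toy data, the column names, the helper name toy_aggregate, and the '/' separator used to join level columns are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy bottom-level data: two dates, three bottom series (assumed column names).
df = pd.DataFrame({
    "Country": ["AU"] * 6,
    "State": ["NSW", "NSW", "VIC", "NSW", "NSW", "VIC"],
    "City": ["Sydney", "Newcastle", "Melbourne"] * 2,
    "ds": pd.to_datetime(["2020-01-01"] * 3 + ["2020-02-01"] * 3),
    "y": [10.0, 2.0, 7.0, 11.0, 3.0, 8.0],
})
spec = [["Country"], ["Country", "State"], ["Country", "State", "City"]]


def toy_aggregate(df, spec):
    """Hypothetical helper: sum `y` per level and build a summing matrix."""
    # The last entry of spec is assumed to identify the bottom-level series.
    bottom_ids = df[spec[-1]].agg("/".join, axis=1)
    bottom_unique = list(pd.unique(bottom_ids))
    frames, tags, rows = [], {}, []
    for cols in spec:
        ids = df[cols].agg("/".join, axis=1)
        level = (
            df.assign(unique_id=ids)
            .groupby(["unique_id", "ds"], as_index=False)["y"]
            .sum()
        )
        frames.append(level)
        level_ids = list(pd.unique(ids))
        tags["/".join(cols)] = np.array(level_ids)
        # Each row of S marks which bottom series add up to this aggregate.
        for uid in level_ids:
            members = set(bottom_ids[ids == uid])
            rows.append([1.0 if b in members else 0.0 for b in bottom_unique])
    all_ids = [uid for level_ids in tags.values() for uid in level_ids]
    S_df = pd.DataFrame(rows, index=all_ids, columns=bottom_unique)
    Y_df = pd.concat(frames, ignore_index=True)
    return Y_df, S_df, tags


Y_df, S_df, tags = toy_aggregate(df, spec)
```

In this sketch, each row of S_df corresponds to one series of the hierarchy and each column to one bottom-level series, so multiplying S_df by the bottom-level values at a given date reproduces every aggregate.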

Hierarchical Visualization



HierarchicalPlot

 HierarchicalPlot (S:pandas.core.frame.DataFrame,
                   tags:Dict[str,numpy.ndarray])

Hierarchical Plot

This class contains a collection of matplotlib visualization methods, suited for small to medium sized hierarchical series.

Parameters:
S: pd.DataFrame with summing matrix of size (base, bottom), see aggregate function.
tags: Dict[str, np.ndarray], hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.



plot_summing_matrix

 plot_summing_matrix ()

Summation Constraints plot

This method plots the hierarchical aggregation constraints matrix $\mathbf{S}$.
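
As a reminder of what the plotted matrix encodes, here is a small worked example. The hierarchy (one total with two bottom series, A and B) is an assumption chosen for illustration; the notation follows the $\mathbf{y}_{[a,b]}$ convention used on this page, with $\mathbf{y}_{[b]}$ denoting the bottom-level series.

```latex
\mathbf{y}_{[a,b],t} = \mathbf{S}\,\mathbf{y}_{[b],t},
\qquad
\begin{bmatrix} y_{\text{Total},t} \\ y_{A,t} \\ y_{B,t} \end{bmatrix}
=
\begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} y_{A,t} \\ y_{B,t} \end{bmatrix}
```

Each row of $\mathbf{S}$ corresponds to one series in the hierarchy, and the nonzero entries in that row mark the bottom-level series that sum to it.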



plot_series

 plot_series (series:str, Y_df:Optional[pandas.core.frame.DataFrame]=None,
              models:Optional[List[str]]=None,
              level:Optional[List[int]]=None)

Single Series plot

Parameters:
series: str, 'unique_id' of the series to plot, at any level of the hierarchy.
Y_df: pd.DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and may also contain model columns.
models: List[str], names of the model columns to plot.
level: List[int], values in 0-100, confidence levels for prediction intervals available in Y_df.

Returns:
Single series plot with filtered models and prediction interval level.



plot_hierarchically_linked_series

 plot_hierarchically_linked_series (bottom_series:str,
                                    Y_df:Optional[pandas.core.frame.DataFrame]=None,
                                    models:Optional[List[str]]=None,
                                    level:Optional[List[int]]=None)

Hierarchically Linked Series plot

Parameters:
bottom_series: str, 'unique_id' of the bottom-level series to plot.
Y_df: pd.DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns.
models: List[str], names of the model columns to plot.
level: List[int], values in 0-100, confidence levels for prediction intervals available in Y_df.

Returns:
Collection of hierarchically linked series plots associated with bottom_series, restricted to the selected models and prediction interval levels.



plot_hierarchical_predictions_gap

 plot_hierarchical_predictions_gap (Y_df:pandas.core.frame.DataFrame,
                                    models:Optional[List[str]]=None,
                                    xlabel:Optional=None,
                                    ylabel:Optional=None)

Hierarchical Predictions Gap plot

Parameters:
Y_df: pd.DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns.
models: List[str], names of the model columns to plot.
xlabel: str, label for the plot's x axis.
ylabel: str, label for the plot's y axis.

Returns:
Plots of aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels (see the aggregate function).

import pandas as pd

from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, ETS, Naive
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df  = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')

fcst = StatsForecast(
    df=Y_train_df, 
    #models=[AutoARIMA(season_length=12), Naive()], 
    models=[ETS(season_length=12, model='AAZ')],
    freq='MS', 
    n_jobs=-1
)
Y_hat_df = fcst.forecast(h=24)

# Plot the gap between predictions aggregated at different levels:
# Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)

# models is typed as a list of column names, so pass ['ETS'] rather than 'ETS'
hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['ETS'],
    xlabel='Month', ylabel='Predictions',
)
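
The quantity behind this plot can be sketched without any plotting: for each level in tags, the base forecasts of all series in that level are summed per date, and incoherent base forecasts show up as differences between those totals. The snippet below is a minimal illustration under stated assumptions (a made-up two-level hierarchy, made-up forecast values, and a hypothetical helper named level_totals); it does not reproduce the method's internals.

```python
import numpy as np
import pandas as pd

# Toy base forecasts for a hypothetical hierarchy: Total = A + B.
# The values are made up and deliberately incoherent (A + B != Total).
Y_hat_df = pd.DataFrame({
    "unique_id": ["Total", "Total", "A", "A", "B", "B"],
    "ds": pd.to_datetime(["2020-01-01", "2020-02-01"] * 3),
    "ETS": [100.0, 110.0, 60.0, 62.0, 45.0, 50.0],
}).set_index("unique_id")

# tags maps each level name to the ids of the series in that level.
tags = {
    "Country": np.array(["Total"]),
    "Country/Region": np.array(["A", "B"]),
}


def level_totals(Y_hat_df, tags, model):
    """Hypothetical helper: total forecast per date for every level."""
    totals = {}
    for level_name, ids in tags.items():
        level_df = Y_hat_df.loc[list(ids)]
        totals[level_name] = level_df.groupby("ds")[model].sum()
    return pd.DataFrame(totals)


gap_df = level_totals(Y_hat_df, tags, model="ETS")
# Difference between the top-level forecast and the summed lower level.
gap_df["gap"] = gap_df["Country"] - gap_df["Country/Region"]
```

A nonzero gap column indicates that the base forecasts do not add up across levels, which is the situation reconciliation methods are meant to correct.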

External Forecast Adapters



samples_to_quantiles_df

 samples_to_quantiles_df (samples:numpy.ndarray, unique_ids:Iterable[str],
                          dates:Iterable,
                          quantiles:Optional[Iterable[float]]=None,
                          level:Optional[Iterable[int]]=None,
                          model_name:Optional[str]='model')

Transform Random Samples into HierarchicalForecast input. Auxiliary function to create compatible HierarchicalForecast input Y_hat_df dataframe.

Parameters:
samples: numpy array. Samples from forecast distribution of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level, quantiles to estimate from y distribution.
level: int list in [0,100]. Probability levels for prediction intervals.
model_name: string. Name of forecasting model.

Returns:
quantiles: float list in [0., 1.]. Quantiles to estimate from the y distribution.
Y_hat_df: pd.DataFrame. Base quantile forecasts, with column ds and the model columns to reconcile, indexed by unique_id.
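
The transformation described above can be approximated with NumPy alone. The sketch below is an illustration, not the function's implementation: the mapping from level to quantiles (a level-l interval spanning quantiles (100 - l)/200 and (100 + l)/200), the use of the sample mean as the point forecast, and the output column names are all assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
unique_ids = ["A", "B"]
dates = pd.date_range("2020-01-01", periods=3, freq="MS")

# Samples with shape [n_series, n_samples, horizon], as described above.
samples = rng.normal(loc=10.0, scale=2.0, size=(2, 500, 3))

level = [80]
# Assumed convention: a level-l interval spans these two quantiles.
quantiles = sorted(
    {(100 - l) / 200 for l in level} | {(100 + l) / 200 for l in level}
)

# Quantiles over the sample axis -> shape [n_quantiles, n_series, horizon].
q_values = np.quantile(samples, quantiles, axis=1)

# Long format: one row per (series, date), ordered series by series.
Y_hat_df = pd.DataFrame({
    "unique_id": np.repeat(unique_ids, len(dates)),
    "ds": np.tile(dates.to_numpy(), len(unique_ids)),
    "model": samples.mean(axis=1).reshape(-1),  # assumed point forecast
})
for q, values in zip(quantiles, q_values):
    # Hypothetical column naming, chosen only for this sketch.
    Y_hat_df[f"model-q{q:.2f}"] = values.reshape(-1)
Y_hat_df = Y_hat_df.set_index("unique_id")
```

The resulting frame has one row per series and forecast date, indexed by unique_id, which matches the layout described for Y_hat_df above.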