The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The aggregate function of the module creates a hierarchy from categorical variables representing the structure levels, and also returns the aggregation constraints matrix $\mathbf{S}$.

In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters that transform output base forecasts from external libraries into a compatible data frame format.

Aggregate Function



aggregate

 aggregate (df:pandas.core.frame.DataFrame, spec:List[List[str]],
            is_balanced:bool=False, sparse_s:bool=False)

Utils Aggregation Function. Aggregates bottom level series contained in the pandas DataFrame df according to levels defined in the spec list.

Parameters:
df: pd.DataFrame, dataframe with columns ['ds', 'y'] and columns to aggregate.
spec: List[List[str]], list of levels. Each element of the list should contain a list of columns of df to aggregate.
is_balanced: bool, default False. Deprecated.
sparse_s: bool, default False. Return S_df as a sparse dataframe.

Returns:
pd.DataFrame, hierarchically structured series.
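
To make the roles of spec and the constraints matrix more concrete, the following is a minimal sketch in plain pandas that mimics the idea on a toy dataset. It is not the library's implementation: the helper toy_aggregate, the columns Country and Region, and the '/'-joined identifiers are all assumptions made for illustration.

```python
import pandas as pd

# Toy bottom-level data: two regions in one country, two dates each.
df = pd.DataFrame({
    "Country": ["AU", "AU", "AU", "AU"],
    "Region": ["NSW", "NSW", "VIC", "VIC"],
    "ds": pd.to_datetime(["2020-01-01", "2020-02-01"] * 2),
    "y": [1.0, 2.0, 10.0, 20.0],
})
spec = [["Country"], ["Country", "Region"]]  # the last level is the bottom level


def toy_aggregate(df, spec):
    """Hypothetical helper: sum `y` per level and build a summing matrix."""
    frames, tags = [], {}
    for level in spec:
        # Identifier of each row at this level, e.g. "AU/NSW".
        uid = df[level].astype(str).agg("/".join, axis=1)
        agg = (
            df.assign(unique_id=uid)
            .groupby(["unique_id", "ds"], as_index=False)["y"]
            .sum()
        )
        frames.append(agg)
        tags["/".join(level)] = agg["unique_id"].unique()
    Y_df = pd.concat(frames, ignore_index=True)

    # S[i, j] = 1 when bottom series j contributes to series i.
    bottom = df[spec[-1]].drop_duplicates().reset_index(drop=True)
    bottom_ids = bottom.astype(str).agg("/".join, axis=1)
    rows = {}
    for level in spec:
        level_ids = bottom[level].astype(str).agg("/".join, axis=1)
        for uid in tags["/".join(level)]:
            rows[uid] = (level_ids == uid).astype(float).to_numpy()
    S_df = pd.DataFrame.from_dict(rows, orient="index", columns=list(bottom_ids))
    return Y_df, S_df, tags


Y_df, S_df, tags = toy_aggregate(df, spec)
print(S_df)
```

Each row of S_df corresponds to one series in the hierarchy and each column to one bottom-level series, so the 'AU' row contains ones in both columns while the bottom rows form an identity block.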

Hierarchical Visualization



HierarchicalPlot

 HierarchicalPlot (S:pandas.core.frame.DataFrame,
                   tags:Dict[str,numpy.ndarray])

Hierarchical Plot

This class contains a collection of matplotlib visualization methods suited for small to medium-sized hierarchical series.

Parameters:
S: pd.DataFrame, summing matrix of size (base, bottom), see the aggregate function.
tags: Dict[str, np.ndarray], hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.



plot_summing_matrix

 plot_summing_matrix ()

Summation Constraints plot

This method plots the hierarchical aggregation constraints matrix $\mathbf{S}$.



plot_series

 plot_series (series:str, Y_df:Optional[pandas.core.frame.DataFrame]=None,
              models:Optional[List[str]]=None,
              level:Optional[List[int]]=None)

Single Series plot

Parameters:
series: str, string identifying the 'unique_id' of the series to plot, at any level.
Y_df: pd.DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and may also contain model columns.
models: List[str], strings identifying the model columns to keep.
level: float list 0-100, confidence levels for prediction intervals available in Y_df.

Returns:
Single series plot with filtered models and prediction interval level.



plot_hierarchically_linked_series

 plot_hierarchically_linked_series (bottom_series:str,
                                    Y_df:Optional[pandas.core.frame.DataFr
                                    ame]=None,
                                    models:Optional[List[str]]=None,
                                    level:Optional[List[int]]=None)

Hierarchically Linked Series plot

Parameters:
bottom_series: str, string identifying the 'unique_id' of the bottom-level series to plot.
Y_df: pd.DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns.
models: List[str], strings identifying the model columns to keep.
level: float list 0-100, confidence levels for prediction intervals available in Y_df.

Returns:
Collection of hierarchically linked series plots associated with the bottom_series, with filtered models and prediction interval level.



plot_hierarchical_predictions_gap

 plot_hierarchical_predictions_gap (Y_df:pandas.core.frame.DataFrame,
                                    models:Optional[List[str]]=None,
                                    xlabel:Optional=None,
                                    ylabel:Optional=None)

Hierarchical Predictions Gap plot

Parameters:
Y_df: pd.DataFrame, hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns ['unique_id', 'ds', 'y'] and model columns.
models: List[str], strings identifying the model columns to keep.
xlabel: str, string for the plot's x axis label.
ylabel: str, string for the plot's y axis label.

Returns:
Plots of aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels (see the aggregate function).

import pandas as pd

from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, ETS, Naive
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df  = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')

fcst = StatsForecast(
    df=Y_train_df, 
    #models=[AutoARIMA(season_length=12), Naive()], 
    models=[ETS(season_length=12, model='AAZ')],
    freq='MS', 
    n_jobs=-1
)
Y_hat_df = fcst.forecast(h=24)

# Plot the gap between predictions aggregated at different levels:
# Country, Country/Region, Country/Gender/Region, ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['ETS'],  # models expects a list of column names
    xlabel='Month', ylabel='Predictions',
)
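
The quantity that the predictions gap plot compares can be sketched without any plotting. The example below is only an illustration under stated assumptions: the forecast values are made up, the tags dictionary is a hypothetical two-level hierarchy, and unique_id is kept as a regular column for simplicity. It sums the base forecasts of the series belonging to each level; when forecasts are coherent the level totals coincide, and any difference is the gap.

```python
import numpy as np
import pandas as pd

# Toy base forecasts (made-up numbers) for a 3-series hierarchy, horizon 2.
Y_hat_df = pd.DataFrame({
    "unique_id": ["AU", "AU", "AU/NSW", "AU/NSW", "AU/VIC", "AU/VIC"],
    "ds": pd.to_datetime(["2020-03-01", "2020-04-01"] * 3),
    "ETS": [35.0, 36.0, 3.0, 4.0, 30.0, 31.0],
})
# Hypothetical tags: each key is a level, each value lists its series.
tags = {
    "Country": np.array(["AU"]),
    "Country/Region": np.array(["AU/NSW", "AU/VIC"]),
}

# Total forecast per date implied by each level of the hierarchy.
level_totals = pd.DataFrame({
    level: Y_hat_df[Y_hat_df["unique_id"].isin(ids)].groupby("ds")["ETS"].sum()
    for level, ids in tags.items()
})

# Difference between each level total and the top level; zero means coherent.
gap = level_totals.sub(level_totals["Country"], axis=0)
print(level_totals)
print(gap)
```

In this toy case the regional forecasts add up to less than the country forecast, so the Country/Region column of gap is negative on both dates.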

External Forecast Adapters



samples_to_quantiles_df

 samples_to_quantiles_df (samples:numpy.ndarray, unique_ids:Iterable[str],
                          dates:Iterable,
                          quantiles:Optional[Iterable[float]]=None,
                          level:Optional[Iterable[int]]=None,
                          model_name:Optional[str]='model')

Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input Y_hat_df dataframe.

Parameters:
samples: numpy array. Samples from forecast distribution of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level, quantiles to estimate from y distribution.
level: int list in [0,100]. Probability levels for prediction intervals.
model_name: string. Name of forecasting model.

Returns:
quantiles: float list in [0., 1.]. Quantiles estimated from the y distribution.
Y_hat_df: pd.DataFrame. Base quantile forecasts with columns ds and models to reconcile, indexed by unique_id.
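
The transformation described above can be approximated by hand with NumPy. This is a rough sketch, not the library's code: the helper level_to_quantiles, the use of the sample mean as the point forecast, and the '-lo-'/'-hi-' column naming are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def level_to_quantiles(level):
    """Hypothetical helper: map levels in [0, 100] to (lower, upper) quantiles."""
    return [((100 - lv) / 200, (100 + lv) / 200) for lv in level]


# Simulated forecast samples of shape [n_series, n_samples, horizon].
rng = np.random.default_rng(0)
n_series, n_samples, horizon = 2, 1000, 3
samples = rng.normal(loc=10.0, scale=2.0, size=(n_series, n_samples, horizon))
unique_ids = ["A", "B"]
dates = pd.date_range("2020-01-01", periods=horizon, freq="MS")
level = [80]
model_name = "model"

# Point forecast: mean over the sample axis (an assumption of this sketch).
columns = {model_name: samples.mean(axis=1)}
for lv, (q_lo, q_hi) in zip(level, level_to_quantiles(level)):
    # Quantiles are computed across the sample axis (axis=1).
    columns[f"{model_name}-lo-{lv}"] = np.quantile(samples, q_lo, axis=1)
    columns[f"{model_name}-hi-{lv}"] = np.quantile(samples, q_hi, axis=1)

# Flatten [n_series, horizon] arrays into long format, one row per (id, date).
Y_hat_df = pd.DataFrame({
    "unique_id": np.repeat(unique_ids, horizon),
    "ds": np.tile(dates, n_series),
    **{name: values.reshape(-1) for name, values in columns.items()},
}).set_index("unique_id")
print(Y_hat_df)
```

An 80% level maps to the 0.1 and 0.9 quantiles, so each row carries a point forecast together with the bounds of its prediction interval.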