
HierarchicalForecast contains pure Python implementations of hierarchical reconciliation methods, as well as a core.HierarchicalReconciliation wrapper class that enables easy interaction with these methods through pandas DataFrames containing the hierarchical time series and the base predictions. The core.HierarchicalReconciliation class operates on the hierarchical time series DataFrame Y_df, the base predictions DataFrame Y_hat_df, and the aggregation constraints matrix S_df. For more information on the creation of the aggregation constraints matrix, see the aggregate method in utils.
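For orientation, the following is a minimal sketch of the layout these DataFrames are expected to have, using a hypothetical two-level hierarchy (total = a + b). In practice Y_df, S_df and tags are produced by the aggregate utility, and Y_hat_df by a forecasting library such as StatsForecast; the exact S_df layout should match whatever aggregate returns.

import numpy as np
import pandas as pd

# Hierarchical series: one row per (unique_id, ds) with target column 'y'
Y_df = pd.DataFrame({
    'unique_id': ['total', 'total', 'a', 'a', 'b', 'b'],
    'ds': pd.to_datetime(['2024-01-01', '2024-04-01'] * 3),
    'y': [10.0, 12.0, 6.0, 7.0, 4.0, 5.0],
})

# Base predictions: identifying columns plus one column per model
Y_hat_df = pd.DataFrame({
    'unique_id': ['total', 'a', 'b'],
    'ds': pd.to_datetime(['2024-07-01'] * 3),
    'AutoETS': [11.5, 6.8, 4.9],
})

# Aggregation constraints: rows are all series, columns are the bottom series
S_df = pd.DataFrame({
    'unique_id': ['total', 'a', 'b'],
    'a': [1.0, 1.0, 0.0],
    'b': [1.0, 0.0, 1.0],
})

# Tags map each hierarchy level to the identifiers of its series
tags = {'total': np.array(['total']), 'total/leaf': np.array(['a', 'b'])}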

HierarchicalReconciliation

HierarchicalReconciliation(reconcilers)
Hierarchical Reconciliation Class. The core.HierarchicalReconciliation class allows you to efficiently fit multiple HierarchicalForecast methods for a collection of time series and base predictions stored in pandas DataFrames. The Y_df DataFrame identifies series and datestamps with the unique_id and ds columns, while the y column denotes the target time series variable. The Y_hat_df DataFrame stores the base predictions, e.g. from AutoARIMA, ETS, etc.

Parameters:
Name | Type | Description | Default
reconcilers | list[HReconciler] | A list of instantiated classes from the reconciliation methods module. | required
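For instance, the wrapper can be instantiated with several reconcilers at once (a minimal sketch; any combination of classes from the methods module works):

from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MinTrace

hrec = HierarchicalReconciliation(reconcilers=[
    BottomUp(),
    TopDown(method='forecast_proportions'),
    MinTrace(method='ols'),
])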

HierarchicalReconciliation.reconcile

reconcile(Y_hat_df, tags, S_df=None, Y_df=None, level=None, intervals_method='normality', num_samples=-1, seed=0, is_balanced=False, id_col='unique_id', time_col='ds', target_col='y', id_time_col='temporal_id', temporal=False, S=None, diagnostics=False, diagnostics_atol=1e-06)
Hierarchical Reconciliation Method. The reconcile method is analogous to scikit-learn's fit_predict method; it applies the different reconciliation techniques instantiated in the reconcilers list. Most reconciliation methods can be described by the following convenient linear algebra notation:

$$\tilde{\mathbf{y}}_{[a,b],\tau} = \mathbf{S}_{[a,b][b]} \mathbf{P}_{[b][a,b]} \hat{\mathbf{y}}_{[a,b],\tau}$$

where $a, b$ represent the aggregate and bottom levels, $\mathbf{S}_{[a,b][b]}$ contains the hierarchical aggregation constraints, and $\mathbf{P}_{[b][a,b]}$ varies across reconciliation methods. The reconciled predictions are $\tilde{\mathbf{y}}_{[a,b],\tau}$ and the base predictions are $\hat{\mathbf{y}}_{[a,b],\tau}$.

Parameters:
Name | Type | Description | Default
Y_hat_df | Frame | DataFrame with base forecasts, with columns ['unique_id', 'ds'] and the models to reconcile. | required
tags | dict[str, ndarray] | Each key is a level and its value contains the tags associated to that level. | required
S_df | Frame | DataFrame with the summing matrix of size (base, bottom), see the aggregate method. | None
Y_df | Optional[Frame] | DataFrame with the training set of base time series, with columns ['unique_id', 'ds', 'y']. If a reconciler in self.reconcilers uses insample values (y_hat_insample), Y_df must include the insample model predictions as columns. | None
level | Optional[list[int]] | List of floats in [0, 100): confidence levels for prediction intervals. | None
intervals_method | str | Method used to calculate prediction intervals; one of 'normality', 'bootstrap', 'permbu'. | 'normality'
num_samples | int | If positive, return that many probabilistic coherent samples. | -1
seed | int | Random seed for the numpy generator's replicability. | 0
is_balanced | bool | Whether Y_df is balanced; set it to True to speed things up if Y_df is balanced. | False
id_col | str | Column that identifies each series. | 'unique_id'
time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds'
target_col | str | Column that contains the target. | 'y'
diagnostics | bool | If True, compute coherence diagnostics and store them in self.diagnostics. | False
diagnostics_atol | float | Absolute tolerance for the numerical coherence check. | 1e-06
Returns:

Type | Description
FrameT | DataFrame with the reconciled predictions.
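As an illustration of the notation above, here is a minimal NumPy sketch of the bottom-up case for a hypothetical three-series hierarchy (total = a + b), where P simply selects the base forecasts of the bottom series:

import numpy as np

# Summing matrix S: rows are (total, a, b), columns are the bottom series (a, b)
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Bottom-up projection P: keep only the bottom-level base forecasts
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Incoherent base forecasts y_hat for (total, a, b)
y_hat = np.array([11.0, 6.8, 4.9])

# Reconciled forecasts y_tilde = S P y_hat are coherent: total equals a + b
y_tilde = S @ P @ y_hat
print(y_tilde)  # [11.7  6.8  4.9]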

HierarchicalReconciliation.bootstrap_reconcile

bootstrap_reconcile(Y_hat_df, S_df, tags, Y_df=None, level=None, intervals_method='normality', num_samples=-1, num_seeds=1, id_col='unique_id', time_col='ds', target_col='y')
Bootstrapped Hierarchical Reconciliation Method. Applies the reconcile method N times, each time with a different random seed, for the different reconciliation techniques instantiated in the reconcilers list.

Parameters:
Name | Type | Description | Default
Y_hat_df | Frame | DataFrame with base forecasts, with columns ['unique_id', 'ds'] and the models to reconcile. | required
S_df | Frame | DataFrame with the summing matrix of size (base, bottom), see the aggregate method. | required
tags | dict[str, ndarray] | Each key is a level and its value contains the tags associated to that level. | required
Y_df | Optional[Frame] | DataFrame with the training set of base time series, with columns ['unique_id', 'ds', 'y']. If a reconciler in self.reconcilers uses insample values (y_hat_insample), Y_df must include the insample model predictions as columns. | None
level | Optional[list[int]] | List of floats in [0, 100): confidence levels for prediction intervals. | None
intervals_method | str | Method used to calculate prediction intervals; one of 'normality', 'bootstrap', 'permbu'. | 'normality'
num_samples | int | If positive, return that many probabilistic coherent samples. | -1
num_seeds | int | Number of random seeds for the numpy generator, i.e. the number of reconciliation runs. | 1
id_col | str | Column that identifies each series. | 'unique_id'
time_col | str | Column that identifies each timestep; its values can be timestamps or integers. | 'ds'
target_col | str | Column that contains the target. | 'y'
Returns:

Type | Description
FrameT | DataFrame with the bootstrapped reconciled predictions.
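A minimal usage sketch, assuming hrec, Y_hat_df, Y_fitted_df, S_df and tags are built as in the example below; each of the num_seeds runs applies reconcile with a different random seed, and the returned DataFrame collects the reconciled predictions across all runs:

Y_boot_df = hrec.bootstrap_reconcile(Y_hat_df=Y_hat_df,
                                     S_df=S_df,
                                     tags=tags,
                                     Y_df=Y_fitted_df,
                                     num_seeds=10)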

Example

import pandas as pd

from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.utils import aggregate
from hierarchicalforecast.evaluation import evaluate
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from utilsforecast.losses import mase, rmse
from functools import partial

# Load TourismSmall dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
df = df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
df.insert(0, 'Country', 'Australia')
qs = df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
df['ds'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()

# Create hierarchical series based on geographic levels and purpose
# (the quarterly ds strings were converted to pd.datetime format above)
hierarchy_levels = [['Country'],
                    ['Country', 'State'],
                    ['Country', 'Purpose'],
                    ['Country', 'State', 'Region'],
                    ['Country', 'State', 'Purpose'],
                    ['Country', 'State', 'Region', 'Purpose']]

Y_df, S_df, tags = aggregate(df=df, spec=hierarchy_levels)

# Split train/test sets
Y_test_df  = Y_df.groupby('unique_id').tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base auto-ETS predictions
# Be careful to identify the correct data frequency; this data is quarterly (freq='QS')
fcst = StatsForecast(models=[AutoETS(season_length=4, model='ZZA')], freq='QS', n_jobs=-1)
Y_hat_df = fcst.forecast(df=Y_train_df, h=8, fitted=True)
Y_fitted_df = fcst.forecast_fitted_values()

reconcilers = [
                BottomUp(),
                MinTrace(method='ols'),
                MinTrace(method='mint_shrink'),
               ]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df,
                          Y_df=Y_fitted_df,
                          S_df=S_df, tags=tags)

# Evaluate
eval_tags = {}
eval_tags['Total'] = tags['Country']
eval_tags['Purpose'] = tags['Country/Purpose']
eval_tags['State'] = tags['Country/State']
eval_tags['Regions'] = tags['Country/State/Region']
eval_tags['Bottom'] = tags['Country/State/Region/Purpose']

Y_rec_df_with_y = Y_rec_df.merge(Y_test_df, on=['unique_id', 'ds'], how='left')
mase_p = partial(mase, seasonality=4)

evaluation = evaluate(Y_rec_df_with_y,
         metrics=[mase_p, rmse],
         tags=eval_tags,
         train_df=Y_train_df)

numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format)
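Building on the example above, prediction intervals can be requested through the level and intervals_method arguments of reconcile (the values here are illustrative); the reconciled DataFrame then also carries lower and upper interval columns for each model and level.

Y_rec_prob_df = hrec.reconcile(Y_hat_df=Y_hat_df,
                               Y_df=Y_fitted_df,
                               S_df=S_df, tags=tags,
                               level=[80, 90],
                               intervals_method='bootstrap')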