In this notebook we present an example of how to use HierarchicalForecast to produce forecasts that are coherent across temporal levels. We will use the monthly and quarterly time series of the M3 dataset. We first load the M3 data and produce base forecasts using an AutoARIMA model from StatsForecast. We then reconcile the forecasts with THIEF (Temporal HIerarchical Forecasting) from HierarchicalForecast according to a specified temporal hierarchy.

References

Athanasopoulos, G., Hyndman, R. J., Kourentzes, N., & Petropoulos, F. (2017). Forecasting with temporal hierarchies. European Journal of Operational Research, 262, 60-74.

You can run these experiments using CPU or GPU with Google Colab.

!pip install hierarchicalforecast statsforecast datasetsforecast

1. Load and Process Data

import numpy as np
import pandas as pd
from datasetsforecast.m3 import M3
m3_monthly, _, _ = M3.load(directory='data', group='Monthly')
m3_quarterly, _, _ = M3.load(directory='data', group='Quarterly')

We will be making aggregations up to the yearly level, so for both the monthly and quarterly data we make sure each time series has a number of timesteps that is an integer multiple of the number of bottom-level timesteps in a year.

For example, the first time series in m3_monthly (with unique_id='M1') has 68 timesteps. This is not a multiple of 12 (12 months in one year), so we would not be able to aggregate all timesteps into full years. Hence, we truncate (remove) the first 8 timesteps, resulting in 60 timesteps for this series. We do something similar for the quarterly data, albeit with a multiple of 4 (4 quarters in one year).

Depending on the highest temporal aggregation in your reconciliation problem, you may want to truncate your data differently.
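The truncation can be illustrated on a small toy frame (hypothetical data, not from M3): keeping only the most recent multiple of the seasonal period drops the oldest leftover timesteps.

```python
import pandas as pd

# Toy frame: one series with 14 monthly observations (hypothetical data).
df = pd.DataFrame({
    "unique_id": ["A"] * 14,
    "ds": pd.date_range("2020-01-01", periods=14, freq="MS"),
    "y": range(14),
})

seasonality = 12  # bottom-level timesteps per year

# Keep only the most recent multiple of `seasonality` timesteps per series.
df_trunc = (df.groupby("unique_id", group_keys=False)
              .apply(lambda x: x.tail(len(x) // seasonality * seasonality))
              .reset_index(drop=True))

print(len(df_trunc))  # 12: the 2 oldest of the 14 timesteps are dropped
```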

m3_monthly = m3_monthly.groupby("unique_id", group_keys=False)\
                       .apply(lambda x: x.tail(len(x) //  12 * 12))\
                       .reset_index(drop=True)

m3_quarterly = m3_quarterly.groupby("unique_id", group_keys=False)\
                           .apply(lambda x: x.tail(len(x) //  4 * 4))\
                           .reset_index(drop=True)

2. Temporal reconciliation

2a. Split Train/Test sets

Following the original THIEF paper, we use as test sets the last 24 observations of each monthly series and the last 8 observations of each quarterly series.

horizon_monthly = 24
horizon_quarterly = 8
m3_monthly_test = m3_monthly.groupby("unique_id", as_index=False).tail(horizon_monthly)
m3_monthly_train = m3_monthly.drop(m3_monthly_test.index)

m3_quarterly_test = m3_quarterly.groupby("unique_id", as_index=False).tail(horizon_quarterly)
m3_quarterly_train = m3_quarterly.drop(m3_quarterly_test.index)

2b. Aggregating the dataset according to the temporal hierarchy

We first define the temporal aggregation spec. The spec is a dictionary in which each key is the name of an aggregation and the corresponding value is the number of bottom-level timesteps that should be aggregated into that aggregation. For example, a year consists of 12 months, so we define the key-value pair "yearly": 12. We can do the same for any other aggregations we are interested in.

spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "fourmonthly": 4, "quarterly": 3, "bimonthly": 2, "monthly": 1}
spec_temporal_quarterly = {"yearly": 4, "semiannually": 2, "quarterly": 1}
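To make the spec concrete, here is a small sketch (toy values, not M3 data) of the summing behaviour each key describes: every aggregation sums non-overlapping windows of bottom-level values.

```python
import numpy as np

spec = {"yearly": 12, "semiannually": 6, "fourmonthly": 4,
        "quarterly": 3, "bimonthly": 2, "monthly": 1}

y = np.arange(1, 13)  # one toy year of monthly values: 1, 2, ..., 12

# Each aggregation sums non-overlapping windows of `k` bottom-level steps.
for level, k in spec.items():
    agg = y.reshape(-1, k).sum(axis=1)
    print(level, agg)
# yearly    -> [78]
# quarterly -> [ 6 15 24 33]
# monthly   -> [ 1  2 ... 12]
```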

We next compute the temporally aggregated train and test sets using the aggregate_temporal function. Note that the train and test sets have different aggregation matrices S, as the test set contains temporal aggregation timestamps that are not present in the train set.

from hierarchicalforecast.utils import aggregate_temporal
# Monthly
Y_monthly_train, S_monthly_train, tags_monthly_train = aggregate_temporal(df=m3_monthly_train, spec=spec_temporal_monthly)
Y_monthly_test, S_monthly_test, tags_monthly_test = aggregate_temporal(df=m3_monthly_test, spec=spec_temporal_monthly)

# Quarterly
Y_quarterly_train, S_quarterly_train, tags_quarterly_train = aggregate_temporal(df=m3_quarterly_train, spec=spec_temporal_quarterly)
Y_quarterly_test, S_quarterly_test, tags_quarterly_test = aggregate_temporal(df=m3_quarterly_test,  spec=spec_temporal_quarterly)

Our aggregation matrices aggregate the lowest temporal granularity (months for the monthly data, quarters for the quarterly data) up to years, for both the train and test sets.

S_monthly_train.iloc[:5, :5]
   temporal_id  monthly-1  monthly-2  monthly-3  monthly-4
0     yearly-1        0.0        0.0        0.0        0.0
1     yearly-2        0.0        0.0        0.0        0.0
2     yearly-3        0.0        0.0        0.0        0.0
3     yearly-4        0.0        0.0        0.0        0.0
4     yearly-5        0.0        0.0        0.0        0.0
S_monthly_test.iloc[:5, :5]
      temporal_id  monthly-1  monthly-2  monthly-3  monthly-4
0        yearly-1        1.0        1.0        1.0        1.0
1        yearly-2        0.0        0.0        0.0        0.0
2  semiannually-1        1.0        1.0        1.0        1.0
3  semiannually-2        0.0        0.0        0.0        0.0
4  semiannually-3        0.0        0.0        0.0        0.0
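The structure of these matrices can be seen in a tiny hand-built example (4 toy quarters, not the M3 matrices above): each row of S selects the bottom-level timesteps that sum into one aggregated timestep.

```python
import numpy as np

# Toy temporal aggregation matrix: 4 bottom-level quarters rolled up
# into 1 year and 2 half-years (illustration only).
S_year = np.ones((1, 4))                # the year sums all 4 quarters
S_semi = np.array([[1., 1., 0., 0.],    # first half-year: Q1 + Q2
                   [0., 0., 1., 1.]])   # second half-year: Q3 + Q4
S_bottom = np.eye(4)                    # each quarter maps to itself
S = np.vstack([S_year, S_semi, S_bottom])
print(S.shape)  # (7, 4): 7 aggregated timesteps from 4 bottom timesteps
```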

2c. Computing base forecasts

Now, we need to compute base forecasts for each temporal aggregation. The following cell computes the base forecasts for each temporal aggregation in Y_monthly_train and Y_quarterly_train using the AutoARIMA model. Observe that Y_hats contains the forecasts but they are not coherent.

Note also that both the frequency and the horizon are different for each temporal aggregation. For the monthly data, the lowest level has a monthly frequency and a horizon of 24 (constituting 2 years). The yearly aggregation, for example, instead has a yearly frequency and a horizon of 2.
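As a quick check on the horizons, the horizon at each level is simply the bottom-level horizon divided by the aggregation size from the spec:

```python
horizon_monthly = 24  # bottom-level (monthly) test horizon
spec_temporal_monthly = {"yearly": 12, "semiannually": 6, "fourmonthly": 4,
                         "quarterly": 3, "bimonthly": 2, "monthly": 1}

# Each level forecasts the same 2-year span in its own time units.
for level, k in spec_temporal_monthly.items():
    print(f"{level}: h = {horizon_monthly // k}")
# yearly: h = 2, semiannually: h = 4, ..., monthly: h = 24
```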

It is of course possible to choose a different model for each level in the temporal aggregation - you can be as creative as you like!

from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast
Y_hats = []
id_cols = ["unique_id", "temporal_id", "ds", "y"]

# We loop over the monthly and quarterly data
for tags_train, tags_test, Y_train, Y_test in zip([tags_monthly_train, tags_quarterly_train], 
                                                  [tags_monthly_test, tags_quarterly_test],
                                                  [Y_monthly_train, Y_quarterly_train], 
                                                  [Y_monthly_test, Y_quarterly_test]):
    # We will train a model for each temporal level
    Y_hats_tags = []
    for level, temporal_ids_train in tags_train.items():
        # Filter the data for the level
        Y_level_train = Y_train.query("temporal_id in @temporal_ids_train")
        temporal_ids_test = tags_test[level]
        Y_level_test = Y_test.query("temporal_id in @temporal_ids_test")
        # Each temporal level has its own frequency and forecast horizon. We use the
        # timestamps of the first series to derive them for the aggregated series.
        unique_id = Y_level_train["unique_id"].iloc[0]
        freq_level = pd.infer_freq(Y_level_train.query("unique_id == @unique_id")["ds"])
        horizon_level = Y_level_test.query("unique_id == @unique_id")["ds"].nunique()
        # Train a model and create forecasts
        fcst = StatsForecast(models=[AutoARIMA()], freq=freq_level, n_jobs=-1)
        Y_hat_level = fcst.forecast(df=Y_level_train[["ds", "unique_id", "y"]], h=horizon_level)
        # Add the test set to the forecast
        Y_hat_level = pd.concat([Y_level_test.reset_index(drop=True), Y_hat_level.drop(columns=["unique_id", "ds"])], axis=1)
        # Put cols in the right order (for readability)
        Y_hat_cols = id_cols + [col for col in Y_hat_level.columns if col not in id_cols]
        Y_hat_level = Y_hat_level[Y_hat_cols]
        # Append the forecast to the list
        Y_hats_tags.append(Y_hat_level)

    Y_hat_tag = pd.concat(Y_hats_tags, ignore_index=True)
    Y_hats.append(Y_hat_tag)

2d. Reconcile forecasts

We can use the HierarchicalReconciliation class to reconcile the forecasts. In this example we use BottomUp and MinTrace(wls_struct). The latter is the ‘structural scaling’ method introduced in Forecasting with temporal hierarchies.

Note that we have to set temporal=True in the reconcile function.

from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation
reconcilers = [
    BottomUp(),
    MinTrace(method="wls_struct"),
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_recs = []
# We loop over the monthly and quarterly data
for Y_hat, S, tags in zip(Y_hats, 
                          [S_monthly_test, S_quarterly_test], 
                          [tags_monthly_test, tags_quarterly_test]):
    Y_rec = hrec.reconcile(Y_hat_df=Y_hat, S=S, tags=tags, temporal=True)
    Y_recs.append(Y_rec)

3. Evaluation

The HierarchicalForecast package includes the evaluate function, which we use to evaluate the forecasts at the different levels of the hierarchy.

We evaluate the temporally aggregated forecasts across all temporal aggregations.

from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import mae

3a. Monthly

Y_rec_monthly = Y_recs[0]
evaluation = evaluate(df = Y_rec_monthly.drop(columns = 'unique_id'),
                      tags = tags_monthly_test,
                      metrics = [mae],
                      id_col='temporal_id',
                      benchmark="AutoARIMA")

evaluation.columns = ['level', 'metric', 'Base', 'BottomUp', 'MinTrace(wls_struct)']
numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format).astype(np.float64)

evaluation
          level      metric  Base  BottomUp  MinTrace(wls_struct)
0        yearly  mae-scaled   1.0      0.78                  0.75
1  semiannually  mae-scaled   1.0      0.99                  0.95
2   fourmonthly  mae-scaled   1.0      0.96                  0.93
3     quarterly  mae-scaled   1.0      0.95                  0.93
4     bimonthly  mae-scaled   1.0      0.96                  0.94
5       monthly  mae-scaled   1.0      1.00                  0.99
6       Overall  mae-scaled   1.0      0.94                  0.92

MinTrace(wls_struct) is the best method overall, achieving the lowest scaled mae on every level.

3b. Quarterly

Y_rec_quarterly = Y_recs[1]
evaluation = evaluate(df = Y_rec_quarterly.drop(columns = 'unique_id'),
                      tags = tags_quarterly_test,
                      metrics = [mae],
                      id_col='temporal_id',
                      benchmark="AutoARIMA")

evaluation.columns = ['level', 'metric', 'Base', 'BottomUp', 'MinTrace(wls_struct)']
numeric_cols = evaluation.select_dtypes(include="number").columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.2f}'.format).astype(np.float64)

evaluation
          level      metric  Base  BottomUp  MinTrace(wls_struct)
0        yearly  mae-scaled   1.0      0.87                  0.85
1  semiannually  mae-scaled   1.0      1.03                  1.00
2     quarterly  mae-scaled   1.0      1.00                  0.97
3       Overall  mae-scaled   1.0      0.97                  0.94

Again, MinTrace(wls_struct) is the best method overall, although on the semiannual level it only matches the scaled mae of the base forecasts.