Open In Colab

In many cases, only the time series at the lowest level of the hierarchies (bottom time series) are available. HierarchicalForecast has tools to create time series for all hierarchies and also allows you to calculate prediction intervals for all hierarchies. In this notebook we will see how to do it.

!pip install hierarchicalforecast statsforecast
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# compute base forecast no coherent
from statsforecast.models import AutoARIMA
from statsforecast.core import StatsForecast

#obtain hierarchical reconciliation methods and evaluation
from hierarchicalforecast.methods import BottomUp, MinTrace
from hierarchicalforecast.utils import aggregate, HierarchicalPlot
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/statsforecast/core.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from tqdm.autonotebook import tqdm

Aggregate bottom time series

In this example we will use the Tourism dataset from the Forecasting: Principles and Practice book. The dataset only contains the time series at the lowest level, so we need to create the time series for all hierarchies.

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv')
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head()
CountryRegionStatePurposedsy
0AustraliaAdelaideSouth AustraliaBusiness1998-01-01135.077690
1AustraliaAdelaideSouth AustraliaBusiness1998-04-01109.987316
2AustraliaAdelaideSouth AustraliaBusiness1998-07-01166.034687
3AustraliaAdelaideSouth AustraliaBusiness1998-10-01127.160464
4AustraliaAdelaideSouth AustraliaBusiness1999-01-01137.448533

The dataset can be grouped in the following strictly hierarchical structure.

spec = [
    ['Country'],
    ['Country', 'State'], 
    ['Country', 'State', 'Region']
]

Using the aggregate function from HierarchicalForecast we can get the full set of time series.

Y_df, S_df, tags = aggregate(df=Y_df, spec=spec)
Y_df = Y_df.reset_index()
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
Y_df.head()
unique_iddsy
0Australia1998-01-0123182.197269
1Australia1998-04-0120323.380067
2Australia1998-07-0119826.640511
3Australia1998-10-0120830.129891
4Australia1999-01-0122087.353380
S_df.iloc[:5, :5]
Australia/ACT/CanberraAustralia/New South Wales/Blue MountainsAustralia/New South Wales/Capital CountryAustralia/New South Wales/Central CoastAustralia/New South Wales/Central NSW
Australia1.01.01.01.01.0
Australia/ACT1.00.00.00.00.0
Australia/New South Wales0.01.01.01.01.0
Australia/Northern Territory0.00.00.00.00.0
Australia/Queensland0.00.00.00.00.0
tags['Country/State']
array(['Australia/ACT', 'Australia/New South Wales',
       'Australia/Northern Territory', 'Australia/Queensland',
       'Australia/South Australia', 'Australia/Tasmania',
       'Australia/Victoria', 'Australia/Western Australia'], dtype=object)

We can visualize the S matrix and the data using the HierarchicalPlot class as follows.

hplot = HierarchicalPlot(S=S_df, tags=tags)
hplot.plot_summing_matrix()

hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra',
    Y_df=Y_df.set_index('unique_id')
)

Split Train/Test sets

We use the final two years (8 quarters) as test set.

Y_test_df = Y_df.groupby('unique_id').tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')
Y_train_df.groupby('unique_id').size()
unique_id
Australia                                                 72
Australia/ACT                                             72
Australia/ACT/Canberra                                    72
Australia/New South Wales                                 72
Australia/New South Wales/Blue Mountains                  72
                                                          ..
Australia/Western Australia/Australia's Coral Coast       72
Australia/Western Australia/Australia's Golden Outback    72
Australia/Western Australia/Australia's North West        72
Australia/Western Australia/Australia's South West        72
Australia/Western Australia/Experience Perth              72
Length: 85, dtype: int64

Computing base forecasts

The following cell computes the base forecasts for each time series in Y_df using the AutoARIMA and model. Observe that Y_hat_df contains the forecasts but they are not coherent. To reconcile the prediction intervals we need to calculate the uncoherent intervals using the level argument of StatsForecast.

fcst = StatsForecast(df=Y_train_df,
                     models=[AutoARIMA(season_length=4)], 
                     freq='QS', n_jobs=-1)
Y_hat_df = fcst.forecast(h=8, fitted=True, level=[80, 90])
Y_fitted_df = fcst.forecast_fitted_values()

Reconcile forecasts and compute prediction intervals using PERMBU

The following cell makes the previous forecasts coherent using the HierarchicalReconciliation class. In this example we use BottomUp and MinTrace. If you want to calculate prediction intervals, you have to use the level argument as follows and also intervals_method='permbu'.

reconcilers = [
    BottomUp(),
    MinTrace(method='mint_shrink'),
    MinTrace(method='ols')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_fitted_df,
                          S=S_df, tags=tags,
                          level=[80, 90], intervals_method='permbu')
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(
/Users/fedex/miniconda3/envs/hierarchicalforecast/lib/python3.10/site-packages/sklearn/preprocessing/_encoders.py:828: FutureWarning: `sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
  warnings.warn(

The dataframe Y_rec_df contains the reconciled forecasts.

Y_rec_df.head()
dsAutoARIMAAutoARIMA-lo-90AutoARIMA-lo-80AutoARIMA-hi-80AutoARIMA-hi-90AutoARIMA/BottomUpAutoARIMA/BottomUp-lo-90AutoARIMA/BottomUp-lo-80AutoARIMA/BottomUp-hi-80AutoARIMA/MinTrace_method-mint_shrinkAutoARIMA/MinTrace_method-mint_shrink-lo-90AutoARIMA/MinTrace_method-mint_shrink-lo-80AutoARIMA/MinTrace_method-mint_shrink-hi-80AutoARIMA/MinTrace_method-mint_shrink-hi-90AutoARIMA/MinTrace_method-olsAutoARIMA/MinTrace_method-ols-lo-90AutoARIMA/MinTrace_method-ols-lo-80AutoARIMA/MinTrace_method-ols-hi-80AutoARIMA/MinTrace_method-ols-hi-90
unique_id
Australia2016-01-0126212.55468824694.22460925029.58007827395.52734427730.88476624865.63671924106.80251024373.96204325423.56645025395.41192824733.04663324824.27468125939.34500725998.69246026133.75895325516.48451825600.64492626662.92320426855.585562
Australia2016-04-0125033.66796923324.06640623701.66992226365.66601626743.26953123247.09765622696.59793022821.25635723830.63256723986.27254023289.81143223525.63078524563.99864524739.82623824934.26039924402.57090424481.96856025567.08556525696.312229
Australia2016-07-0124507.02734422625.50000023041.07617225972.97851626388.55468822658.20703121816.98890622011.90503523158.31161923345.82118422688.60557422780.60257423934.24460924033.90622024374.02656923539.67372423797.83665124893.46309025098.321828
Australia2016-10-0125598.92968823559.91992224010.28125027187.57812527637.93750023330.80468822567.94829922694.44970823850.06816224275.42342023392.04044723626.16566224828.20781324958.92600225477.95191324793.43699324911.27154726006.24416126152.362329
Australia2017-01-0126982.57812524651.53515625166.39648428798.75781229313.61914124497.00195323578.41850323731.65743725114.56401725485.43387924549.96962524802.81969126073.79287226284.82638626842.56474126037.72556126248.17183127436.22276127668.067666

Plot forecasts

Then we can plot the probabilist forecasts using the following function.

plot_df = pd.concat([Y_df.set_index(['unique_id', 'ds']), 
                     Y_rec_df.set_index('ds', append=True)], axis=1)
plot_df = plot_df.reset_index('ds')

Plot single time series

hplot.plot_series(
    series='Australia',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 
            'AutoARIMA/MinTrace_method-ols',
            'AutoARIMA/BottomUp'
           ],
    level=[80]
)

Plot hierarchichally linked time series

hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/Western Australia/Experience Perth',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA', 'AutoARIMA/MinTrace_method-ols', 'AutoARIMA/BottomUp'],
    level=[80]
)

# ACT only has Canberra
hplot.plot_hierarchically_linked_series(
    bottom_series='Australia/ACT/Canberra',
    Y_df=plot_df, 
    models=['y', 'AutoARIMA/MinTrace_method-mint_shrink'],
    level=[80, 90]
)

References