Temporal Aggregation with THIEF
Temporal Hierarchical Forecasting on M3 monthly and quarterly data with THIEF
In this notebook we present an example of how to use HierarchicalForecast to produce coherent forecasts between temporal levels. We will use the monthly and quarterly time series of the M3 dataset. We first load the M3 data and produce base forecasts using an AutoETS model from StatsForecast. Then, we reconcile the forecasts with THIEF (Temporal HIerarchical Forecasting) from HierarchicalForecast according to a specified temporal hierarchy.
You can run these experiments using CPU or GPU with Google Colab.
1. Load and Process Data
We will aggregate up to the yearly level, so for both the monthly and the quarterly data we make sure that the length of each time series is an integer multiple of the number of bottom-level timesteps in one year.
For example, the first time series in m3_monthly (with unique_id='M1') has 68 timesteps. This is not a multiple of 12 (12 months in one year), so we would not be able to aggregate all timesteps into full years. Hence, we truncate (remove) the first 8 timesteps, resulting in 60 timesteps for this series. We do something similar for the quarterly data, albeit with a multiple of 4 (4 quarters in one year).
Depending on the highest temporal aggregation in your reconciliation problem, you may want to truncate your data differently.
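The truncation described above can be sketched with plain pandas. This is a minimal illustration, not the notebook's own code: the helper name truncate_to_multiple and the toy data are hypothetical, and the column names ds and y are assumed to follow the usual long format alongside unique_id.

```python
import pandas as pd

def truncate_to_multiple(df: pd.DataFrame, period: int) -> pd.DataFrame:
    """Drop the earliest rows of each series so its length is a multiple of `period`."""
    df = df.sort_values(["unique_id", "ds"]).reset_index(drop=True)
    # Length of the series each row belongs to, and the row's position within it.
    sizes = df.groupby("unique_id")["ds"].transform("size")
    position = df.groupby("unique_id").cumcount()
    # Number of leading observations that do not fit into full periods.
    n_drop = sizes % period
    return df[position >= n_drop].reset_index(drop=True)

# Toy series with 68 monthly observations, mirroring the 'M1' example above.
toy = pd.DataFrame({
    "unique_id": "M1",
    "ds": pd.date_range("2000-01-01", periods=68, freq="MS"),
    "y": range(68),
})
truncated = truncate_to_multiple(toy, period=12)
print(len(toy), "->", len(truncated))
```

For the quarterly data the same helper would be called with period=4.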
2. Temporal reconciliation
2a. Split Train/Test sets
Following the original THIEF paper, we use the last 24 observations of each monthly series and the last 8 observations of each quarterly series as test samples.
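A split of this kind can be sketched as follows. The function name split_train_test and the toy frame are hypothetical, and the ds and y column names are assumptions; only the horizons (24 for monthly, 8 for quarterly) come from the text.

```python
import pandas as pd

def split_train_test(df: pd.DataFrame, horizon: int):
    """Hold out the last `horizon` observations of every series as the test set."""
    df = df.sort_values(["unique_id", "ds"]).reset_index(drop=True)
    test = df.groupby("unique_id").tail(horizon)
    train = df.drop(test.index)
    return train, test

# Two toy monthly series with 60 and 36 observations.
frames = []
for uid, n in [("A", 60), ("B", 36)]:
    frames.append(pd.DataFrame({
        "unique_id": uid,
        "ds": pd.date_range("2000-01-01", periods=n, freq="MS"),
        "y": range(n),
    }))
toy = pd.concat(frames, ignore_index=True)

train, test = split_train_test(toy, horizon=24)
print(train.groupby("unique_id").size().to_dict())
print(test.groupby("unique_id").size().to_dict())
```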
2a. Aggregating the dataset according to temporal hierarchy
We first define the temporal aggregation spec. The spec is a dictionary in which each key is the name of an aggregation and each value is the number of bottom-level timesteps that are combined in that aggregation. For example, a year consists of 12 months, so we define the key-value pair "yearly": 12. We can do the same for the other aggregations that we are interested in.
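As an illustration, specs covering the levels that appear in the evaluation tables of this notebook could look like the sketch below. The variable names and the check_spec helper are hypothetical. The level names are taken from the evaluation tables, and the values follow from the calendar (for example, a four-month block contains 4 months).

```python
# Number of bottom-level (monthly) timesteps per aggregation level.
spec_temporal_monthly = {
    "yearly": 12,
    "semiannually": 6,
    "fourmonthly": 4,
    "quarterly": 3,
    "bimonthly": 2,
    "monthly": 1,
}

# Number of bottom-level (quarterly) timesteps per aggregation level.
spec_temporal_quarterly = {
    "yearly": 4,
    "semiannually": 2,
    "quarterly": 1,
}

def check_spec(spec: dict) -> bool:
    """Sanity check: every level must fit a whole number of times into the top level."""
    top = max(spec.values())
    return all(top % k == 0 for k in spec.values())

print(check_spec(spec_temporal_monthly), check_spec(spec_temporal_quarterly))
```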
We next compute the temporally aggregated train and test sets using the aggregate_temporal function. Note that we have different aggregation matrices S for the train and test sets, as the test set contains temporal hierarchies that are not included in the train set.
Our aggregation matrices aggregate the lowest temporal granularity (months for the monthly data shown here, quarters for the quarterly data) up to years, for the train and test sets.
Train set (first rows and columns):

| | temporal_id | monthly-1 | monthly-2 | monthly-3 | monthly-4 |
|---|---|---|---|---|---|
| 0 | yearly-1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | yearly-2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | yearly-3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | yearly-4 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | yearly-5 | 0.0 | 0.0 | 0.0 | 0.0 |

Test set (first rows and columns):

| | temporal_id | monthly-1 | monthly-2 | monthly-3 | monthly-4 |
|---|---|---|---|---|---|
| 0 | yearly-1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | yearly-2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | semiannually-1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | semiannually-2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | semiannually-3 | 0.0 | 0.0 | 0.0 | 0.0 |
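To make the structure of such an aggregation matrix concrete, the sketch below builds one by hand with NumPy for a 24-month window, which is the length of the monthly test set. It is an illustration only and does not reproduce the internals of aggregate_temporal. The helper name temporal_summing_matrix is hypothetical, and the row ordering (first period of each level first) is an assumption.

```python
import numpy as np
import pandas as pd

def temporal_summing_matrix(spec: dict, n_bottom: int) -> pd.DataFrame:
    """Stack one block per level; each row sums `k` consecutive bottom-level steps."""
    blocks, index = [], []
    for name, k in spec.items():
        n_agg = n_bottom // k
        # Block-diagonal pattern: row i has ones in columns i*k ... (i+1)*k - 1.
        blocks.append(np.kron(np.eye(n_agg), np.ones((1, k))))
        index += [f"{name}-{i + 1}" for i in range(n_agg)]
    bottom_name = min(spec, key=spec.get)
    columns = [f"{bottom_name}-{i + 1}" for i in range(n_bottom)]
    return pd.DataFrame(np.vstack(blocks), index=index, columns=columns)

spec = {"yearly": 12, "semiannually": 6, "fourmonthly": 4,
        "quarterly": 3, "bimonthly": 2, "monthly": 1}
S = temporal_summing_matrix(spec, n_bottom=24)
print(S.shape)
print(S.iloc[:5, :4])
```

Under these assumptions, the first rows and columns match the test-set excerpt shown above: yearly-1 and semiannually-1 cover the first four months, while yearly-2, semiannually-2 and semiannually-3 do not.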
2b. Computing base forecasts
Now, we need to compute base forecasts for each temporal aggregation. The following cell computes the base forecasts for each temporal aggregation in Y_monthly_train and Y_quarterly_train using the AutoETS model. Observe that Y_hats contains the forecasts, but they are not coherent.
Note also that both the frequency and the horizon are different for each temporal aggregation. For the monthly data, the lowest level has a monthly frequency and a horizon of 24 (constituting 2 years). The yearly aggregation, by contrast, has a yearly frequency and a horizon of 2.
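The horizon of each level follows from dividing the bottom-level horizon by the number of bottom-level timesteps in that level. A small sketch, assuming spec dictionaries with the level names used in the evaluation tables (the variable and function names are hypothetical):

```python
spec_monthly = {"yearly": 12, "semiannually": 6, "fourmonthly": 4,
                "quarterly": 3, "bimonthly": 2, "monthly": 1}
spec_quarterly = {"yearly": 4, "semiannually": 2, "quarterly": 1}

def level_horizons(spec: dict, bottom_horizon: int) -> dict:
    """Forecast horizon per aggregation level for a given bottom-level horizon."""
    return {level: bottom_horizon // k for level, k in spec.items()}

horizons_monthly = level_horizons(spec_monthly, bottom_horizon=24)
horizons_quarterly = level_horizons(spec_quarterly, bottom_horizon=8)
print(horizons_monthly)
print(horizons_quarterly)
```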
It is of course possible to choose a different model for each level in the temporal aggregation - you can be as creative as you like!
2c. Reconcile forecasts
We can use the HierarchicalReconciliation class to reconcile the forecasts. In this example we use BottomUp and MinTrace(wls_struct). The latter is the ‘structural scaling’ method introduced in the paper Forecasting with temporal hierarchies. Note that we have to set temporal=True in the reconcile function.
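To show what the two reconcilers compute, here is a minimal NumPy sketch for a single year of quarterly data. It is not the library's implementation: the function names and the base forecasts are made up for illustration. Bottom-up sums the bottom-level forecasts, while structural scaling applies the weighted least squares projection S (SᵀW⁻¹S)⁻¹ SᵀW⁻¹ with W = diag(S·1), so each row is weighted by the number of bottom-level timesteps it aggregates.

```python
import numpy as np

def summing_matrix(factors, n_bottom):
    """One block per aggregation factor; the factor 1 (identity) must come last."""
    return np.vstack([np.kron(np.eye(n_bottom // k), np.ones((1, k))) for k in factors])

def bottom_up(S, y_hat):
    """Aggregate the bottom-level forecasts (the last rows of y_hat) upwards."""
    n_bottom = S.shape[1]
    return S @ y_hat[-n_bottom:]

def struct_scaling(S, y_hat):
    """WLS reconciliation with W = diag(S @ 1), i.e. structural scaling."""
    w_inv = 1.0 / S.sum(axis=1)          # inverse of the diagonal of W
    SW = S.T * w_inv                     # S' W^-1, shape (n_bottom, n_total)
    P = np.linalg.solve(SW @ S, SW)      # (S' W^-1 S)^-1 S' W^-1
    return S @ (P @ y_hat)

# One year of quarterly data: 1 yearly, 2 semiannual and 4 quarterly forecasts.
S = summing_matrix(factors=(4, 2, 1), n_bottom=4)
y_hat = np.array([100.0, 45.0, 50.0, 20.0, 22.0, 25.0, 28.0])  # made-up, incoherent

y_bu = bottom_up(S, y_hat)
y_struct = struct_scaling(S, y_hat)
print(y_bu)
print(y_struct)
```

Both outputs are coherent: the yearly value equals the sum of the four quarterly values, and each semiannual value equals the sum of its two quarters.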
3. Evaluation
The HierarchicalForecast package includes the evaluate function to evaluate the different hierarchies.
We evaluate the temporally aggregated forecasts across all temporal aggregations.
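The tables below report a metric named mae-scaled in which the Base column is always 1.0. This suggests that each method's MAE is divided by the MAE of the base forecasts at the same level. A sketch of that scaling step under this assumption (the helper name scale_by_base and the raw MAE values are hypothetical and are not results from the notebook):

```python
import pandas as pd

def scale_by_base(evaluation: pd.DataFrame, base_col: str = "Base") -> pd.DataFrame:
    """Divide every method column by the base column, row by row."""
    scaled = evaluation.copy()
    method_cols = [c for c in scaled.columns if c not in ("level", "metric")]
    scaled[method_cols] = scaled[method_cols].div(scaled[base_col], axis=0)
    scaled["metric"] = scaled["metric"] + "-scaled"
    return scaled

# Hypothetical raw MAE values, for illustration only.
evaluation = pd.DataFrame({
    "level": ["yearly", "monthly"],
    "metric": ["mae", "mae"],
    "Base": [50.0, 10.0],
    "BottomUp": [40.0, 10.0],
    "MinTrace(wls_struct)": [45.0, 9.0],
})
scaled = scale_by_base(evaluation)
print(scaled)
```

With this scaling, values below 1.0 mean that the reconciled forecasts are more accurate than the base forecasts at that level.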
3a. Monthly
| | level | metric | Base | BottomUp | MinTrace(wls_struct) |
|---|---|---|---|---|---|
| 0 | yearly | mae-scaled | 1.0 | 0.78 | 0.75 |
| 1 | semiannually | mae-scaled | 1.0 | 0.99 | 0.95 |
| 2 | fourmonthly | mae-scaled | 1.0 | 0.96 | 0.93 |
| 3 | quarterly | mae-scaled | 1.0 | 0.95 | 0.93 |
| 4 | bimonthly | mae-scaled | 1.0 | 0.96 | 0.94 |
| 5 | monthly | mae-scaled | 1.0 | 1.00 | 0.99 |
| 6 | Overall | mae-scaled | 1.0 | 0.94 | 0.92 |
MinTrace(wls_struct) is the best overall method, scoring the lowest scaled MAE on all levels.
3b. Quarterly
| | level | metric | Base | BottomUp | MinTrace(wls_struct) |
|---|---|---|---|---|---|
| 0 | yearly | mae-scaled | 1.0 | 0.87 | 0.85 |
| 1 | semiannually | mae-scaled | 1.0 | 1.03 | 1.00 |
| 2 | quarterly | mae-scaled | 1.0 | 1.00 | 0.97 |
| 3 | Overall | mae-scaled | 1.0 | 0.97 | 0.94 |
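Again, MinTrace(wls_struct) is the best overall method. It scores the lowest scaled MAE on the yearly and quarterly levels and overall, and it matches the base forecasts on the semiannual level, where BottomUp is slightly worse than Base.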