In this notebook we explain the difference between temporally aggregating timeseries locally and globally.

You can run these experiments using CPU or GPU with Google Colab.

!pip install hierarchicalforecast utilsforecast

1. Generate Data

In this example, we generate 2 synthetic series with a daily frequency to explain the difference between local and global temporal aggregation.

from utilsforecast.data import generate_series
freq = "D"
n_series = 2
df = generate_series(n_series=n_series, 
                     freq=freq, 
                     min_length=2 * 365, 
                     max_length=4 * 365,  
                     equal_ends=True)

Note that our two timeseries do not have the same number of timesteps:

df.groupby('unique_id', observed=True)["ds"].count()
unique_id
0    1414
1    1289
Name: ds, dtype: int64
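Since we set equal_ends=True, both series end on the same date but start on different dates. As a quick sanity check (a small sketch, not part of the original notebook), we can inspect the first and last timestamp of each series:

# first and last timestamp per series: the ends coincide (equal_ends=True), the starts do not
df.groupby("unique_id", observed=True)["ds"].agg(["min", "max"])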

We then define a spec for our temporal aggregations, mapping each aggregation level to the number of base (daily) timesteps in one period of that level.

spec = {"year": 365, "quarter": 91, "month": 30, "week": 7, "day": 1}
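As a rough intuition (a sketch only; aggregate_temporal takes care of the exact bucketing, including any partial periods), the longer series of 1414 daily observations contains about this many complete periods per level:

# approximate number of complete periods per level for 1414 daily observations
{level: 1414 // window for level, window in spec.items()}
# {'year': 3, 'quarter': 15, 'month': 47, 'week': 202, 'day': 1414}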

2. Local aggregation (default)

In local aggregation, we treat the timestamps of each timeseries individually: a series is aggregated based only on its own timestamps, disregarding the timestamps of the other series.

from hierarchicalforecast.utils import aggregate_temporal
Y_df_local, S_df_local, tags_local = aggregate_temporal(df, spec)

We have created temporal aggregations per timeseries, as the temporal aggregation month-1 doesn’t correspond to the same (year, month) for both timeseries. This is because the series with unique_id=1 is shorter and has its first datapoint in July 2000, in contrast to the series with unique_id=0, which is longer and has its first timestamp in March 2000.

Y_df_local.query("temporal_id == 'month-1'")
   temporal_id  unique_id         ds          y
39     month-1          0 2000-03-16  93.574676
87     month-1          1 2000-07-19  91.506421
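Because each series is bucketed on its own calendar, the series also end up with different numbers of aggregated periods. A small sketch (not part of the original notebook) to count the monthly periods per series:

# number of monthly aggregation periods each series received under local aggregation
month_ids = [tid for tid in Y_df_local["temporal_id"].unique() if str(tid).startswith("month")]
Y_df_local[Y_df_local["temporal_id"].isin(month_ids)].groupby("unique_id", observed=True)["temporal_id"].nunique()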

3. Global aggregation

In global aggregation, we base the temporal aggregation on the unique list of timestamps across all timeseries. We can specify the aggregation type by setting the aggregation_type argument of aggregate_temporal.

Y_df_global, S_df_global, tags_global = aggregate_temporal(df, spec, aggregation_type="global")

We have created temporal aggregations across all timeseries, as the temporal aggregation month-1 corresponds to the same (year, month)-combination for both timeseries. Since month-1 isn’t present in the second timeseries (as it is shorter), we have only one record for the aggregation.

Y_df_global.query("temporal_id == 'month-1'")
   temporal_id  unique_id         ds          y
39     month-1          0 2000-03-16  93.574676

For month-5, however, we have a record for both timeseries, as the second series has its first datapoint in that month.

Y_df_global.query("temporal_id == 'month-5'")
   temporal_id  unique_id         ds          y
43     month-5          0 2000-07-14  95.169659
87     month-5          1 2000-07-14  74.502584

Hence, the global aggregation ensures temporal alignment across all series.
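We can make this alignment explicit with a quick check (a sketch, not part of the original notebook): under global aggregation every temporal_id maps to a single timestamp, whereas under local aggregation the same temporal_id can map to different timestamps for different series (as month-1 did above).

# expected: True - each temporal_id corresponds to exactly one ds in the global aggregation
(Y_df_global.groupby("temporal_id", observed=True)["ds"].nunique() == 1).all()
# expected: False - e.g. 'month-1' maps to two different dates in the local aggregation
(Y_df_local.groupby("temporal_id", observed=True)["ds"].nunique() == 1).all()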

4. What to choose?

  • If all timeseries have the same length and the same timestamps, local and global aggregation yield the same result (see the sketch after this list).
  • The default behavior is local. This means that temporal aggregations can’t be compared between timeseries unless the series have the same length and timestamps. This behavior is generally safer and is advised when the timeseries are not necessarily related and you are building per-series models using e.g. StatsForecast.
  • The global behavior can be useful when we expect relationships between the timeseries. For example, when forecasting daily product demand, individual products may not have sales at every timestep, but we are still interested in the overall yearly aggregation across all products. The global setting has more room for error, so check the aggregation result carefully. This is typically the setting used in combination with models from MLForecast or NeuralForecast.
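
The sketch below illustrates the first point: with two series of equal length and identical timestamps, local and global aggregation give identical results. It is a minimal check, not part of the original notebook, and assumes aggregate_temporal returns the same columns for both aggregation types.

import pandas as pd

# two series with identical lengths and, thanks to equal_ends=True, identical timestamps
df_equal = generate_series(n_series=2, freq="D", min_length=730, max_length=730, equal_ends=True)

Y_local_eq, _, _ = aggregate_temporal(df_equal, spec)
Y_global_eq, _, _ = aggregate_temporal(df_equal, spec, aggregation_type="global")

# with fully aligned inputs the two aggregations coincide
pd.testing.assert_frame_equal(
    Y_local_eq.sort_values(["unique_id", "temporal_id"]).reset_index(drop=True),
    Y_global_eq.sort_values(["unique_id", "temporal_id"]).reset_index(drop=True),
)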