# Local vs Global Temporal Aggregation

> Temporal Hierarchical Aggregation on a local or global level.

In this notebook we explain the difference between temporally
aggregating timeseries locally and globally.

You can run these experiments using CPU or GPU with Google Colab.

<a href="https://colab.research.google.com/github/Nixtla/hierarchicalforecast/blob/main/nbs/examples/LocalGlobalAggregation.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" />
</a>

```python theme={null}
!pip install hierarchicalforecast utilsforecast
```

## 1. Generate Data

In this example we generate synthetic series to explain the
difference between local and global temporal aggregation. We
generate 2 series with a daily frequency.

```python theme={null}
from utilsforecast.data import generate_series
```

```python theme={null}
freq = "D"
n_series = 2
df = generate_series(n_series=n_series, 
                     freq=freq, 
                     min_length=2 * 365, 
                     max_length=4 * 365,  
                     equal_ends=True)
```

Note that our two timeseries do not have the same number of timesteps:

```python theme={null}
df.groupby('unique_id', observed=True)["ds"].count()
```

```text theme={null}
unique_id
0    1414
1    1289
Name: ds, dtype: int64
```

We then define a spec for our temporal aggregations.

```python theme={null}
spec = {"year": 365, "quarter": 91, "month": 30, "week": 7, "day": 1}
```
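To get a feel for what these values imply, the sketch below assumes that each `spec` value is the number of base-frequency (here: daily) timesteps per aggregation bucket, and counts how many full buckets fit into the two series lengths shown above. The helper `count_buckets` is hypothetical and not part of `hierarchicalforecast`; how the library treats leftover timesteps is not modeled here.

```python
# Illustrative arithmetic only; `count_buckets` is a hypothetical helper,
# not a hierarchicalforecast function.
spec = {"year": 365, "quarter": 91, "month": 30, "week": 7, "day": 1}
series_lengths = {0: 1414, 1: 1289}  # lengths taken from the output above


def count_buckets(n_timesteps, bucket_size):
    """Return (number of full buckets, leftover timesteps)."""
    return divmod(n_timesteps, bucket_size)


bucket_counts = {
    uid: {level: count_buckets(n, size) for level, size in spec.items()}
    for uid, n in series_lengths.items()
}

for uid, levels in bucket_counts.items():
    for level, (full, leftover) in levels.items():
        print(f"unique_id={uid} {level:>7}: {full} full buckets, {leftover} leftover")
```

Because the two series differ in length, they yield a different number of buckets at most levels, which is why the choice between local and global aggregation matters.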

## 2. Local aggregation (default)

In local aggregation, we treat the timestamps of each timeseries
individually. It means that the temporal aggregation is performed by
only looking at the timestamps of each series, disregarding the
timestamps of other series.
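The idea can be sketched in plain Python, independent of the library. The function `local_bucket_labels` below is a hypothetical illustration that assumes buckets are counted from each series' own first timestamp; it is not the implementation used by `aggregate_temporal`, and the toy start dates are made up.

```python
from datetime import date, timedelta


def local_bucket_labels(timestamps, bucket_size, level="month"):
    """Label each timestamp with a bucket counted from this series' own start."""
    ordered = sorted(timestamps)
    return {ts: f"{level}-{i // bucket_size + 1}" for i, ts in enumerate(ordered)}


# Two toy daily series with different (hypothetical) start dates.
series_a = [date(2000, 1, 1) + timedelta(days=i) for i in range(90)]
series_b = [date(2000, 2, 1) + timedelta(days=i) for i in range(59)]

labels_a = local_bucket_labels(series_a, bucket_size=30)
labels_b = local_bucket_labels(series_b, bucket_size=30)

# Each series is labelled independently, so the same calendar day
# can carry a different label in each series.
print(labels_a[date(2000, 2, 1)], labels_b[date(2000, 2, 1)])
```

In this toy setup both series have a `month-1` bucket, but it covers January for the first series and February for the second.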

```python theme={null}
from hierarchicalforecast.utils import aggregate_temporal
```

```python theme={null}
Y_df_local, S_df_local, tags_local = aggregate_temporal(df, spec)
```

We have created temporal aggregations *per timeseries*: the temporal
aggregation `month-1` doesn’t correspond to the same period for
both timeseries. This is because the series with `unique_id=1` is
shorter and starts later, so its `month-1` record is stamped July 2000,
whereas the longer series with `unique_id=0` has its `month-1` record
stamped March 2000.

```python theme={null}
Y_df_local.query("temporal_id == 'month-1'")
```

|    | temporal\_id | unique\_id | ds         | y         |
| -- | ------------ | ---------- | ---------- | --------- |
| 39 | month-1      | 0          | 2000-03-16 | 93.574676 |
| 87 | month-1      | 1          | 2000-07-19 | 91.506421 |

## 3. Global aggregation

In global aggregation, we collect the unique timestamps across all
timeseries and base our temporal aggregations on that shared list of
timestamps. We can specify the aggregation type by
setting the `aggregation_type` argument of `aggregate_temporal`.

```python theme={null}
Y_df_global, S_df_global, tags_global = aggregate_temporal(df, spec, aggregation_type="global")
```

We have created temporal aggregations *across all timeseries*, as the
temporal aggregation `month-1` corresponds to the same (year,
month)-combination for both timeseries. Since `month-1` isn’t present in
the second timeseries (as it is shorter), we have only one record for
the aggregation.

```python theme={null}
Y_df_global.query("temporal_id == 'month-1'")
```

|    | temporal\_id | unique\_id | ds         | y         |
| -- | ------------ | ---------- | ---------- | --------- |
| 39 | month-1      | 0          | 2000-03-16 | 93.574676 |

For `month-5`, however, we have a record for both timeseries, as the
second series has its first datapoint within that aggregation period.

```python theme={null}
Y_df_global.query("temporal_id == 'month-5'")
```

|    | temporal\_id | unique\_id | ds         | y         |
| -- | ------------ | ---------- | ---------- | --------- |
| 43 | month-5      | 0          | 2000-07-14 | 95.169659 |
| 87 | month-5      | 1          | 2000-07-14 | 74.502584 |

Hence, the global aggregation ensures temporal alignment across all
series.
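A minimal sketch of this alignment in plain Python is shown below. The function `global_bucket_labels` is hypothetical and assumes buckets are counted over one shared, sorted calendar of all timestamps; it is meant to illustrate the concept, not to reproduce the internals of `aggregate_temporal`. The toy series are made up.

```python
from datetime import date, timedelta


def global_bucket_labels(all_series, bucket_size, level="month"):
    """Label timestamps using one calendar shared by every series."""
    # Build a single sorted calendar from all timestamps of all series.
    calendar = sorted({ts for series in all_series.values() for ts in series})
    label_of = {ts: f"{level}-{i // bucket_size + 1}" for i, ts in enumerate(calendar)}
    # Each series only keeps labels for timestamps it actually contains.
    return {
        uid: {ts: label_of[ts] for ts in series}
        for uid, series in all_series.items()
    }


# Two toy daily series that end on the same day but start on different days.
toy = {
    "a": [date(2000, 1, 1) + timedelta(days=i) for i in range(90)],
    "b": [date(2000, 2, 1) + timedelta(days=i) for i in range(59)],
}
labels = global_bucket_labels(toy, bucket_size=30)

# The shorter series has no `month-1`, and shared days share a label.
print(sorted(set(labels["b"].values())))
print(labels["a"][date(2000, 2, 1)] == labels["b"][date(2000, 2, 1)])
```

Here the shorter series simply has no `month-1` bucket, and any calendar day present in both series receives the same label.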

## 4. What to choose?

* If all timeseries have the same length and the same timestamps,
  `global` and `local` yield the same results.
* The default behavior is `local`. This means that temporal
  aggregations can’t be compared between timeseries unless the series
  have the same length and timestamps. This behavior is generally
  safer and is recommended when the time series are not necessarily
  related and you are building per-series models, e.g. with
  `StatsForecast`.
* The `global` behavior can be useful when we expect relationships
  between the timeseries. For example, when forecasting daily product
  demand, individual products may not have sales at every timestep,
  yet one is interested in the yearly temporal aggregation across all
  products. The `global` setting leaves more room for error, so check
  the aggregation result carefully. It is typically the setting used
  in combination with models from `MLForecast` or
  `NeuralForecast`.
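As a quick way to tell which case applies, the first bullet suggests checking whether all series share identical timestamps. The sketch below does this in plain Python; `share_same_timestamps` is a hypothetical helper and the input format (a mapping from series id to a list of timestamps) is an assumption made for illustration.

```python
from datetime import date, timedelta


def share_same_timestamps(all_series):
    """Return True when every series contains exactly the same timestamps."""
    timestamp_sets = [frozenset(series) for series in all_series.values()]
    return all(s == timestamp_sets[0] for s in timestamp_sets)


def days(start, n):
    """Build n consecutive daily timestamps starting at `start`."""
    return [start + timedelta(days=i) for i in range(n)]


aligned = {"a": days(date(2000, 1, 1), 10), "b": days(date(2000, 1, 1), 10)}
ragged = {"a": days(date(2000, 1, 1), 10), "b": days(date(2000, 1, 4), 7)}

aligned_ok = share_same_timestamps(aligned)
ragged_ok = share_same_timestamps(ragged)
print(aligned_ok, ragged_ok)
```

When the check returns `False`, as it would for the two series in this notebook, the choice between `local` and `global` changes the result and deserves a deliberate decision.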
