
# Aggregation/Visualization Utils

The `HierarchicalForecast` package contains utility functions to wrangle
and visualize hierarchical series datasets. The
[`aggregate`](https://nixtlaverse.nixtla.io/hierarchicalforecast/utils.html#aggregate)
function of the module creates a hierarchy from categorical
variables representing the structure levels, and also returns the
aggregation constraints matrix $\mathbf{S}$.

In addition, `HierarchicalForecast` ensures compatibility of its
reconciliation methods with other popular machine-learning libraries via
its external forecast adapters that transform output base forecasts from
external libraries into a compatible data frame format.

## Aggregate Function

### `aggregate`

```python theme={null}
aggregate(df, spec, exog_vars=None, sparse_s=False, id_col='unique_id', time_col='ds', id_time_col=None, target_cols=('y',))
```

Utils Aggregation Function.

Aggregates bottom level series contained in the DataFrame `df` according
to levels defined in the `spec` list.

**Parameters:**

| Name          | Type                                                                                                                         | Description                                                                                                                                                                                                                                                                                                                                 | Default                   |
| ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- |
| `df`          | <code>[Frame](#narwhals.typing.Frame)</code>                                                                                 | Dataframe with columns `[time_col, *target_cols]`, columns to aggregate and optionally exog\_vars.                                                                                                                                                                                                                                          | *required*                |
| `spec`        | <code>[list](#list)\[[list](#list)\[[str](#str)]]</code>                                                                     | list of levels. Each element of the list should contain a list of columns of `df` to aggregate.                                                                                                                                                                                                                                             | *required*                |
| `exog_vars`   | <code>[Optional](#Optional)\[[dict](#dict)\[[str](#str), [Union](#Union)\[[str](#str), [list](#list)\[[str](#str)]]]]</code> | Dictionary mapping column names to the aggregation(s) applied to each column. Each value is either a single string or a list of strings. Accepted values are those of the Pandas or Polars aggregation functions; check the respective docs for guidance. Default is None. | <code>None</code>         |
| `sparse_s`    | <code>[bool](#bool)</code>                                                                                                   | Return `S_df` as an `SMatrix` (sparse summing matrix wrapper) instead of a dense DataFrame. Works with both Pandas and Polars inputs. Default is False.                                                                                                                                                                                     | <code>False</code>        |
| `id_col`      | <code>[str](#str)</code>                                                                                                     | Column that will identify each serie after aggregation. Default is "unique\_id".                                                                                                                                                                                                                                                            | <code>'unique\_id'</code> |
| `time_col`    | <code>[str](#str)</code>                                                                                                     | Column that identifies each timestep, its values can be timestamps or integers. Default is "ds".                                                                                                                                                                                                                                            | <code>'ds'</code>         |
| `id_time_col` | <code>[Optional](#Optional)\[[str](#str)]</code>                                                                             | Column that will identify each timestep after temporal aggregation. If provided, aggregate will operate temporally. Default is None.                                                                                                                                                                                                        | <code>None</code>         |
| `target_cols` | <code>[Sequence](#collections.abc.Sequence)\[[str](#str)]</code>                                                             | list of columns that contains the targets to aggregate. Default is ("y",).                                                                                                                                                                                                                                                                  | <code>('y',)</code>       |

**Returns:**

| Type | Description |
| ---- | ----------- |
| <code>[tuple](#tuple)\[[FrameT](#narwhals.typing.FrameT), [FrameT](#narwhals.typing.FrameT) \| [SMatrix](#hierarchicalforecast.utils.SMatrix), [dict](#dict)]</code> | Tuple `(Y_df, S_df, tags)`. `Y_df`: hierarchically structured series. `S_df`: summing dataframe; when `sparse_s=True`, an `SMatrix` is returned instead of a DataFrame. `tags`: aggregation indices. |
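
To make the three return values concrete, the following is a minimal pure-Python sketch of what a `spec` of levels implies for a tiny dataset. It is not the library's implementation: the helper `sketch_aggregate`, the `/` separator used to join ids, the assumption that the last level in `spec` is the bottom level, and the toy rows are all illustrative. Real code would call `aggregate` on a DataFrame.

```python
from collections import defaultdict

# Toy bottom-level rows (hypothetical data): one observation per region and timestep.
rows = [
    {"Country": "Australia", "Region": "NSW", "ds": 1, "y": 10.0},
    {"Country": "Australia", "Region": "VIC", "ds": 1, "y": 20.0},
    {"Country": "Australia", "Region": "NSW", "ds": 2, "y": 11.0},
    {"Country": "Australia", "Region": "VIC", "ds": 2, "y": 21.0},
]
# Each level lists the columns whose values are joined into a series id.
spec = [["Country"], ["Country", "Region"]]


def sketch_aggregate(rows, spec, sep="/"):
    """Illustrative stand-in for the idea behind `aggregate` (not the real code)."""
    bottom_cols = spec[-1]  # assumption: the last level is the bottom level
    bottom_ids = sorted({sep.join(r[c] for c in bottom_cols) for r in rows})

    totals = defaultdict(float)  # (id, ds) -> summed target
    members = defaultdict(set)   # aggregated id -> bottom ids it contains
    tags = {}                    # level name -> ids at that level
    for level in spec:
        level_ids = set()
        for r in rows:
            uid = sep.join(r[c] for c in level)
            bottom = sep.join(r[c] for c in bottom_cols)
            totals[(uid, r["ds"])] += r["y"]
            members[uid].add(bottom)
            level_ids.add(uid)
        tags[sep.join(level)] = sorted(level_ids)

    all_ids = [uid for level_ids in tags.values() for uid in level_ids]
    # Summing matrix: one row per series, one column per bottom-level series.
    S = [[1 if b in members[uid] else 0 for b in bottom_ids] for uid in all_ids]
    Y = [{"unique_id": uid, "ds": ds, "y": y} for (uid, ds), y in sorted(totals.items())]
    return Y, (all_ids, bottom_ids, S), tags


Y, (row_ids, col_ids, S), tags = sketch_aggregate(rows, spec)
```

In this toy case `S` has three rows (the country total plus the two regions) and two columns (the bottom-level regions), and `tags` maps each level name to the ids that belong to it.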

### `aggregate_temporal`

```python theme={null}
aggregate_temporal(df, spec, exog_vars=None, sparse_s=False, id_col='unique_id', time_col='ds', id_time_col='temporal_id', target_cols=('y',), aggregation_type='local')
```

Utils Aggregation Function for Temporal aggregations.

Aggregates bottom level timesteps contained in the DataFrame `df` according
to temporal levels defined in the `spec` list.

**Parameters:**

| Name               | Type                                                                                                                         | Description                                                                                                                                                                                                                                                                                                                                 | Default                     |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------- |
| `df`               | <code>[Frame](#narwhals.typing.Frame)</code>                                                                                 | Dataframe with columns `[time_col, target_cols]` and columns to aggregate.                                                                                                                                                                                                                                                                  | *required*                  |
| `spec`             | <code>[dict](#dict)\[[str](#str), [int](#int)]</code>                                                                        | Dictionary of temporal levels. Each key should be a string with the value representing the number of bottom-level timesteps contained in the aggregation.                                                                                                                                                                                   | *required*                  |
| `exog_vars`        | <code>[Optional](#Optional)\[[dict](#dict)\[[str](#str), [Union](#Union)\[[str](#str), [list](#list)\[[str](#str)]]]]</code> | Dictionary mapping column names to the aggregation(s) applied to each column. Each value is either a single string or a list of strings. Accepted values are those of the Pandas or Polars aggregation functions; check the respective docs for guidance. Default is None. | <code>None</code>           |
| `sparse_s`         | <code>[bool](#bool)</code>                                                                                                   | Return `S_df` as an `SMatrix` (sparse summing matrix wrapper) instead of a dense DataFrame. Works with both Pandas and Polars inputs. Default is False.                                                                                                                                                                                     | <code>False</code>          |
| `id_col`           | <code>[str](#str)</code>                                                                                                     | Column that will identify each serie after aggregation. Default is 'unique\_id'.                                                                                                                                                                                                                                                            | <code>'unique\_id'</code>   |
| `time_col`         | <code>[str](#str)</code>                                                                                                     | Column that identifies each timestep, its values can be timestamps or integers. Default is 'ds'.                                                                                                                                                                                                                                            | <code>'ds'</code>           |
| `id_time_col`      | <code>[str](#str)</code>                                                                                                     | Column that will identify each timestep after aggregation. Default is 'temporal\_id'.                                                                                                                                                                                                                                                       | <code>'temporal\_id'</code> |
| `target_cols`      | <code>[Sequence](#collections.abc.Sequence)\[[str](#str)]</code>                                                             | List of columns that contain the targets to aggregate. Default is ('y',).                                                                                                                                                                                                                                                                   | <code>('y',)</code>         |
| `aggregation_type` | <code>[str](#str)</code>                                                                                                     | If 'local' the aggregation will be performed on the timestamps of each timeseries independently. If 'global' the aggregation will be performed on the unique timestamps of all timeseries. Default is 'local'.                                                                                                                              | <code>'local'</code>        |

**Returns:**

| Type                                                                                                               | Description                                                                                                                                                                 |
| ------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| <code>[tuple](#tuple)\[[FrameT](#narwhals.typing.FrameT), [FrameT](#narwhals.typing.FrameT), [dict](#dict)]</code> | tuple\[FrameT, FrameT, dict]: Y\_df, S\_df, tags Y\_df: Temporally hierarchically structured series. S\_df: Temporal summing dataframe. tags: Temporal aggregation indices. |
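
The temporal `spec` maps each level name to the number of bottom-level timesteps it contains. The sketch below shows, in plain Python, how such a dictionary can translate into a temporal summing matrix for a single series. It is only an illustration under stated assumptions: the helper name, the id format `"<level>-<n>"`, and the requirement that the series length divides evenly into every block size are not taken from the library.

```python
# Hypothetical temporal spec: level name -> number of bottom-level timesteps.
spec = {"year": 4, "semiannual": 2, "quarter": 1}
y = [10.0, 12.0, 14.0, 16.0]  # four quarterly observations of one series


def sketch_temporal_summing_matrix(n_bottom, spec):
    """Build the rows of a temporal summing matrix (illustration only)."""
    ids, S = [], []
    for name, width in spec.items():
        if n_bottom % width != 0:
            raise ValueError(f"{n_bottom} timesteps do not split into blocks of {width}")
        for block in range(n_bottom // width):
            start = block * width
            # 1 for bottom timesteps inside this block, 0 elsewhere.
            S.append([1 if start <= t < start + width else 0 for t in range(n_bottom)])
            ids.append(f"{name}-{block + 1}")  # hypothetical id format
    return ids, S


temporal_ids, S = sketch_temporal_summing_matrix(len(y), spec)
# Aggregated values are the matrix-vector product of S and y.
y_aggregated = [sum(s * v for s, v in zip(row, y)) for row in S]
```

Here the single `year` row sums all four quarters, each `semiannual` row sums two consecutive quarters, and the `quarter` rows reproduce the bottom level.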

### `make_future_dataframe`

```python theme={null}
make_future_dataframe(df, freq, h, id_col='unique_id', time_col='ds')
```

Create future dataframe for forecasting.

**Parameters:**

| Name       | Type                                                    | Description                                                                                      | Default                   |
| ---------- | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | ------------------------- |
| `df`       | <code>[Frame](#narwhals.typing.Frame)</code>            | Dataframe with ids, times and values for the exogenous regressors.                               | *required*                |
| `freq`     | <code>[Union](#Union)\[[str](#str), [int](#int)]</code> | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer.             | *required*                |
| `h`        | <code>[int](#int)</code>                                | Forecast horizon.                                                                                | *required*                |
| `id_col`   | <code>[str](#str)</code>                                | Column that identifies each series. Default is 'unique\_id'.                                     | <code>'unique\_id'</code> |
| `time_col` | <code>[str](#str)</code>                                | Column that identifies each timestep, its values can be timestamps or integers. Default is 'ds'. | <code>'ds'</code>         |

**Returns:**

| Name     | Type                                           | Description                   |
| -------- | ---------------------------------------------- | ----------------------------- |
| `FrameT` | <code>[FrameT](#narwhals.typing.FrameT)</code> | DataFrame with future values. |
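
As a rough picture of the output shape, the sketch below produces `h` future rows per series for integer timesteps and an integer `freq`. It is a plain-Python illustration, not the library routine: the helper name `sketch_future_rows` and the assumption that each series continues from its own last observed timestep are hypothetical.

```python
# Hypothetical history: (unique_id, ds) pairs with integer timesteps.
history = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2)]


def sketch_future_rows(history, freq, h):
    """Return h future (unique_id, ds) rows per series (illustration only)."""
    last_seen = {}
    for uid, ds in history:
        # Track the latest timestep observed for each series.
        last_seen[uid] = max(ds, last_seen.get(uid, ds))
    return [
        (uid, last + freq * step)
        for uid, last in sorted(last_seen.items())
        for step in range(1, h + 1)
    ]


future = sketch_future_rows(history, freq=1, h=2)
```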

### `get_cross_temporal_tags`

```python theme={null}
get_cross_temporal_tags(df, tags_cs, tags_te, sep='//', id_col='unique_id', id_time_col='temporal_id', cross_temporal_id_col='cross_temporal_id')
```

Get cross-temporal tags.

**Parameters:**

| Name                    | Type                                                                | Description                                                                                  | Default                            |
| ----------------------- | ------------------------------------------------------------------- | -------------------------------------------------------------------------------------------- | ---------------------------------- |
| `df`                    | <code>[Frame](#narwhals.typing.Frame)</code>                        | DataFrame with temporal ids.                                                                 | *required*                         |
| `tags_cs`               | <code>[dict](#dict)\[[str](#str), [ndarray](#numpy.ndarray)]</code> | Tags for the cross-sectional hierarchies.                                                    | *required*                         |
| `tags_te`               | <code>[dict](#dict)\[[str](#str), [ndarray](#numpy.ndarray)]</code> | Tags for the temporal hierarchies.                                                           | *required*                         |
| `sep`                   | <code>[str](#str)</code>                                            | Separator for the cross-temporal tags. Default is "//".                                      | <code>'//'</code>                  |
| `id_col`                | <code>[str](#str)</code>                                            | Column that identifies each series. Default is 'unique\_id'.                                 | <code>'unique\_id'</code>          |
| `id_time_col`           | <code>[str](#str)</code>                                            | Column that identifies each (aggregated) timestep. Default is 'temporal\_id'.                | <code>'temporal\_id'</code>        |
| `cross_temporal_id_col` | <code>[str](#str)</code>                                            | Column that will identify each cross-temporal aggregation. Default is 'cross\_temporal\_id'. | <code>'cross\_temporal\_id'</code> |

**Returns:**

| Type                                                                                                                     | Description                                                                                                                                    |
| ------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| <code>[tuple](#tuple)\[[FrameT](#narwhals.typing.FrameT), [dict](#dict)\[[str](#str), [ndarray](#numpy.ndarray)]]</code> | tuple\[FrameT, dict\[str, np.ndarray]]: df, tags\_ct df: DataFrame with cross-temporal ids. tags\_ct: Tags for the cross-temporal hierarchies. |
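
One way to picture cross-temporal tags is as the pairing of every cross-sectional level with every temporal level, with ids joined by `sep`. The sketch below illustrates that idea in plain Python. The helper name, the key format `"<cs level><sep><te level>"`, and the toy tags are assumptions for illustration and may differ from what the library actually produces; plain lists stand in for the NumPy arrays used by the real tags.

```python
from itertools import product

# Hypothetical tags, shaped like the dictionaries described above.
tags_cs = {"Country": ["Australia"], "Country/Region": ["Australia/NSW", "Australia/VIC"]}
tags_te = {"year": ["year-1"], "quarter": ["quarter-1", "quarter-2"]}


def sketch_cross_temporal_tags(tags_cs, tags_te, sep="//"):
    """Pair every cross-sectional level with every temporal level (illustration only)."""
    tags_ct = {}
    for (level_cs, ids_cs), (level_te, ids_te) in product(tags_cs.items(), tags_te.items()):
        # Each cross-temporal id joins a series id and a temporal id with `sep`.
        tags_ct[f"{level_cs}{sep}{level_te}"] = [
            f"{uid}{sep}{tid}" for uid, tid in product(ids_cs, ids_te)
        ]
    return tags_ct


tags_ct = sketch_cross_temporal_tags(tags_cs, tags_te)
```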

## Hierarchical Visualization

### `HierarchicalPlot`

```python theme={null}
HierarchicalPlot(S, tags, S_id_col='unique_id')
```

Hierarchical Plot

This class contains a collection of matplotlib visualization methods, suited for small
to medium sized hierarchical series.

**Parameters:**

| Name       | Type                                                                | Description                                                                                                       | Default                   |
| ---------- | ------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | ------------------------- |
| `S`        | <code>[Frame](#narwhals.typing.Frame)</code>                        | DataFrame with summing matrix of size `(base, bottom)`, see [aggregate function](./utils.html#aggregate).         | *required*                |
| `tags`     | <code>[dict](#dict)\[[str](#str), [ndarray](#numpy.ndarray)]</code> | hierarchical aggregation indexes, where each key is a level and its value contains tags associated to that level. | *required*                |
| `S_id_col` | <code>[str](#str)</code>                                            | column that identifies each aggregation. Default is 'unique\_id'.                                                 | <code>'unique\_id'</code> |

#### `HierarchicalPlot.plot_summing_matrix`

```python theme={null}
plot_summing_matrix()
```

Summation Constraints plot

This method simply plots the hierarchical aggregation
constraints matrix $\mathbf{S}$.

**Returns:**

| Name  | Type                                             | Description                                              |
| ----- | ------------------------------------------------ | -------------------------------------------------------- |
| `fig` | <code>[Figure](#matplotlib.figure.Figure)</code> | figure object containing the plot of the summing matrix. |

#### `HierarchicalPlot.plot_series`

```python theme={null}
plot_series(series, Y_df, models=None, level=None, id_col='unique_id', time_col='ds', target_col='y')
```

Single Series plot

**Parameters:**

| Name         | Type                                                             | Description                                                                                                                          | Default                   |
| ------------ | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ------------------------- |
| `series`     | <code>[str](#str)</code>                                         | string identifying the `'unique_id'` any-level series to plot.                                                                       | *required*                |
| `Y_df`       | <code>[Frame](#narwhals.typing.Frame)</code>                     | hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns `['unique_id', 'ds', 'y']`, it may have `'models'`.     | *required*                |
| `models`     | <code>[Optional](#Optional)\[[list](#list)\[[str](#str)]]</code> | list of model column names to plot. Default is None.                                                                                 | <code>None</code>         |
| `level`      | <code>[Optional](#Optional)\[[list](#list)\[[int](#int)]]</code> | confidence levels for prediction intervals available in `Y_df`. Default is None.                                                     | <code>None</code>         |
| `id_col`     | <code>[str](#str)</code>                                         | column that identifies each series. Default is 'unique\_id'.                                                                         | <code>'unique\_id'</code> |
| `time_col`   | <code>[str](#str)</code>                                         | column that identifies each timestep, its values can be timestamps or integers. Default is 'ds'.                                     | <code>'ds'</code>         |
| `target_col` | <code>[str](#str)</code>                                         | column that contains the target. Default is 'y'.                                                                                     | <code>'y'</code>          |

**Returns:**

| Name  | Type                                             | Description                                             |
| ----- | ------------------------------------------------ | ------------------------------------------------------- |
| `fig` | <code>[Figure](#matplotlib.figure.Figure)</code> | figure object containing the plot of the single series. |

#### `HierarchicalPlot.plot_hierarchically_linked_series`

```python theme={null}
plot_hierarchically_linked_series(bottom_series, Y_df, models=None, level=None, id_col='unique_id', time_col='ds', target_col='y')
```

Hierarchically Linked Series plot

**Parameters:**

| Name            | Type                                                             | Description                                                                                                             | Default                   |
| --------------- | ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------- |
| `bottom_series` | <code>[str](#str)</code>                                         | string identifying the `'unique_id'` bottom-level series to plot.                                                       | *required*                |
| `Y_df`          | <code>[Frame](#narwhals.typing.Frame)</code>                     | hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns \['unique\_id', 'ds', 'y'] and models.     | *required*                |
| `models`        | <code>[Optional](#Optional)\[[list](#list)\[[str](#str)]]</code> | list of model column names to plot. Default is None.                                                                    | <code>None</code>         |
| `level`         | <code>[Optional](#Optional)\[[list](#list)\[[int](#int)]]</code> | confidence levels for prediction intervals available in `Y_df`. Default is None.                                        | <code>None</code>         |
| `id_col`        | <code>[str](#str)</code>                                         | column that identifies each series. Default is 'unique\_id'.                                                            | <code>'unique\_id'</code> |
| `time_col`      | <code>[str](#str)</code>                                         | column that identifies each timestep, its values can be timestamps or integers. Default is 'ds'.                        | <code>'ds'</code>         |
| `target_col`    | <code>[str](#str)</code>                                         | column that contains the target. Default is 'y'.                                                                        | <code>'y'</code>          |

**Returns:**

| Name  | Type                                             | Description                                                             |
| ----- | ------------------------------------------------ | ----------------------------------------------------------------------- |
| `fig` | <code>[Figure](#matplotlib.figure.Figure)</code> | figure object containing the plots of the hierarchically linked series. |

#### `HierarchicalPlot.plot_hierarchical_predictions_gap`

```python theme={null}
plot_hierarchical_predictions_gap(Y_df, models=None, xlabel=None, ylabel=None, id_col='unique_id', time_col='ds', target_col='y')
```

Hierarchical Predictions Gap plot

**Parameters:**

| Name         | Type                                                             | Description                                                                                                             | Default                   |
| ------------ | ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------------------------- |
| `Y_df`       | <code>[Frame](#narwhals.typing.Frame)</code>                     | hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains columns \['unique\_id', 'ds', 'y'] and models.     | *required*                |
| `models`     | <code>[Optional](#Optional)\[[list](#list)\[[str](#str)]]</code> | list of model column names to plot. Default is None.                                                                    | <code>None</code>         |
| `xlabel`     | <code>[Optional](#Optional)\[[str](#str)]</code>                 | string for the plot's x axis label. Default is None.                                                                    | <code>None</code>         |
| `ylabel`     | <code>[Optional](#Optional)\[[str](#str)]</code>                 | string for the plot's y axis label. Default is None.                                                                    | <code>None</code>         |
| `id_col`     | <code>[str](#str)</code>                                         | column that identifies each series. Default is 'unique\_id'.                                                            | <code>'unique\_id'</code> |
| `time_col`   | <code>[str](#str)</code>                                         | column that identifies each timestep, its values can be timestamps or integers. Default is 'ds'.                        | <code>'ds'</code>         |
| `target_col` | <code>[str](#str)</code>                                         | column that contains the target. Default is 'y'.                                                                        | <code>'y'</code>          |

**Returns:**

| Name  | Type                                             | Description                                                                                                        |
| ----- | ------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------ |
| `fig` | <code>[Figure](#matplotlib.figure.Figure)</code> | figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure. |
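
To see what a "gap" between levels means, the sketch below sums hypothetical one-step base forecasts within each level and compares the totals. It is a plain-Python illustration of the quantity being visualized, not the plotting code; the toy forecasts and the choice to report differences against the top level are assumptions.

```python
# Hypothetical tags and one-step base forecasts (not reconciled).
tags = {"Country": ["Australia"], "Country/Region": ["Australia/NSW", "Australia/VIC"]}
forecasts = {"Australia": 100.0, "Australia/NSW": 45.0, "Australia/VIC": 52.0}

# Total predicted by each level: the sum over that level's series.
level_totals = {level: sum(forecasts[uid] for uid in ids) for level, ids in tags.items()}

# Gap relative to the top level; coherent forecasts would give 0.0 everywhere.
top = level_totals["Country"]
gaps = {level: total - top for level, total in level_totals.items()}
```

When the totals differ across levels, as they do here, the base forecasts are not coherent, which is the situation reconciliation methods are meant to correct.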

### Example

```python theme={null}
import pandas as pd
from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot

Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
S = S.reset_index(names="unique_id")

Y_test_df  = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)

fcst = StatsForecast(
    models=[AutoETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(df=Y_train_df, h=24).reset_index()

# Plot the gap between predictions at different aggregation levels:
# Country, Country/Region, Country/Gender/Region, ...
hplots = HierarchicalPlot(S=S, tags=tags)

hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['AutoETS'],  # `models` is documented as a list of column names
    xlabel='Month', ylabel='Predictions',
)
```
