The HierarchicalForecast package contains utility functions to wrangle
and visualize hierarchical series datasets. The `aggregate` function of
the module lets you create a hierarchy from categorical variables
representing the structure levels, and also returns the aggregation
constraints matrix.
In addition, HierarchicalForecast ensures compatibility of its
reconciliation methods with other popular machine-learning libraries via
its external forecast adapters that transform output base forecasts from
external libraries into a compatible data frame format.
Aggregate Function
aggregate
Aggregates the series in df according to levels defined in the spec list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | Frame | DataFrame with columns `[time_col, *target_cols]`, the columns to aggregate and, optionally, exog_vars. | required |
| spec | list[list[str]] | List of levels. Each element of the list should contain a list of columns of df to aggregate. | required |
| exog_vars | Optional[dict[str, Union[str, list[str]]]] | Dictionary whose keys are column names and whose values are the aggregation(s) to apply to each column, given as a single string or a list of strings. Accepted values are those of the pandas or Polars aggregation functions; check the respective docs for guidance. Default is None. | None |
| sparse_s | bool | Return S_df as a sparse pandas DataFrame. Default is False. | False |
| id_col | str | Column that will identify each series after aggregation. Default is 'unique_id'. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'. | 'ds' |
| id_time_col | Optional[str] | Column that will identify each timestep after temporal aggregation. If provided, aggregate will operate temporally. Default is None. | None |
| target_cols | Sequence[str] | List of columns that contain the targets to aggregate. Default is ('y',). | ('y',) |
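
To make the roles of `spec` and the constraints matrix concrete, here is a minimal pandas-only sketch of the idea behind cross-sectional aggregation. It is not the library's implementation: the helper `aggregate_sketch`, the toy columns `country` and `region`, the `/`-joined ids, and the choice of treating the last `spec` entry as the bottom level are all assumptions made for illustration.

```python
import pandas as pd


def aggregate_sketch(df, spec, time_col="ds", target_col="y"):
    """Illustrative only: sum `target_col` at every level listed in `spec`."""
    # Assumption: the last level in `spec` is the most granular (bottom) one.
    bottom_id = df[spec[-1]].astype(str).agg("/".join, axis=1)
    bottom_ids = sorted(bottom_id.unique())

    frames, tags, s_rows = [], {}, {}
    for level in spec:
        # Hypothetical id convention: join the level's column values with "/".
        level_id = df[level].astype(str).agg("/".join, axis=1)
        keyed = df.assign(unique_id=level_id, bottom_id=bottom_id)
        frames.append(
            keyed.groupby(["unique_id", time_col], as_index=False)[target_col].sum()
        )
        tags["/".join(level)] = sorted(level_id.unique())
        # One constraints-matrix row per aggregated series:
        # 1 where a bottom-level series contributes to it, 0 otherwise.
        for uid, members in keyed.groupby("unique_id")["bottom_id"].unique().items():
            member_set = set(members)
            s_rows[uid] = [int(b in member_set) for b in bottom_ids]

    Y = pd.concat(frames, ignore_index=True)
    S = pd.DataFrame.from_dict(s_rows, orient="index", columns=bottom_ids)
    return Y, S, tags


# Toy data: two regions of one country, observed at two integer timesteps.
df = pd.DataFrame({
    "country": ["AU", "AU", "AU", "AU"],
    "region": ["NSW", "NSW", "VIC", "VIC"],
    "ds": [1, 2, 1, 2],
    "y": [10.0, 20.0, 1.0, 2.0],
})
Y, S, tags = aggregate_sketch(df, spec=[["country"], ["country", "region"]])
```

In this toy case the `AU` row of `S` selects both bottom-level series, while each `AU/...` row selects only itself, which is the sense in which the matrix encodes the aggregation constraints.
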
aggregate_temporal
Aggregates the series in df according to temporal levels defined in the spec list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | Frame | DataFrame with columns `[time_col, target_cols]` and the columns to aggregate. | required |
| spec | dict[str, int] | Dictionary of temporal levels. Each key should be a string naming the level, with the value representing the number of bottom-level timesteps contained in the aggregation. | required |
| exog_vars | Optional[dict[str, Union[str, list[str]]]] | Dictionary whose keys are column names and whose values are the aggregation(s) to apply to each column, given as a single string or a list of strings. Accepted values are those of the pandas or Polars aggregation functions; check the respective docs for guidance. Default is None. | None |
| sparse_s | bool | Return S_df as a sparse pandas DataFrame. Default is False. | False |
| id_col | str | Column that will identify each series after aggregation. Default is 'unique_id'. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'. | 'ds' |
| id_time_col | str | Column that will identify each timestep after aggregation. Default is 'temporal_id'. | 'temporal_id' |
| target_cols | Sequence[str] | List of columns that contain the targets to aggregate. Default is ('y',). | ('y',) |
| aggregation_type | str | If 'local', the aggregation is performed on the timestamps of each time series independently. If 'global', the aggregation is performed on the unique timestamps of all time series. Default is 'local'. | 'local' |
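
The temporal `spec` maps each level name to the number of bottom-level timesteps it spans. The NumPy sketch below shows how such a spec could translate into a temporal summing matrix for a single series. The helper `temporal_summing_matrix` and the level names are hypothetical, and raising an error when a level does not evenly divide the series length is an assumption of this sketch, not documented library behavior.

```python
import numpy as np


def temporal_summing_matrix(spec, n_bottom):
    """Illustrative only: stack one block of rows per temporal level."""
    blocks = []
    for name, k in spec.items():
        if n_bottom % k != 0:
            raise ValueError(f"level {name!r} does not evenly divide {n_bottom} timesteps")
        # Each row of the block sums k consecutive bottom-level timesteps.
        block = np.kron(np.eye(n_bottom // k, dtype=int), np.ones((1, k), dtype=int))
        blocks.append(block)
    return np.vstack(blocks)


# Four quarterly observations aggregated to semiannual and yearly totals.
spec = {"year": 4, "semiannual": 2, "quarter": 1}
S_te = temporal_summing_matrix(spec, n_bottom=4)
y_quarterly = np.array([1.0, 2.0, 3.0, 4.0])
y_all = S_te @ y_quarterly  # yearly total, two semiannual totals, four quarters
```
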
make_future_dataframe
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | Frame | DataFrame with ids, times and values for the exogenous regressors. | required |
| freq | Union[str, int] | Frequency of the data. Must be a valid pandas or Polars offset alias, or an integer. | required |
| h | int | Forecast horizon. | required |
| id_col | str | Column that identifies each series. Default is 'unique_id'. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'. | 'ds' |

Returns:
| Name | Type | Description |
|---|---|---|
| FrameT | FrameT | DataFrame with future values. |
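
As a rough illustration of what a future frame looks like, the sketch below builds `h` future timesteps per series for integer timestamps only. It assumes that the future frame continues each series from its last observed timestep; the helper `future_frame_sketch` is hypothetical and does not handle offset aliases.

```python
import pandas as pd


def future_frame_sketch(df, freq, h, id_col="unique_id", time_col="ds"):
    """Illustrative only: h future integer timesteps for every series."""
    # Last observed timestep of each series.
    last = df.groupby(id_col)[time_col].max()
    rows = [
        {id_col: uid, time_col: t_last + freq * step}
        for uid, t_last in last.items()
        for step in range(1, h + 1)
    ]
    return pd.DataFrame(rows)


# Series "a" ends at timestep 2 and series "b" at timestep 5.
history = pd.DataFrame({"unique_id": ["a", "a", "b"], "ds": [1, 2, 5]})
future = future_frame_sketch(history, freq=1, h=2)
```
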
get_cross_temporal_tags
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | Frame | DataFrame with temporal ids. | required |
| tags_cs | dict[str, ndarray] | Tags for the cross-sectional hierarchies. | required |
| tags_te | dict[str, ndarray] | Tags for the temporal hierarchies. | required |
| sep | str | Separator for the cross-temporal tags. Default is '//'. | '//' |
| id_col | str | Column that identifies each series. Default is 'unique_id'. | 'unique_id' |
| id_time_col | str | Column that identifies each (aggregated) timestep. Default is 'temporal_id'. | 'temporal_id' |
| cross_temporal_id_col | str | Column that will identify each cross-temporal aggregation. Default is 'cross_temporal_id'. | 'cross_temporal_id' |
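
One plausible way to picture cross-temporal tags is as every pairing of a cross-sectional tag with a temporal tag, joined by `sep`. The standard-library sketch below illustrates that reading only. The helper `cross_temporal_tags_sketch`, the key naming scheme, the toy ids such as `year-1`, and the use of plain lists instead of arrays are all assumptions and may differ from what the library produces.

```python
from itertools import product


def cross_temporal_tags_sketch(tags_cs, tags_te, sep="//"):
    """Illustrative only: pair every cross-sectional level with every temporal level."""
    tags_ct = {}
    for (key_cs, ids_cs), (key_te, ids_te) in product(tags_cs.items(), tags_te.items()):
        # Hypothetical naming: "<cross-sectional><sep><temporal>" for keys and ids.
        tags_ct[f"{key_cs}{sep}{key_te}"] = [
            f"{cs}{sep}{te}" for cs, te in product(ids_cs, ids_te)
        ]
    return tags_ct


tags_cs = {"country": ["AU"], "country/region": ["AU/NSW", "AU/VIC"]}
tags_te = {"year": ["year-1"], "quarter": ["quarter-1", "quarter-2"]}
tags_ct = cross_temporal_tags_sketch(tags_cs, tags_te)
```
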
Hierarchical Visualization
HierarchicalPlot
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| S | Frame | DataFrame with the summing matrix of size (base, bottom); see the aggregate function. | required |
| tags | dict[str, ndarray] | Hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level. | required |
| S_id_col | str | Column that identifies each aggregation. Default is 'unique_id'. | 'unique_id' |
HierarchicalPlot.plot_summing_matrix
Returns:
| Name | Type | Description |
|---|---|---|
| fig | Figure | Figure object containing the plot of the summing matrix. |
HierarchicalPlot.plot_series
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | str | String identifying the 'unique_id' of the series to plot; it can be at any level. | required |
| Y_df | Frame | Hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and may also include model columns. | required |
| models | Optional[list[str]] | Strings identifying the model columns to plot. Default is None. | None |
| level | Optional[list[int]] | Confidence levels for the prediction intervals available in Y_df. Default is None. | None |
| id_col | str | Column that identifies each series. Default is 'unique_id'. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'. | 'ds' |
| target_col | str | Column that contains the target. Default is 'y'. | 'y' |

Returns:
| Name | Type | Description |
|---|---|---|
| fig | Figure | Figure object containing the plot of the single series. |
HierarchicalPlot.plot_hierarchically_linked_series
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bottom_series | str | String identifying the 'unique_id' of the bottom-level series to plot. | required |
| Y_df | Frame | Hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and model columns. | required |
| models | Optional[list[str]] | Strings identifying the model columns to plot. Default is None. | None |
| level | Optional[list[int]] | Confidence levels for the prediction intervals available in Y_df. Default is None. | None |
| id_col | str | Column that identifies each series. Default is 'unique_id'. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'. | 'ds' |
| target_col | str | Column that contains the target. Default is 'y'. | 'y' |

Returns:
| Name | Type | Description |
|---|---|---|
| fig | Figure | Figure object containing the plots of the hierarchically linked series. |
HierarchicalPlot.plot_hierarchical_predictions_gap
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Y_df | Frame | Hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and model columns. | required |
| models | Optional[list[str]] | Strings identifying the model columns to plot. Default is None. | None |
| xlabel | Optional[str] | String for the plot's x-axis label. Default is None. | None |
| ylabel | Optional[str] | String for the plot's y-axis label. Default is None. | None |
| id_col | str | Column that identifies each series. Default is 'unique_id'. | 'unique_id' |
| time_col | str | Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'. | 'ds' |
| target_col | str | Column that contains the target. Default is 'y'. | 'y' |

Returns:
| Name | Type | Description |
|---|---|---|
| fig | Figure | Figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure. |

