The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The `aggregate` function of the module creates a hierarchy from categorical variables representing the structure levels, and also returns the aggregation constraints matrix.

In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters, which transform the base forecasts output by external libraries into a compatible data frame format.

Utils Aggregation Function. Aggregates the bottom-level series contained in the DataFrame `df` according to the levels defined in the `spec` list.

| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with columns `[time_col, *target_cols]`, columns to aggregate and optionally `exog_vars`. |
| spec | list | | List of levels. Each element of the list should contain a list of columns of `df` to aggregate. |
| exog_vars | Optional | None | |
| sparse_s | bool | False | Return `S_df` as a sparse Pandas dataframe. |
| id_col | str | unique_id | Column that will identify each series after aggregation. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| id_time_col | Optional | None | Column that will identify each timestep after temporal aggregation. If provided, `aggregate` will operate temporally. |
| target_cols | Sequence | `('y',)` | List of columns that contain the targets to aggregate. |
| Returns | tuple | | Hierarchically structured series. |
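
To make the role of `spec` concrete, here is a minimal pure-Python sketch of the idea behind cross-sectional aggregation. It is not the library's implementation: the helper names `aggregate_sketch` and `summing_matrix_sketch`, the `"/"`-joined identifiers, and the toy data are assumptions made for illustration, and the prefix test used to build the summing matrix only holds for strictly nested levels.

```python
from collections import defaultdict


def aggregate_sketch(rows, spec, time_col="ds", target_col="y"):
    """Sum bottom-level rows into every level listed in spec.

    rows: list of dicts, one per (bottom series, timestep).
    spec: list of levels, each a list of column names to group by.
    Returns (totals, tags): totals maps (unique_id, time) to the summed
    target; tags maps a level name to the sorted ids found at that level.
    """
    totals = defaultdict(float)
    tags = {}
    for level in spec:
        level_ids = set()
        for row in rows:
            # Assumed naming scheme: join the grouping values with "/".
            uid = "/".join(str(row[col]) for col in level)
            totals[(uid, row[time_col])] += row[target_col]
            level_ids.add(uid)
        tags["/".join(level)] = sorted(level_ids)
    return dict(totals), tags


def summing_matrix_sketch(tags, bottom_level):
    """Row per series, column per bottom series; 1 if the row contains it."""
    bottom = tags[bottom_level]
    all_ids = [uid for ids in tags.values() for uid in ids]
    return {
        uid: [1 if (b == uid or b.startswith(uid + "/")) else 0 for b in bottom]
        for uid in all_ids
    }


rows = [
    {"country": "AU", "state": "NSW", "ds": 1, "y": 10.0},
    {"country": "AU", "state": "VIC", "ds": 1, "y": 5.0},
    {"country": "AU", "state": "NSW", "ds": 2, "y": 12.0},
    {"country": "AU", "state": "VIC", "ds": 2, "y": 7.0},
]
spec = [["country"], ["country", "state"]]

totals, tags = aggregate_sketch(rows, spec)
S = summing_matrix_sketch(tags, bottom_level="country/state")
```

In this toy hierarchy the `AU` row of `S` has a 1 in both bottom columns, which is exactly the constraint that the country total equals the sum of its states.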

Utils Aggregation Function for Temporal aggregations. Aggregates the bottom-level timesteps contained in the DataFrame `df` according to the temporal levels defined in the `spec` list.

| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with columns `[time_col, target_cols]` and columns to aggregate. |
| spec | dict | | Dictionary of temporal levels. Each key should be a string naming the level, with its value giving the number of bottom-level timesteps contained in the aggregation. |
| exog_vars | Optional | None | |
| sparse_s | bool | False | Return `S_df` as a sparse Pandas dataframe. |
| id_col | str | unique_id | Column that will identify each series after aggregation. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| id_time_col | str | temporal_id | Column that will identify each timestep after aggregation. |
| target_cols | Sequence | `('y',)` | List of columns that contain the targets to aggregate. |
| aggregation_type | str | local | If 'local', the aggregation is performed on the timestamps of each time series independently. If 'global', the aggregation is performed on the unique timestamps of all time series. |
| Returns | tuple | | Temporally hierarchically structured series. |
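
The temporal `spec` can be read as "how many bottom-level timesteps make up one aggregate". The following pure-Python sketch illustrates that reading for a single series. The helper `aggregate_temporal_sketch` is hypothetical, and requiring the series length to be a multiple of each window is a simplifying assumption of this sketch, not a documented behaviour of the library.

```python
def aggregate_temporal_sketch(values, spec):
    """Sum consecutive blocks of bottom-level observations.

    values: observations of one series, in time order.
    spec: dict mapping a level name to the number of bottom-level
          timesteps contained in one aggregate of that level.
    """
    levels = {}
    for name, width in spec.items():
        if len(values) % width != 0:
            # Simplifying assumption of this sketch only.
            raise ValueError(f"series length is not a multiple of {width}")
        levels[name] = [
            sum(values[start:start + width])
            for start in range(0, len(values), width)
        ]
    return levels


monthly = list(range(1, 13))  # twelve monthly observations: 1, 2, ..., 12
spec = {"year": 12, "quarter": 3, "month": 1}
levels = aggregate_temporal_sketch(monthly, spec)
```

Here `levels["quarter"]` holds four sums of three months each, and `levels["year"]` holds a single sum of all twelve.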
Create future dataframe for forecasting.
| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with ids, times and values for the exogenous regressors. |
| freq | Union | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
| h | int | | Forecast horizon. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| Returns | FrameT | | DataFrame with future values. |
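
As a rough illustration of what a future frame contains, the sketch below builds `h` future rows per series for the integer-time case. The helper `future_rows_sketch` and its `last_times` input are hypothetical; the library works on data frames and also accepts offset aliases, which this sketch does not cover.

```python
def future_rows_sketch(last_times, freq, h, id_col="unique_id", time_col="ds"):
    """Build h future rows per series, assuming integer timesteps.

    last_times: dict mapping each series id to its last observed timestep.
    freq: integer step between consecutive timesteps.
    h: forecast horizon.
    """
    rows = []
    for uid, last in last_times.items():
        for step in range(1, h + 1):
            rows.append({id_col: uid, time_col: last + step * freq})
    return rows


future = future_rows_sketch({"A": 10, "B": 7}, freq=1, h=3)
```

Each series starts from its own last timestep, so `A` continues at 11 while `B` continues at 8.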
Get cross-temporal tags.
| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with temporal ids. |
| tags_cs | dict | | Tags for the cross-sectional hierarchies. |
| tags_te | dict | | Tags for the temporal hierarchies. |
| sep | str | // | Separator for the cross-temporal tags. |
| id_col | str | unique_id | Column that identifies each series. |
| id_time_col | str | temporal_id | Column that identifies each (aggregated) timestep. |
| cross_temporal_id_col | str | cross_temporal_id | Column that will identify each cross-temporal aggregation. |
| Returns | tuple | | DataFrame with cross-temporal ids. |
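
One plausible way to picture cross-temporal tags is as the product of the cross-sectional and temporal tags, joined with `sep`. The sketch below follows that reading. The helper `cross_temporal_tags_sketch`, the key and id formats, and the sample temporal ids such as `"quarter-1"` are assumptions for illustration and are not taken from the library.

```python
from itertools import product


def cross_temporal_tags_sketch(tags_cs, tags_te, sep="//"):
    """Combine every cross-sectional level with every temporal level.

    tags_cs: dict mapping a cross-sectional level to its series ids.
    tags_te: dict mapping a temporal level to its temporal ids.
    Returns a dict keyed by "<cs level><sep><te level>" whose values are
    all "<series id><sep><temporal id>" combinations (assumed format).
    """
    tags_ct = {}
    for (level_cs, ids_cs), (level_te, ids_te) in product(
        tags_cs.items(), tags_te.items()
    ):
        tags_ct[f"{level_cs}{sep}{level_te}"] = [
            f"{cs}{sep}{te}" for cs, te in product(ids_cs, ids_te)
        ]
    return tags_ct


tags_cs = {"country": ["AU"], "country/state": ["AU/NSW", "AU/VIC"]}
tags_te = {"year": ["year-1"], "quarter": ["quarter-1", "quarter-2"]}  # hypothetical ids
tags_ct = cross_temporal_tags_sketch(tags_cs, tags_te)
```

The default separator `//` differs from the `/` used inside the cross-sectional ids in this example, which keeps the two parts of a combined id distinguishable.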

Hierarchical Plot. This class contains a collection of matplotlib visualization methods, suited for small to medium sized hierarchical series.

Parameters:

- `S`: DataFrame with the summing matrix of size `(base, bottom)`; see the `aggregate` function.
- `tags`: dict of np.ndarray, with hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
- `S_id_col`: str='unique_id', column that identifies each aggregation.

Summation Constraints plot. This method simply plots the hierarchical aggregation constraints matrix `S`.

Returns:

- `fig`: matplotlib.figure.Figure, figure object containing the plot of the summing matrix.

Single Series plot.

Parameters:

- `series`: str, string identifying the `'unique_id'` of the any-level series to plot.
- `Y_df`: DataFrame, hierarchically structured series. It contains columns `['unique_id', 'ds', 'y']` and may also contain `'models'` columns.
- `models`: list[str], strings identifying the model columns to filter.
- `level`: float list 0-100, confidence levels for the prediction intervals available in `Y_df`.
- `id_col`: str='unique_id', column that identifies each series.
- `time_col`: str='ds', column that identifies each timestep; its values can be timestamps or integers.
- `target_col`: str='y', column that contains the target.

Returns:

- `fig`: matplotlib.figure.Figure, figure object containing the plot of the single series.

Hierarchically Linked Series plot.

Parameters:

- `bottom_series`: str, string identifying the `'unique_id'` of the bottom-level series to plot.
- `Y_df`: DataFrame, hierarchically structured series. It contains columns `['unique_id', 'ds', 'y']` and models.
- `models`: list[str], strings identifying the model columns to filter.
- `level`: float list 0-100, confidence levels for the prediction intervals available in `Y_df`.
- `id_col`: str='unique_id', column that identifies each series.
- `time_col`: str='ds', column that identifies each timestep; its values can be timestamps or integers.
- `target_col`: str='y', column that contains the target.

Returns:

- `fig`: matplotlib.figure.Figure, figure object containing the plots of the hierarchically linked series.

Hierarchical Predictions Gap plot.

Parameters:

- `Y_df`: DataFrame, hierarchically structured series. It contains columns `['unique_id', 'ds', 'y']` and models.
- `models`: list[str], strings identifying the model columns to filter.
- `xlabel`: str, string for the plot's x axis label.
- `ylabel`: str, string for the plot's y axis label.
- `id_col`: str='unique_id', column that identifies each series.
- `time_col`: str='ds', column that identifies each timestep; its values can be timestamps or integers.
- `target_col`: str='y', column that contains the target.

Returns:

- `fig`: matplotlib.figure.Figure, figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure.
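
The quantity behind a predictions-gap view can be sketched without any plotting: sum the predictions of all series within each level and compare the level totals. When base forecasts are not coherent, those totals disagree. The helper `level_sums_sketch` and the numbers below are invented for illustration and are not part of the library.

```python
def level_sums_sketch(forecasts, tags):
    """Total prediction per hierarchy level for a single timestep.

    forecasts: dict mapping each series id to its predicted value.
    tags: dict mapping a level name to the series ids in that level.
    """
    return {
        level: sum(forecasts[uid] for uid in ids)
        for level, ids in tags.items()
    }


# Invented base forecasts: the states do not add up to the country total.
forecasts = {"AU": 100.0, "AU/NSW": 60.0, "AU/VIC": 35.0}
tags = {"country": ["AU"], "country/state": ["AU/NSW", "AU/VIC"]}

level_sums = level_sums_sketch(forecasts, tags)
gap = level_sums["country"] - level_sums["country/state"]
```

A non-zero `gap` is what reconciliation methods are meant to remove; after reconciliation the totals at every level should coincide.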

Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input `Y_hat_df` dataframe.

Parameters:

- `samples`: numpy array. Samples from the forecast distribution, of shape `[n_series, n_samples, horizon]`.
- `unique_ids`: string list. Unique identifiers for each time series.
- `dates`: datetime list. List of forecast dates.
- `quantiles`: float list in [0., 1.]. Alternative to `level`; quantiles to estimate from the y distribution.
- `level`: int list in [0, 100]. Probability levels for prediction intervals.
- `model_name`: string. Name of the forecasting model.
- `id_col`: str='unique_id', column that identifies each series.
- `time_col`: str='ds', column that identifies each timestep; its values can be timestamps or integers.
- `backend`: str='pandas', backend to use for the output dataframe, either 'pandas' or 'polars'.

Returns:

- `quantiles`: float list in [0., 1.]. Quantiles to estimate from the y distribution.
- `Y_hat_df`: DataFrame. Base quantile forecasts, with columns `ds` and the models to reconcile, indexed by `unique_id`.
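
The relationship between `level`, `quantiles` and the sample array can be sketched in plain Python. The version below assumes central prediction intervals, so a level `l` maps to the quantiles `0.5 - l/200` and `0.5 + l/200`, and uses linear interpolation for the empirical quantile. The helper names and the `model-q<q>` column naming are hypothetical, not the library's output format.

```python
def level_to_quantiles(level):
    """Map central interval levels in [0, 100] to their bounding quantiles."""
    quantiles = set()
    for lv in level:
        quantiles.add(round(0.5 - lv / 200, 4))
        quantiles.add(round(0.5 + lv / 200, 4))
    return sorted(quantiles)


def empirical_quantile(draws, q):
    """Quantile of a list of draws using linear interpolation."""
    ordered = sorted(draws)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def samples_to_rows(samples, unique_ids, dates, quantiles, model_name="model"):
    """Turn samples[series][sample][step] into one row per (series, date)."""
    rows = []
    for s, uid in enumerate(unique_ids):
        for t, date in enumerate(dates):
            draws = [sample[t] for sample in samples[s]]
            row = {"unique_id": uid, "ds": date}
            for q in quantiles:
                # Hypothetical column naming, chosen for this sketch only.
                row[f"{model_name}-q{q}"] = empirical_quantile(draws, q)
            rows.append(row)
    return rows


# One series, eleven samples, horizon of two steps.
samples = [[[float(k), float(2 * k)] for k in range(11)]]
interval_quantiles = level_to_quantiles([80, 90])
quantile_rows = samples_to_rows(samples, ["A"], [1, 2], [0.1, 0.5, 0.9])
```

Passing `level` or `quantiles` describes the same information in two ways: a list of levels expands to a symmetric set of quantiles around the median.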