Aggregation/Visualization Utils
The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The aggregate function of the module lets you create a hierarchy from categorical variables representing the structure levels, and it also returns the aggregation constraints matrix.
In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters, which transform base forecasts output by external libraries into a compatible data frame format.
Aggregate Function
aggregate
Utils Aggregation Function. Aggregates the bottom-level series contained in the DataFrame df according to the levels defined in the spec list.
| | Type | Default | Details |
|---|---|---|---|
| df | Union | | DataFrame with columns [time_col, *target_cols], columns to aggregate and optionally exog_vars. |
| spec | list | | List of levels. Each element of the list should contain a list of columns of df to aggregate. |
| exog_vars | Optional | None | |
| sparse_s | bool | False | Return S_df as a sparse Pandas dataframe. |
| id_col | str | unique_id | Column that will identify each series after aggregation. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_cols | list | ['y'] | List of columns that contain the targets to aggregate. |
| Returns | tuple | | Hierarchically structured series. |
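To make the roles of df and spec concrete, here is a minimal pandas sketch of the idea behind the aggregation. It is an illustrative re-implementation, not the library's code: the function name aggregate_sketch, the helper join_ids, the "/"-joined identifiers, and the toy data are all assumptions made for this example. With the library installed, the corresponding call would be along the lines of `aggregate(df, spec)`, which returns a tuple.

```python
import numpy as np
import pandas as pd

# Toy bottom-level data: two regions in one country, two timesteps each.
df = pd.DataFrame({
    "Country": ["AU", "AU", "AU", "AU"],
    "Region": ["NSW", "NSW", "VIC", "VIC"],
    "ds": [1, 2, 1, 2],
    "y": [10.0, 12.0, 5.0, 7.0],
})

# Each element of spec lists the columns that define one level.
spec = [["Country"], ["Country", "Region"]]


def join_ids(frame, cols):
    # Assumed naming scheme for this sketch: join column values with "/".
    return frame[cols].astype(str).agg("/".join, axis=1)


def aggregate_sketch(df, spec, time_col="ds", target_col="y"):
    """Hypothetical helper; assumes spec[-1] is the bottom level."""
    frames, tags, s_rows, all_ids = [], {}, [], []
    bottom = df[spec[-1]].drop_duplicates().reset_index(drop=True)
    bottom_ids = join_ids(bottom, spec[-1]).tolist()
    for level in spec:
        # Sum the target over every series that shares the level's id.
        agg = (
            df.assign(unique_id=join_ids(df, level))
            .groupby(["unique_id", time_col], as_index=False)[target_col]
            .sum()
        )
        frames.append(agg)
        level_ids = agg["unique_id"].unique().tolist()
        tags["/".join(level)] = np.array(level_ids)
        all_ids.extend(level_ids)
        # One row of the summing matrix per aggregated series:
        # 1 where a bottom series belongs to it, 0 otherwise.
        membership = join_ids(bottom, level)
        for uid in level_ids:
            s_rows.append((membership == uid).to_numpy(dtype=float))
    Y_df = pd.concat(frames, ignore_index=True)
    S_df = pd.DataFrame(np.vstack(s_rows), columns=bottom_ids)
    S_df.insert(0, "unique_id", all_ids)
    return Y_df, S_df, tags


Y_df, S_df, tags = aggregate_sketch(df, spec)
print(S_df)
```

The sketch returns the aggregated series, the summing matrix, and a mapping from each level to its series identifiers, which mirrors the kind of tuple the table above describes.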
Hierarchical Visualization
HierarchicalPlot
Hierarchical Plot
This class contains a collection of matplotlib visualization methods, suited for small to medium-sized hierarchical series.
Parameters:
S: DataFrame with the summing matrix of size (base, bottom), see the aggregate function.
tags: dict of np.ndarray with hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
S_id_col: str='unique_id', column that identifies each aggregation.
plot_summing_matrix
Summation Constraints plot
This method simply plots the hierarchical aggregation constraints matrix.
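As a small worked example of what such a matrix encodes, consider a hypothetical two-level hierarchy in which a total is the sum of two bottom series A and B (these names are assumptions for illustration only). Each row of the matrix corresponds to one series at any level and each column to one bottom series:

```latex
\[
\begin{bmatrix} y_{\text{Total}} \\ y_{\text{A}} \\ y_{\text{B}} \end{bmatrix}
=
\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{S}
\begin{bmatrix} y_{\text{A}} \\ y_{\text{B}} \end{bmatrix}
\]
```

A plot of this matrix would show a filled first row (the total uses both bottom series) above a diagonal block (each bottom series maps to itself).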
plot_series
Single Series plot
Parameters:
series: str, string identifying the 'unique_id' of the series to plot, at any level.
Y_df: DataFrame, hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and may also have 'models' columns.
models: list[str], strings identifying the model columns to filter.
level: float list 0-100, confidence levels for prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.
Returns:
Single series plot with filtered models and prediction interval level.
plot_hierarchically_linked_series
Hierarchically Linked Series plot
Parameters:
bottom_series: str, string identifying the 'unique_id' of the bottom-level series to plot.
Y_df: DataFrame, hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and models.
models: list[str], strings identifying the model columns to filter.
level: float list 0-100, confidence levels for prediction intervals available in Y_df.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.
Returns:
Collection of hierarchically linked series plots associated with the bottom_series, with filtered models and prediction interval level.
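One way to think about "hierarchically linked" series is as every row of the summing matrix that includes the chosen bottom series. The sketch below illustrates that lookup with pandas; the helper linked_series and the toy S_df are hypothetical, and this is not necessarily how the plotting method selects its series internally.

```python
import pandas as pd

# Toy summing matrix: one id column plus one column per bottom series.
S_df = pd.DataFrame({
    "unique_id": ["AU", "AU/NSW", "AU/VIC"],
    "AU/NSW": [1.0, 1.0, 0.0],
    "AU/VIC": [1.0, 0.0, 1.0],
})


def linked_series(S_df, bottom_series, id_col="unique_id"):
    """Hypothetical helper: ids of all series that aggregate `bottom_series`,
    including the bottom series itself."""
    mask = S_df[bottom_series] == 1
    return S_df.loc[mask, id_col].tolist()


linked = linked_series(S_df, "AU/VIC")
print(linked)
```

Here the bottom series "AU/VIC" is linked to itself and to its parent "AU", so a linked plot for it would contain one panel per entry of that list.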
plot_hierarchical_predictions_gap
Hierarchical Predictions Gap plot
Parameters:
Y_df: DataFrame, hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and models.
models: list[str], strings identifying the model columns to filter.
xlabel: str, string for the plot's x axis label.
ylabel: str, string for the plot's y axis label.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
target_col: str='y', column that contains the target.
Returns:
Plots of aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels; see the aggregate function.
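The quantity behind such a plot can be sketched as follows: for each level in tags, sum the predictions of all series in that level per timestep, then compare the totals across levels. The helper level_totals, the column name "model", and the forecast values are hypothetical and serve only to illustrate the aggregation by tag levels.

```python
import numpy as np
import pandas as pd

# Level name -> ids of the series in that level (toy example).
tags = {
    "Country": np.array(["AU"]),
    "Country/Region": np.array(["AU/NSW", "AU/VIC"]),
}

# Hypothetical, deliberately incoherent base forecasts in a "model" column.
Y_hat_df = pd.DataFrame({
    "unique_id": ["AU", "AU", "AU/NSW", "AU/NSW", "AU/VIC", "AU/VIC"],
    "ds": [1, 2, 1, 2, 1, 2],
    "model": [16.0, 20.0, 10.0, 12.0, 5.0, 7.0],
})


def level_totals(Y_hat_df, tags, model, id_col="unique_id", time_col="ds"):
    """Sum a model's predictions over all series of each level, per timestep."""
    totals = {}
    for level, ids in tags.items():
        in_level = Y_hat_df[id_col].isin(ids)
        totals[level] = Y_hat_df[in_level].groupby(time_col)[model].sum()
    return pd.DataFrame(totals)


totals = level_totals(Y_hat_df, tags, "model")
gap = totals["Country"] - totals["Country/Region"]
print(totals)
print(gap)
```

For coherent forecasts the totals agree at every level and the gap is zero; a nonzero gap, as in this toy data, is the kind of discrepancy that reconciliation is meant to remove.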
External Forecast Adapters
samples_to_quantiles_df
Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input Y_hat_df dataframe.
Parameters:
samples: numpy array. Samples from the forecast distribution, of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level, quantiles to estimate from the y distribution.
level: int list in [0,100]. Probability levels for prediction intervals.
model_name: string. Name of the forecasting model.
id_col: str='unique_id', column that identifies each series.
time_col: str='ds', column that identifies each timestep; its values can be timestamps or integers.
backend: str='pandas', backend to use for the output dataframe, either 'pandas' or 'polars'.
Returns:
quantiles: float list in [0., 1.]. Quantiles estimated from the y distribution.
Y_hat_df: DataFrame with base quantile forecasts, with columns ds and models to reconcile, indexed by unique_id.
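The transformation can be sketched with numpy and pandas as below. This is a rough illustration under stated assumptions, not the library's implementation: the helper names level_to_quantiles and samples_to_frame, the mean in the model column, and the "-q0.10" style column names are invented for this example, and the adapter's actual output column names may differ.

```python
import numpy as np
import pandas as pd


def level_to_quantiles(level):
    """Map levels in [0, 100] to the bounds of central prediction intervals,
    e.g. level 80 -> quantiles 0.1 and 0.9."""
    lows = [0.5 - lv / 200 for lv in sorted(level, reverse=True)]
    highs = [0.5 + lv / 200 for lv in sorted(level)]
    return lows + highs


def samples_to_frame(samples, unique_ids, dates, quantiles, model_name="model"):
    """samples has shape [n_series, n_samples, horizon]."""
    n_series, _, horizon = samples.shape
    # Quantiles over the sample axis -> shape [n_quantiles, n_series, horizon].
    q_values = np.quantile(samples, quantiles, axis=1)
    out = pd.DataFrame({
        "unique_id": np.repeat(unique_ids, horizon),
        "ds": np.tile(dates, n_series),
        # Assumption for this sketch: the point forecast is the sample mean.
        model_name: samples.mean(axis=1).ravel(),
    })
    for q, values in zip(quantiles, q_values):
        # Hypothetical column naming, chosen only for this sketch.
        out[f"{model_name}-q{q:.2f}"] = values.ravel()
    return out


rng = np.random.default_rng(0)
samples = rng.normal(size=(2, 500, 3))  # 2 series, 500 samples, horizon 3
quantiles = level_to_quantiles([80])
Y_hat_df = samples_to_frame(samples, ["A", "B"], [1, 2, 3], quantiles)
print(Y_hat_df)
```

The result is a long-format frame with one row per series and timestep, which is the general layout the reconciliation methods expect for base forecasts. Integer timesteps are used here for brevity; timestamps work the same way.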