hierarchicalforecast.utils
is_strictly_hierarchical
aggregate
Aggregates df according to the levels defined in the spec list.
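As a rough illustration of what a spec list means, the sketch below sums a target per level using only the standard library. It is not the library's implementation: the helper aggregate_sketch, the toy column names, and the "/" separator used to build series ids are assumptions made for this example.

```python
from collections import defaultdict

# Bottom-level rows: one observation per (Country, State, ds).
rows = [
    {"Country": "AU", "State": "NSW", "ds": 1, "y": 10.0},
    {"Country": "AU", "State": "VIC", "ds": 1, "y": 5.0},
    {"Country": "AU", "State": "NSW", "ds": 2, "y": 12.0},
    {"Country": "AU", "State": "VIC", "ds": 2, "y": 6.0},
]

# Each element of spec lists the columns that define one level.
spec = [["Country"], ["Country", "State"]]


def aggregate_sketch(rows, spec, sep="/"):
    """Hypothetical helper: sum y per (series id, ds) for every level in spec."""
    totals = defaultdict(float)  # (series id, ds) -> summed y
    tags = {}                    # level name -> sorted series ids
    for level in spec:
        level_ids = set()
        for row in rows:
            # The series id is built from the values of the level's columns.
            series_id = sep.join(str(row[col]) for col in level)
            totals[(series_id, row["ds"])] += row["y"]
            level_ids.add(series_id)
        tags[sep.join(level)] = sorted(level_ids)
    return dict(totals), tags


totals, tags = aggregate_sketch(rows, spec)
```

Here totals plays the role of the aggregated series and tags maps each level to the ids it contains; the summing dataframe is left out of the sketch.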
Args:
    df (Frame): Dataframe with columns [time_col, *target_cols], the columns to aggregate and, optionally, exog_vars.
    spec (list[list[str]]): List of levels. Each element should contain a list of columns of df to aggregate.
    exog_vars (Optional[dict[str, Union[str, list[str]]]], optional): Dictionary whose keys are column names and whose values are the aggregation(s) to apply to each column, given as a single string or a list of strings. Accepted values are pandas or Polars aggregation functions; check the respective docs for guidance. Default is None.
    sparse_s (bool, optional): Return S_df as a sparse pandas dataframe. Default is False.
    id_col (str, optional): Column that will identify each series after aggregation. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
    id_time_col (Optional[str], optional): Column that will identify each timestep after temporal aggregation. If provided, aggregate will operate temporally. Default is None.
    target_cols (Sequence[str], optional): List of columns that contain the targets to aggregate. Default is ('y',).
Returns:
    tuple[FrameT, FrameT, dict]: Y_df, S_df, tags.
        Y_df: Hierarchically structured series.
        S_df: Summing dataframe.
        tags: Aggregation indices.
aggregate_temporal
Aggregates df according to the temporal levels defined in the spec dictionary.
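To make the temporal spec concrete, here is a minimal sketch in plain Python, assuming each level sums consecutive, non-overlapping blocks of bottom-level timesteps. The helper aggregate_temporal_sketch and the level names are hypothetical, and the requirement that the series length be a multiple of every block size is a simplification of this sketch, not a documented rule of the library.

```python
def aggregate_temporal_sketch(values, spec):
    """Hypothetical helper: sum non-overlapping blocks of k timesteps per level."""
    out = {}
    for name, k in spec.items():
        if len(values) % k != 0:
            raise ValueError(f"series length must be a multiple of {k}")
        # Block i covers bottom-level timesteps [i, i + k).
        out[name] = [sum(values[i:i + k]) for i in range(0, len(values), k)]
    return out


# Eight quarterly observations for a single series.
quarterly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Each value is the number of bottom-level timesteps in one aggregated step.
spec = {"year": 4, "semester": 2, "quarter": 1}

levels = aggregate_temporal_sketch(quarterly, spec)
```

With this spec, the "quarter" level reproduces the input, while "semester" and "year" hold sums over 2 and 4 quarters respectively.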
Args:
    df (Frame): Dataframe with columns [time_col, *target_cols] and the columns to aggregate.
    spec (dict[str, int]): Dictionary of temporal levels. Each key is the name of a level, and its value is the number of bottom-level timesteps contained in one aggregated timestep.
    exog_vars (Optional[dict[str, Union[str, list[str]]]], optional): Dictionary whose keys are column names and whose values are the aggregation(s) to apply to each column, given as a single string or a list of strings. Accepted values are pandas or Polars aggregation functions; check the respective docs for guidance. Default is None.
    sparse_s (bool, optional): Return S_df as a sparse pandas dataframe. Default is False.
    id_col (str, optional): Column that will identify each series after aggregation. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
    id_time_col (str, optional): Column that will identify each timestep after aggregation. Default is 'temporal_id'.
    target_cols (Sequence[str], optional): List of columns that contain the targets to aggregate. Default is ('y',).
    aggregation_type (str, optional): If 'local', the aggregation is performed on the timestamps of each time series independently. If 'global', the aggregation is performed on the unique timestamps of all time series. Default is 'local'.
Returns:
    tuple[FrameT, FrameT, dict]: Y_df, S_df, tags.
        Y_df: Temporally hierarchically structured series.
        S_df: Temporal summing dataframe.
        tags: Temporal aggregation indices.
make_future_dataframe
Args:
    df (Frame): Dataframe with ids, times and values for the exogenous regressors.
    freq (Union[str, int]): Frequency of the data. Must be a valid pandas or Polars offset alias, or an integer.
    h (int): Forecast horizon.
    id_col (str, optional): Column that identifies each series. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
Returns:
    FrameT: DataFrame with future values.
get_cross_temporal_tags
Args:
    df (Frame): DataFrame with temporal ids.
    tags_cs (dict[str, np.ndarray]): Tags for the cross-sectional hierarchies.
    tags_te (dict[str, np.ndarray]): Tags for the temporal hierarchies.
    sep (str, optional): Separator for the cross-temporal tags. Default is '//'.
    id_col (str, optional): Column that identifies each series. Default is 'unique_id'.
    id_time_col (str, optional): Column that identifies each (aggregated) timestep. Default is 'temporal_id'.
    cross_temporal_id_col (str, optional): Column that will identify each cross-temporal aggregation. Default is 'cross_temporal_id'.
Returns:
    tuple[FrameT, dict[str, np.ndarray]]: df, tags_ct.
        df: DataFrame with cross-temporal ids.
        tags_ct: Tags for the cross-temporal hierarchies.
level_to_outputs
Args:
    level (list[int]): Probability levels for prediction intervals, in [0, 100].
Returns:
    tuple[list[float], list[str]]: quantiles and output_names.
        quantiles: Quantiles derived from the levels.
        output_names: List of strings with the output column names.
quantiles_to_outputs
Args:
    quantiles (list[float]): Alternative to level; quantiles to estimate from the y distribution, in [0., 1.].
Returns:
    tuple[list[float], list[str]]: quantiles and output_names.
        quantiles: Quantiles to estimate from the y distribution.
        output_names: List of strings with the output column names.
samples_to_quantiles_df
Builds the Y_hat_df dataframe of quantile forecasts from forecast samples.
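The sketch below illustrates the idea in plain Python: it converts interval levels into lower and upper quantiles, then takes empirical quantiles across the sample dimension of an array shaped [n_series, n_samples, horizon]. All helper names here are hypothetical. The column suffixes "-lo-" and "-hi-" and the linear interpolation rule are assumptions made for the example and may not match the library's output exactly.

```python
def level_to_quantiles(level):
    """Map interval levels in [0, 100] to (lower, upper) quantiles in [0, 1]."""
    return {l: ((100 - l) / 200, (100 + l) / 200) for l in level}


def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile of a list of numbers."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def samples_to_rows(samples, unique_ids, dates, level, model_name="model"):
    """Build one long-format row per (series, date) with interval columns."""
    rows = []
    for uid, series_samples in zip(unique_ids, samples):
        for t, ds in enumerate(dates):
            # Collect the draws of every sample path at horizon step t.
            draws = [sample[t] for sample in series_samples]
            row = {"unique_id": uid, "ds": ds}
            for l, (q_lo, q_hi) in level_to_quantiles(level).items():
                row[f"{model_name}-lo-{l}"] = empirical_quantile(draws, q_lo)
                row[f"{model_name}-hi-{l}"] = empirical_quantile(draws, q_hi)
            rows.append(row)
    return rows


# samples has shape [n_series, n_samples, horizon] = [1, 5, 2].
samples = [[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]]
rows = samples_to_rows(samples, ["total"], ["2024-01", "2024-02"], level=[80])
```

A level of 80 corresponds to the 0.1 and 0.9 quantiles, which is why level and quantiles can be treated as alternative ways of requesting the same intervals.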
Args:
    samples (np.ndarray): Samples from the forecast distribution, of shape [n_series, n_samples, horizon].
    unique_ids (Sequence[str]): Unique identifiers for each time series.
    dates (list[str]): List of forecast dates.
    quantiles (Optional[list[float]], optional): Alternative to level; quantiles to estimate from the y distribution, in [0., 1.]. Default is None.
    level (Optional[list[int]], optional): Probability levels for prediction intervals, in [0, 100]. Default is None.
    model_name (str, optional): Name of the forecasting model. Default is 'model'.
    id_col (str, optional): Column that identifies each series. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
    backend (str, optional): Backend to use for the output dataframe, either 'pandas' or 'polars'. Default is 'pandas'.
Returns:
    tuple[list[float], FrameT]: quantiles and Y_hat_df.
        quantiles: Quantiles to estimate from the y distribution, in [0., 1.].
        Y_hat_df: DataFrame with base quantile forecasts, with columns ds and models to reconcile, indexed by unique_id.
CodeTimer
__init__
HierarchicalPlot
Args:
    S (Frame): DataFrame with the summing matrix of size (base, bottom); see the aggregate function.
    tags (dict[str, np.ndarray]): Hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
    S_id_col (str, optional): Column that identifies each aggregation. Default is 'unique_id'.
__init__
plot_hierarchical_predictions_gap
Args:
    Y_df (Frame): Hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and the model columns.
    models (Optional[list[str]], optional): Strings identifying the model columns to filter. Default is None.
    xlabel (Optional[str], optional): String for the plot's x-axis label. Default is None.
    ylabel (Optional[str], optional): String for the plot's y-axis label. Default is None.
    id_col (str, optional): Column that identifies each series. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
    target_col (str, optional): Column that contains the target. Default is 'y'.
Returns:
    matplotlib.figure.Figure: Figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure.
plot_hierarchically_linked_series
Args:
    bottom_series (str): String identifying the 'unique_id' of the bottom-level series to plot.
    Y_df (Frame): Hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and the model columns.
    models (Optional[list[str]], optional): Strings identifying the model columns to filter. Default is None.
    level (Optional[list[int]], optional): Confidence levels for the prediction intervals available in Y_df. Default is None.
    id_col (str, optional): Column that identifies each series. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
    target_col (str, optional): Column that contains the target. Default is 'y'.
Returns:
    matplotlib.figure.Figure: Figure object containing the plots of the hierarchically linked series.
plot_series
Args:
    series (str): String identifying the 'unique_id' of the series to plot, at any level.
    Y_df (Frame): Hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and may have model columns.
    models (Optional[list[str]], optional): Strings identifying the model columns to filter. Default is None.
    level (Optional[list[int]], optional): Confidence levels for the prediction intervals available in Y_df. Default is None.
    id_col (str, optional): Column that identifies each series. Default is 'unique_id'.
    time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is 'ds'.
    target_col (str, optional): Column that contains the target. Default is 'y'.
Returns:
    matplotlib.figure.Figure: Figure object containing the plot of the single series.
plot_summing_matrix
Returns:
    matplotlib.figure.Figure: Figure object containing the plot of the summing matrix.
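For readers unfamiliar with the summing matrix that this plot displays, the following plain-Python sketch shows a toy matrix of size (base, bottom) for a hierarchy with one total and two bottom-level series. The series names and the helper apply_summing_matrix are hypothetical and exist only for this illustration.

```python
# Hierarchy: total = A + B, with two bottom-level series A and B.
series_ids = ["total", "A", "B"]  # all series (rows of S)
bottom_ids = ["A", "B"]           # bottom-level series (columns of S)

# S[i][j] is 1 when bottom series j contributes to series i, else 0.
S = [
    [1, 1],  # total
    [1, 0],  # A
    [0, 1],  # B
]


def apply_summing_matrix(S, bottom_values):
    """Hypothetical helper: compute every series as S times the bottom values."""
    return [sum(s * b for s, b in zip(row, bottom_values)) for row in S]


# Bottom-level values A = 3 and B = 4 imply total = 7.
all_values = apply_summing_matrix(S, [3.0, 4.0])
```

Each row of S marks which bottom-level series add up to the corresponding series, so multiplying S by the bottom-level values recovers the whole hierarchy.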