module hierarchicalforecast.utils

Global Variables

  • NUMBA_NOGIL
  • NUMBA_CACHE
  • NUMBA_PARALLEL
  • NUMBA_FASTMATH

function is_strictly_hierarchical

is_strictly_hierarchical(S: ndarray, tags: dict[str, ndarray]) → bool
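The entry above gives only the signature. As a rough illustration of the idea behind the name, consider how a strict hierarchy differs from a grouped structure: in a strict hierarchy, every node is contained in exactly one node of the next coarser level. The sketch below checks that nesting on a plain-list summing matrix. It is a hypothetical helper (is_strictly_nested is not part of the library), not the library's implementation. It assumes that S is a list of 0/1 rows over the bottom-level series and that tags maps each level name to the row indices of its nodes.

```python
def is_strictly_nested(S, tags):
    """Hypothetical check: every node is contained in exactly one node of the
    next coarser level. S is a list of 0/1 rows (one per series) over the
    bottom-level columns; tags maps a level name to the row indices of its nodes."""

    def members(row):
        # Set of bottom-level columns that this series sums.
        return frozenset(j for j, v in enumerate(S[row]) if v)

    # Order levels from coarsest (fewest nodes) to finest (most nodes).
    levels = sorted(tags.values(), key=len)
    for coarse, fine in zip(levels, levels[1:]):
        coarse_sets = [members(r) for r in coarse]
        for r in fine:
            child = members(r)
            parents = [p for p in coarse_sets if child <= p]
            if len(parents) != 1:
                return False
    return True


# Strict hierarchy: Total -> State -> Bottom.
S_strict = [
    [1, 1, 1, 1],  # Total
    [1, 1, 0, 0],  # State A
    [0, 0, 1, 1],  # State B
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],  # bottom series
]
tags_strict = {"Total": [0], "State": [1, 2], "Bottom": [3, 4, 5, 6]}

# Grouped structure: State and Purpose cross each other, so neither nests in the other.
S_grouped = S_strict[:3] + [[1, 0, 1, 0], [0, 1, 0, 1]] + S_strict[3:]
tags_grouped = {"Total": [0], "State": [1, 2], "Purpose": [3, 4], "Bottom": [5, 6, 7, 8]}

print(is_strictly_nested(S_strict, tags_strict))    # True
print(is_strictly_nested(S_grouped, tags_grouped))  # False
```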

function aggregate

aggregate(
    df: Union[DataFrame[Any], LazyFrame[Any]],
    spec: list[list[str]],
    exog_vars: Optional[dict[str, Union[str, list[str]]]] = None,
    sparse_s: bool = False,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    id_time_col: Optional[str] = None,
    target_cols: Sequence[str] = ('y',)
) → tuple[FrameT, FrameT, dict]
Utils Aggregation Function. Aggregates the bottom-level series contained in the DataFrame df according to the levels defined in the spec list.
Args:
  • df (Frame): DataFrame with columns [time_col, *target_cols], the columns to aggregate and, optionally, the exog_vars columns.
  • spec (list[list[str]]): List of levels. Each element of the list is a list of the columns of df to aggregate at that level.
  • exog_vars (Optional[dict[str, Union[str, list[str]]]], optional): Dictionary whose keys are column names and whose values are the aggregation(s) to apply to each column, given either as a single string or as a list of strings. Accepted values are the names of pandas or Polars aggregation functions; check the respective docs for guidance. Default is None.
  • sparse_s (bool, optional): Return S_df as a sparse pandas DataFrame. Default is False.
  • id_col (str, optional): Column that will identify each series after aggregation. Default is “unique_id”.
  • time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is “ds”.
  • id_time_col (Optional[str], optional): Column that will identify each timestep after temporal aggregation. If provided, aggregate operates temporally. Default is None.
  • target_cols (Sequence[str], optional): List of columns that contain the targets to aggregate. Default is (“y”,).
Returns:
  • tuple[FrameT, FrameT, dict]: Y_df, S_df, tags
  • Y_df: Hierarchically structured series.
  • S_df: Summing dataframe.
  • tags: Aggregation indices.
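To make the three outputs concrete, the pure-Python sketch below mimics what aggregate returns for a two-level spec: summed targets per aggregated series, a 0/1 summing row per series, and the series ids grouped by level. It is a hypothetical illustration (aggregate_sketch is not part of the library) that works on a list of dicts instead of a DataFrame. Joining column values and level names with "/" is an assumption about the id format, and the last element of spec is assumed to be the most disaggregated level.

```python
from collections import defaultdict


def aggregate_sketch(rows, spec, time_col="ds", target_col="y"):
    """Hypothetical illustration of the outputs of aggregate.

    rows: list of dicts, one per bottom-level observation.
    spec: list of levels, each a list of column names; the last level is
          assumed to be the most disaggregated one.
    """

    def key(row, cols):
        # Assumed id format: column values joined by "/".
        return "/".join(row[c] for c in cols)

    bottom_cols = spec[-1]
    bottom_ids = sorted({key(r, bottom_cols) for r in rows})
    Y = defaultdict(float)  # (unique_id, ds) -> aggregated target
    S = {}                  # unique_id -> 0/1 row over bottom_ids
    tags = {}               # level name -> unique_ids in that level
    for level in spec:
        ids = set()
        for r in rows:
            uid = key(r, level)
            ids.add(uid)
            Y[(uid, r[time_col])] += r[target_col]
            col = bottom_ids.index(key(r, bottom_cols))
            S.setdefault(uid, [0] * len(bottom_ids))[col] = 1
        tags["/".join(level)] = sorted(ids)
    return dict(Y), S, tags


rows = [
    {"country": "AU", "state": "NSW", "ds": 1, "y": 3.0},
    {"country": "AU", "state": "VIC", "ds": 1, "y": 2.0},
    {"country": "AU", "state": "NSW", "ds": 2, "y": 4.0},
    {"country": "AU", "state": "VIC", "ds": 2, "y": 1.0},
]
Y, S, tags = aggregate_sketch(rows, spec=[["country"], ["country", "state"]])
print(Y[("AU", 1)])  # 5.0
print(S["AU/NSW"])   # [1, 0]
print(tags)          # {'country': ['AU'], 'country/state': ['AU/NSW', 'AU/VIC']}
```

The real function operates on pandas or Polars frames and returns S_df as a DataFrame; the dicts here only show the shape of the information.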

function aggregate_temporal

aggregate_temporal(
    df: Union[DataFrame[Any], LazyFrame[Any]],
    spec: dict[str, int],
    exog_vars: Optional[dict[str, Union[str, list[str]]]] = None,
    sparse_s: bool = False,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    id_time_col: str = 'temporal_id',
    target_cols: Sequence[str] = ('y',),
    aggregation_type: str = 'local'
) → tuple[FrameT, FrameT, dict]
Utils Aggregation Function for temporal aggregations. Aggregates the bottom-level timesteps contained in the DataFrame df according to the temporal levels defined in the spec dictionary.
Args:
  • df (Frame): DataFrame with columns [time_col, *target_cols] and the columns to aggregate.
  • spec (dict[str, int]): Dictionary of temporal levels. Each key is the name of a temporal level, and its value is the number of bottom-level timesteps contained in one aggregated timestep of that level.
  • exog_vars (Optional[dict[str, Union[str, list[str]]]], optional): Dictionary whose keys are column names and whose values are the aggregation(s) to apply to each column, given either as a single string or as a list of strings. Accepted values are the names of pandas or Polars aggregation functions; check the respective docs for guidance. Default is None.
  • sparse_s (bool, optional): Return S_df as a sparse pandas DataFrame. Default is False.
  • id_col (str, optional): Column that will identify each series after aggregation. Default is ‘unique_id’.
  • time_col (str, optional): Column that identifies each timestep, its values can be timestamps or integers. Default is ‘ds’.
  • id_time_col (str, optional): Column that will identify each timestep after aggregation. Default is ‘temporal_id’.
  • target_cols (Sequence[str], optional): List of columns that contain the targets to aggregate. Default is (‘y’,).
  • aggregation_type (str, optional): If ‘local’ the aggregation will be performed on the timestamps of each timeseries independently. If ‘global’ the aggregation will be performed on the unique timestamps of all timeseries. Default is ‘local’.
Returns:
  • tuple[FrameT, FrameT, dict]: Y_df, S_df, tags
  • Y_df: Temporally hierarchically structured series.
  • S_df: Temporal summing dataframe.
  • tags: Temporal aggregation indices.
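The spec dictionary can be read as "sum this many consecutive bottom-level timesteps". The sketch below applies that reading to a single series, which corresponds to the 'local' aggregation type. It is a hypothetical illustration (temporal_aggregate_sketch is not part of the library). It assumes that blocks start at the first timestep and that the series length is a multiple of every block size; how the library handles leftover timesteps is not covered here.

```python
def temporal_aggregate_sketch(values, spec):
    """Hypothetical illustration: sum consecutive, non-overlapping blocks of
    bottom-level timesteps for one series ('local' aggregation).

    values: bottom-level target values in time order.
    spec: level name -> number of bottom-level timesteps per aggregated step.
    Assumes len(values) is a multiple of every block size in spec.
    """
    out = {}
    for level, k in spec.items():
        if len(values) % k:
            raise ValueError(f"{len(values)} timesteps is not a multiple of {k}")
        out[level] = [sum(values[i:i + k]) for i in range(0, len(values), k)]
    return out


monthly = list(range(1, 13))  # twelve monthly observations
agg = temporal_aggregate_sketch(monthly, {"year": 12, "quarter": 3, "month": 1})
print(agg["year"])     # [78]
print(agg["quarter"])  # [6, 15, 24, 33]
```

With aggregation_type='global', the same blocking would instead be defined on the unique timestamps shared by all series rather than on each series separately.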

function make_future_dataframe

make_future_dataframe(
    df: Union[DataFrame[Any], LazyFrame[Any]],
    freq: Union[str, int],
    h: int,
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) → FrameT
Creates a future DataFrame for forecasting.
Args:
  • df (Frame): DataFrame with ids, times and values for the exogenous regressors.
  • freq (Union[str, int]): Frequency of the data. Must be a valid pandas or Polars offset alias, or an integer.
  • h (int): Forecast horizon.
  • id_col (str, optional): Column that identifies each series. Default is ‘unique_id’.
  • time_col (str, optional): Column that identifies each timestep, its values can be timestamps or integers. Default is ‘ds’.
Returns:
  • FrameT: DataFrame with future values.
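For integer timestamps, a future frame can be pictured as h new rows per series, spaced by freq after that series' last observed timestep. The sketch below builds such rows from a list of dicts. It is a hypothetical illustration (future_frame_sketch is not part of the library) that assumes each series continues one step after its own last timestamp, and it does not handle string offset aliases such as pandas or Polars frequencies.

```python
def future_frame_sketch(rows, freq, h, id_col="unique_id", time_col="ds"):
    """Hypothetical illustration with integer timestamps: for each series,
    emit h timesteps spaced by freq after that series' last timestamp."""
    last = {}
    for r in rows:
        uid, t = r[id_col], r[time_col]
        last[uid] = max(t, last.get(uid, t))
    return [
        {id_col: uid, time_col: last[uid] + freq * step}
        for uid in sorted(last)
        for step in range(1, h + 1)
    ]


history = [
    {"unique_id": "a", "ds": 1},
    {"unique_id": "a", "ds": 2},
    {"unique_id": "b", "ds": 5},
]
print(future_frame_sketch(history, freq=1, h=2))
```

Here series "a" receives timesteps 3 and 4, and series "b" receives 6 and 7.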

function get_cross_temporal_tags

get_cross_temporal_tags(
    df: Union[DataFrame[Any], LazyFrame[Any]],
    tags_cs: dict[str, ndarray],
    tags_te: dict[str, ndarray],
    sep: str = '//',
    id_col: str = 'unique_id',
    id_time_col: str = 'temporal_id',
    cross_temporal_id_col: str = 'cross_temporal_id'
) → tuple[FrameT, dict[str, ndarray]]
Gets cross-temporal tags.
Args:
  • df (Frame): DataFrame with temporal ids.
  • tags_cs (dict[str, np.ndarray]): Tags for the cross-sectional hierarchies.
  • tags_te (dict[str, np.ndarray]): Tags for the temporal hierarchies.
  • sep (str, optional): Separator for the cross-temporal tags. Default is ‘//’.
  • id_col (str, optional): Column that identifies each series. Default is ‘unique_id’.
  • id_time_col (str, optional): Column that identifies each (aggregated) timestep. Default is ‘temporal_id’.
  • cross_temporal_id_col (str, optional): Column that will identify each cross-temporal aggregation. Default is ‘cross_temporal_id’.
Returns:
  • tuple[FrameT, dict[str, np.ndarray]]: df, tags_ct
  • df: DataFrame with cross-temporal ids.
  • tags_ct: Tags for the cross-temporal hierarchies.
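One way to picture cross-temporal tags is as the product of the two tag dictionaries: every cross-sectional level is paired with every temporal level, and every series id with every temporal id, joined by sep. The sketch below builds that product with plain lists. It is a hypothetical illustration (cross_temporal_tags_sketch is not part of the library). The key format, the "year-1"-style temporal ids and the use of lists instead of np.ndarray are assumptions made for the example.

```python
from itertools import product


def cross_temporal_tags_sketch(tags_cs, tags_te, sep="//"):
    """Hypothetical illustration: pair every cross-sectional level with every
    temporal level and every series id with every temporal id, joined by sep."""
    tags_ct = {}
    for (cs_level, cs_ids), (te_level, te_ids) in product(tags_cs.items(), tags_te.items()):
        tags_ct[f"{cs_level}{sep}{te_level}"] = [
            f"{cs_id}{sep}{te_id}" for cs_id in cs_ids for te_id in te_ids
        ]
    return tags_ct


tags_cs = {"country": ["AU"], "country/state": ["AU/NSW", "AU/VIC"]}
tags_te = {"year": ["year-1"], "quarter": ["quarter-1", "quarter-2"]}
tags_ct = cross_temporal_tags_sketch(tags_cs, tags_te)
print(list(tags_ct))             # ['country//year', 'country//quarter', 'country/state//year', 'country/state//quarter']
print(tags_ct["country//year"])  # ['AU//year-1']
```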

function level_to_outputs

level_to_outputs(level: list[int]) → tuple[list[float], list[str]]
Converts a list of levels into quantiles and output names matching the StatsForecast and NeuralForecast methods.
Args:
  • level (list[int]): Probability levels for prediction intervals, in [0, 100].
Returns:
  • tuple[list[float], list[str]]: quantiles and output_names
  • quantiles: quantiles derived from levels.
  • output_names: String list with output column names.
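A prediction-interval level l covers the central l% of the distribution. Its bounds are therefore the quantiles (50 - l/2)/100 and (50 + l/2)/100, which StatsForecast names with "-lo-{l}" and "-hi-{l}" suffixes. The sketch below applies that mapping. It is a hypothetical illustration (level_to_outputs_sketch is not part of the library). Placing a median entry first and sorting the remaining outputs by quantile are assumptions about the ordering.

```python
def level_to_outputs_sketch(level):
    """Hypothetical illustration: a level l maps to the quantiles
    (50 - l/2)/100 and (50 + l/2)/100, named -lo-l and -hi-l."""
    pairs = []
    for lv in level:
        pairs.append((50 - lv / 2, f"-lo-{lv}"))
        pairs.append((50 + lv / 2, f"-hi-{lv}"))
    pairs.sort(key=lambda p: p[0])  # order outputs by increasing quantile
    quantiles = [0.5] + [q / 100 for q, _ in pairs]  # median first (assumption)
    names = ["-median"] + [name for _, name in pairs]
    return quantiles, names


quantiles, names = level_to_outputs_sketch([80, 95])
print(quantiles)  # [0.5, 0.025, 0.1, 0.9, 0.975]
print(names)      # ['-median', '-lo-95', '-lo-80', '-hi-80', '-hi-95']
```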

function quantiles_to_outputs

quantiles_to_outputs(quantiles: list[float]) → tuple[list[float], list[str]]
Converts a list of quantiles into output names matching the StatsForecast and NeuralForecast methods.
Args:
  • quantiles (list[float]): Alternative to level; quantiles, in [0, 1], to estimate from the y distribution.
Returns:
  • tuple[list[float], list[str]]: quantiles and output_names
  • quantiles: quantiles to estimate from y distribution.
  • output_names: String list with output column names.
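Inverting the level mapping gives the interval a quantile belongs to. A quantile q < 0.5 is the lower bound of level 100 - 200q, and q > 0.5 is the upper bound of level 100 - 200(1 - q). The sketch below derives names from that rule. It is a hypothetical illustration (quantiles_to_outputs_sketch is not part of the library). Formatting the level as a float rounded to two decimals is an assumption, not something stated in this reference.

```python
def quantiles_to_outputs_sketch(quantiles):
    """Hypothetical inverse of the level mapping: q < 0.5 is the lower bound of
    level 100 - 200q, q > 0.5 the upper bound of level 100 - 200(1 - q)."""
    names = []
    for q in quantiles:
        if q < 0.5:
            names.append(f"-lo-{round(100 - 200 * q, 2)}")
        elif q > 0.5:
            names.append(f"-hi-{round(100 - 200 * (1 - q), 2)}")
        else:
            names.append("-median")
    return quantiles, names


print(quantiles_to_outputs_sketch([0.1, 0.5, 0.9])[1])  # ['-lo-80.0', '-median', '-hi-80.0']
```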

function samples_to_quantiles_df

samples_to_quantiles_df(
    samples: ndarray,
    unique_ids: Sequence[str],
    dates: list[str],
    quantiles: Optional[list[float]] = None,
    level: Optional[list[int]] = None,
    model_name: str = 'model',
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    backend: str = 'pandas'
) → tuple[list[float], FrameT]
Transforms random samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input Y_hat_df DataFrame.
Args:
  • samples (np.ndarray): Samples from the forecast distribution, of shape [n_series, n_samples, horizon].
  • unique_ids (Sequence[str]): Unique identifiers for each time series.
  • dates (list[str]): List of forecast dates.
  • quantiles (Optional[list[float]], optional): Alternative to level; quantiles, in [0, 1], to estimate from the y distribution. Default is None.
  • level (Optional[list[int]], optional): Probability levels for prediction intervals, in [0, 100]. Default is None.
  • model_name (str, optional): Name of the forecasting model. Default is “model”.
  • id_col (str, optional): Column that identifies each series. Default is ‘unique_id’.
  • time_col (str, optional): Column that identifies each timestep; its values can be timestamps or integers. Default is ‘ds’.
  • backend (str, optional): Backend to use for the output DataFrame, either ‘pandas’ or ‘polars’. Default is ‘pandas’.
Returns:
  • tuple[list[float], FrameT]: quantiles and Y_hat_df
  • quantiles: Quantiles, in [0, 1], to estimate from the y distribution.
  • Y_hat_df: DataFrame indexed by unique_id, with a ds column and the base quantile forecasts of the models to reconcile.
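The core step is a reduction over the sample axis. For each series and each forecast step, the draws are collected and the requested quantiles are taken. The sketch below does this with plain lists and linear interpolation between order statistics, the same rule as NumPy's default "linear" quantile method. It is a hypothetical illustration: empirical_quantile, samples_to_quantile_rows and the "model-q{q}" column names are made up for the example and are not the library's output format.

```python
def empirical_quantile(xs, q):
    """Linear-interpolation quantile of xs, with q in [0, 1]."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = int(pos)                 # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def samples_to_quantile_rows(samples, unique_ids, dates, quantiles, model_name="model"):
    """Hypothetical illustration: reduce samples of shape
    [n_series][n_samples][horizon] to one row per (series, date)."""
    rows = []
    for uid, series_samples in zip(unique_ids, samples):
        for t, ds in enumerate(dates):
            draws = [draw[t] for draw in series_samples]  # all samples at step t
            row = {"unique_id": uid, "ds": ds}
            for q in quantiles:
                row[f"{model_name}-q{q}"] = empirical_quantile(draws, q)  # hypothetical name
            rows.append(row)
    return rows


samples = [[[1, 10], [2, 20], [3, 30]]]  # 1 series, 3 samples, horizon 2
rows = samples_to_quantile_rows(samples, ["A"], ["2024-01", "2024-02"], [0.5])
print([r["model-q0.5"] for r in rows])  # [2.0, 20.0]
```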

class CodeTimer

method __init__

__init__(name=None, verbose=True)
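The entry above lists only the constructor. The name and the verbose flag suggest a context manager that reports how long the enclosed block took, although this reference does not state it. The stand-in below is built on that assumption. CodeTimerSketch is hypothetical. Returning self from __enter__ (so the elapsed time can be read afterwards) and the printed message format are choices made for this example, not the library's behavior.

```python
import time


class CodeTimerSketch:
    """Hypothetical timer in the spirit of CodeTimer(name=None, verbose=True):
    measures the wall-clock time of a with-block."""

    def __init__(self, name=None, verbose=True):
        self.name = name
        self.verbose = verbose
        self.took = None

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.took = time.perf_counter() - self.start
        if self.verbose:
            label = f" '{self.name}'" if self.name else ""
            print(f"Code block{label} took: {self.took:.5f} seconds")
        return False  # do not suppress exceptions


with CodeTimerSketch("sum", verbose=True) as timer:
    total = sum(range(1_000_000))
```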

class HierarchicalPlot

Hierarchical Plot. This class contains a collection of matplotlib visualization methods suited to small and medium-sized hierarchical series.
Args:
  • S (Frame): DataFrame with the summing matrix of size (base, bottom); see the aggregate function.
  • tags (dict[str, np.ndarray]): hierarchical aggregation indices, where each key is a level and its value contains the tags associated with that level.
  • S_id_col (str, optional): column that identifies each aggregation. Default is ‘unique_id’.

method __init__

__init__(
    S: Union[DataFrame[Any], LazyFrame[Any]],
    tags: dict[str, ndarray],
    S_id_col: str = 'unique_id'
)

method plot_hierarchical_predictions_gap

plot_hierarchical_predictions_gap(
    Y_df: Union[DataFrame[Any], LazyFrame[Any]],
    models: Optional[list[str]] = None,
    xlabel: Optional[str] = None,
    ylabel: Optional[str] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y'
)
Hierarchical Predictions Gap plot.
Args:
  • Y_df (Frame): hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns [‘unique_id’, ‘ds’, ‘y’] and the model columns.
  • models (Optional[list[str]], optional): names of the model columns to plot, used to filter which models are shown. Default is None.
  • xlabel (Optional[str], optional): string for the plot’s x-axis label. Default is None.
  • ylabel (Optional[str], optional): string for the plot’s y-axis label. Default is None.
  • id_col (str, optional): column that identifies each series. Default is ‘unique_id’.
  • time_col (str, optional): column that identifies each timestep, its values can be timestamps or integers. Default is ‘ds’.
  • target_col (str, optional): column that contains the target. Default is ‘y’.
Returns:
  • matplotlib.figure.Figure: figure object containing the plot of the aggregated predictions at different levels of the hierarchical structure.

method plot_hierarchically_linked_series

plot_hierarchically_linked_series(
    bottom_series: str,
    Y_df: Union[DataFrame[Any], LazyFrame[Any]],
    models: Optional[list[str]] = None,
    level: Optional[list[int]] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y'
)
Hierarchically Linked Series plot.
Args:
  • bottom_series (str): unique_id of the bottom-level series to plot.
  • Y_df (Frame): hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns [‘unique_id’, ‘ds’, ‘y’] and the model columns.
  • models (Optional[list[str]], optional): names of the model columns to plot, used to filter which models are shown. Default is None.
  • level (Optional[list[int]], optional): confidence levels of the prediction intervals available in Y_df. Default is None.
  • id_col (str, optional): column that identifies each series. Default is ‘unique_id’.
  • time_col (str, optional): column that identifies each timestep, its values can be timestamps or integers. Default is ‘ds’.
  • target_col (str, optional): column that contains the target. Default is ‘y’.
Returns:
  • matplotlib.figure.Figure: figure object containing the plots of the hierarchically linked series.

method plot_series

plot_series(
    series: str,
    Y_df: Union[DataFrame[Any], LazyFrame[Any]],
    models: Optional[list[str]] = None,
    level: Optional[list[int]] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y'
)
Single Series plot.
Args:
  • series (str): unique_id of the series to plot; it can belong to any level.
  • Y_df (Frame): hierarchically structured series ($\mathbf{y}_{[a,b]}$). It contains the columns [‘unique_id’, ‘ds’, ‘y’] and, optionally, model columns.
  • models (Optional[list[str]], optional): names of the model columns to plot, used to filter which models are shown. Default is None.
  • level (Optional[list[int]], optional): confidence levels of the prediction intervals available in Y_df. Default is None.
  • id_col (str, optional): column that identifies each series. Default is ‘unique_id’.
  • time_col (str, optional): column that identifies each timestep, its values can be timestamps or integers. Default is ‘ds’.
  • target_col (str, optional): column that contains the target. Default is ‘y’.
Returns:
  • matplotlib.figure.Figure: figure object containing the plot of the single series.

method plot_summing_matrix

plot_summing_matrix()
Summation Constraints plot. Plots the hierarchical aggregation constraints matrix $\mathbf{S}$.
Returns:
  • matplotlib.figure.Figure: figure object containing the plot of the summing matrix.