Aggregation/Visualization Utils
The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The aggregate function of the module creates a hierarchy from categorical variables representing the structure levels, and it also returns the aggregation constraints matrix S.
In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters, which transform the base forecasts output by external libraries into a compatible data frame format.
Aggregate Function
aggregate
aggregate (df:pandas.core.frame.DataFrame, spec:List[List[str]], is_balanced:bool=False, sparse_s:bool=False)
Utils Aggregation Function. Aggregates the bottom-level series contained in the pandas DataFrame df according to the levels defined in the spec list.
| Parameter | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Dataframe with columns ['ds', 'y'] and the columns to aggregate. |
| spec | List | | List of levels. Each element of the list should contain a list of columns of df to aggregate. |
| is_balanced | bool | False | Deprecated. |
| sparse_s | bool | False | Return S_df as a sparse dataframe. |
| Returns | tuple | | Y_df: hierarchically structured series; S_df: summing matrix; tags: hierarchical aggregation indexes. |
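To make the role of spec concrete, here is a minimal sketch of summing bottom-level series up to every level listed in spec. It assumes a bottom-level frame with hypothetical Country and Region columns; aggregate_sketch is a hypothetical helper, not the library's aggregate, and forming each unique_id by joining the level's labels with '/' is an assumption of this sketch.

```python
import pandas as pd

def aggregate_sketch(df, spec):
    """Hypothetical helper: sum bottom-level 'y' up to every level in spec.

    Not the library's aggregate(); it only returns the stacked series.
    """
    frames = []
    for level in spec:
        # Sum 'y' over every combination of this level's columns, per date
        agg = df.groupby(level + ['ds'], as_index=False)['y'].sum()
        # Assumed id scheme: join the level's labels, e.g. 'AU/NSW'
        agg['unique_id'] = agg[level].astype(str).apply('/'.join, axis=1)
        frames.append(agg[['unique_id', 'ds', 'y']])
    return pd.concat(frames, ignore_index=True)

# Toy bottom-level data: two regions of one country, two months each
df = pd.DataFrame({
    'Country': ['AU'] * 4,
    'Region': ['NSW', 'NSW', 'VIC', 'VIC'],
    'ds': pd.to_datetime(['2020-01-01', '2020-02-01'] * 2),
    'y': [1.0, 2.0, 3.0, 4.0],
})
Y_df = aggregate_sketch(df, spec=[['Country'], ['Country', 'Region']])
print(Y_df)
```

Each inner list of spec yields one level of the hierarchy, so listing ['Country'] before ['Country', 'Region'] produces the national total plus one series per region.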
Hierarchical Visualization
HierarchicalPlot
HierarchicalPlot (S:pandas.core.frame.DataFrame, tags:Dict[str,numpy.ndarray])
Hierarchical Plot
This class contains a collection of matplotlib visualization methods suited for small- to medium-sized hierarchical series.
Parameters:
S: pd.DataFrame, summing matrix of size (base, bottom); see the aggregate function.
tags: Dict[str, np.ndarray], hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
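The relationship between S and tags can be illustrated with a small sketch that builds both from a table holding one row per bottom-level series. Everything here is an assumption for illustration: summing_matrix_sketch is hypothetical, and the '/'-joined ids and tag keys such as 'Country/Region' mimic, but are not guaranteed to match, what aggregate produces.

```python
import numpy as np
import pandas as pd

def summing_matrix_sketch(bottom, spec):
    """Hypothetical helper: build S_df and tags from a table holding one row
    per bottom-level series and one column per categorical level."""
    bottom = bottom.astype(str)
    # Assumed id scheme: labels joined with '/', deepest level = bottom ids
    bottom_ids = bottom[spec[-1]].apply('/'.join, axis=1).tolist()
    rows, tags = {}, {}
    for level in spec:
        ids = bottom[level].apply('/'.join, axis=1)
        key = '/'.join(level)  # assumed tag key, e.g. 'Country/Region'
        tags[key] = np.array(sorted(ids.unique()))
        for uid in tags[key]:
            # 1 where a bottom series rolls up into this aggregate series
            rows[uid] = (ids == uid).astype(int).to_numpy()
    # Rows: every series (base); columns: bottom series -> size (base, bottom)
    S_df = pd.DataFrame.from_dict(rows, orient='index', columns=bottom_ids)
    return S_df, tags

bottom = pd.DataFrame({'Country': ['AU', 'AU'], 'Region': ['NSW', 'VIC']})
S_df, tags = summing_matrix_sketch(bottom, [['Country'], ['Country', 'Region']])
print(S_df)
print(tags)
```

Every id listed under a tag key is a row label of S_df, which is how a level name can be mapped to the rows of the summing matrix.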
plot_summing_matrix
plot_summing_matrix ()
Summation Constraints plot
This method plots the hierarchical aggregation constraints matrix S.
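The matrix shown by this plot encodes the summation constraints y = S b, where b stacks the bottom-level values and y stacks every series of the hierarchy. The sketch below checks this numerically on an assumed two-region hierarchy and prints a character grid as a rough text stand-in for the plot; it does not call plot_summing_matrix itself.

```python
import numpy as np

# Assumed toy summing matrix: rows are [Total, Region A, Region B],
# columns are the bottom series [A, B]
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])
b = np.array([3.0, 5.0])  # bottom-level values at one time step
y = S @ b                 # every level implied by the bottom series
print(y)                  # [8. 3. 5.]

# Text stand-in for the matrix plot: '#' marks a 1, '.' marks a 0
for row in S:
    print(''.join('#' if v else '.' for v in row))
```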
plot_series
plot_series (series:str, Y_df:Optional[pandas.core.frame.DataFrame]=None, models:Optional[List[str]]=None, level:Optional[List[int]]=None)
Single Series plot
Parameters:
series: str, the 'unique_id' of the series (at any level) to plot.
Y_df: pd.DataFrame, hierarchically structured series. It contains the columns ['unique_id', 'ds', 'y'] and may contain model columns.
models: List[str], names of the model columns to plot.
level: List[int], confidence levels in [0, 100] for the prediction intervals available in Y_df.
Returns:
Single series plot with the filtered models and prediction interval levels.
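The filtering implied by the series, models and level arguments can be sketched with plain pandas. This assumes unique_id is a column of Y_df and that interval columns follow a '<model>-lo-<level>' / '<model>-hi-<level>' naming; both are assumptions of this sketch, and select_series_sketch is hypothetical rather than part of HierarchicalPlot.

```python
import pandas as pd

def select_series_sketch(Y_df, series, models, level):
    """Hypothetical helper: keep one series plus the chosen model columns and,
    assuming '<model>-lo-<lv>' / '<model>-hi-<lv>' names, their intervals."""
    cols = ['ds', 'y']
    for model in models:
        cols.append(model)
        for lv in level:
            cols += [f'{model}-lo-{lv}', f'{model}-hi-{lv}']
    # Silently skip columns that Y_df does not have
    cols = [c for c in cols if c in Y_df.columns]
    return Y_df.loc[Y_df['unique_id'] == series, cols]

Y_df = pd.DataFrame({
    'unique_id': ['AU', 'AU', 'AU/NSW'],
    'ds': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-01-01']),
    'y': [10.0, 11.0, 4.0],
    'ETS': [9.5, 11.2, 4.1],
    'ETS-lo-80': [8.0, 9.7, 3.5],
    'ETS-hi-80': [11.0, 12.7, 4.7],
})
out = select_series_sketch(Y_df, 'AU', models=['ETS'], level=[80])
print(out)
```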
plot_hierarchically_linked_series
plot_hierarchically_linked_series (bottom_series:str, Y_df:Optional[pandas.core.frame.DataFrame]=None, models:Optional[List[str]]=None, level:Optional[List[int]]=None)
Hierarchically Linked Series plot
Parameters:
bottom_series: str, the 'unique_id' of the bottom-level series to plot.
Y_df: pd.DataFrame, hierarchically structured series. It contains the columns ['unique_id', 'ds', 'y'] and model columns.
models: List[str], names of the model columns to plot.
level: List[int], confidence levels in [0, 100] for the prediction intervals available in Y_df.
Returns:
Collection of hierarchically linked series plots associated with bottom_series, with the filtered models and prediction interval levels.
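The series linked to a bottom series can be read off the summing matrix: they are the rows with a 1 in that bottom series' column. A minimal sketch, assuming S_df is a pandas DataFrame indexed by unique_id with one column per bottom series; linked_series_sketch is a hypothetical name, and this is not necessarily how the method selects its panels.

```python
import pandas as pd

def linked_series_sketch(S_df, bottom_series):
    """Hypothetical helper: list every series whose summing-matrix row has a 1
    in the bottom_series column, i.e. every series that aggregates it."""
    return S_df.loc[S_df[bottom_series] == 1].index.tolist()

# Assumed toy summing matrix: rows = all series, columns = bottom series
S_df = pd.DataFrame(
    [[1, 1], [1, 0], [0, 1]],
    index=['AU', 'AU/NSW', 'AU/VIC'],
    columns=['AU/NSW', 'AU/VIC'],
)
print(linked_series_sketch(S_df, 'AU/NSW'))  # ['AU', 'AU/NSW']
```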
plot_hierarchical_predictions_gap
plot_hierarchical_predictions_gap (Y_df:pandas.core.frame.DataFrame, models:Optional[List[str]]=None, xlabel:Optional=None, ylabel:Optional=None)
Hierarchical Predictions Gap plot
Parameters:
Y_df: pd.DataFrame, hierarchically structured series. It contains the columns ['unique_id', 'ds', 'y'] and model columns.
models: List[str], names of the model columns to plot.
xlabel: str, label for the plot's x axis.
ylabel: str, label for the plot's y axis.
Returns:
Plots of aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels; see the aggregate function.
import pandas as pd
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, ETS, Naive
from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot
# Load the Labour dataset: series, summing matrix and hierarchy tags
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
# Hold out the last 24 months of every series for testing
Y_test_df = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')
# Base forecasts, fitted independently for each series
fcst = StatsForecast(
    df=Y_train_df,
    # Alternative: models=[AutoARIMA(season_length=12), Naive()],
    models=[ETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1,
)
Y_hat_df = fcst.forecast(h=24)
# Plot the prediction gap between aggregation levels
# (Country, Country/Region, Country/Gender/Region, ...)
hplots = HierarchicalPlot(S=S, tags=tags)
hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['ETS'],
    xlabel='Month', ylabel='Predictions',
)
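The per-level aggregation that plot_hierarchical_predictions_gap visualizes can be sketched as: for each level in tags, sum a model's predictions over that level's series at every ds. This assumes unique_id is a column (for the forecasts above, Y_hat_df.reset_index() would provide one) and that tags maps level names to arrays of unique_id values; level_totals_sketch is hypothetical. With coherent forecasts every level gives the same total, so differing columns indicate incoherent base forecasts.

```python
import pandas as pd

def level_totals_sketch(Y_df, tags, model):
    """Hypothetical helper: per tag level, sum the model's predictions over
    that level's series at every ds (one column per level)."""
    totals = {}
    for level, ids in tags.items():
        level_df = Y_df[Y_df['unique_id'].isin(ids)]
        totals[level] = level_df.groupby('ds')[model].sum()
    return pd.DataFrame(totals)

# Assumed toy base forecasts that are not coherent: 4 + 5 != 10
Y_hat = pd.DataFrame({
    'unique_id': ['AU', 'AU/NSW', 'AU/VIC'],
    'ds': pd.to_datetime(['2021-01-01'] * 3),
    'ETS': [10.0, 4.0, 5.0],
})
tags = {'Country': ['AU'], 'Country/Region': ['AU/NSW', 'AU/VIC']}
totals = level_totals_sketch(Y_hat, tags, 'ETS')
print(totals)  # Country 10.0 vs Country/Region 9.0: a gap of 1.0
```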
External Forecast Adapters
samples_to_quantiles_df
samples_to_quantiles_df (samples:numpy.ndarray, unique_ids:Iterable[str], dates:Iterable, quantiles:Optional[Iterable[float]]=None, level:Optional[Iterable[int]]=None, model_name:Optional[str]='model')
Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input Y_hat_df dataframe.
Parameters:
samples: numpy array, samples from the forecast distribution, of shape [n_series, n_samples, horizon].
unique_ids: string list, unique identifiers for each time series.
dates: datetime list, forecast dates.
quantiles: float list in [0., 1.], alternative to level; quantiles to estimate from the y distribution.
level: int list in [0, 100], probability levels for prediction intervals.
model_name: string, name of the forecasting model.
Returns:
quantiles: float list in [0., 1.], quantiles to estimate from the y distribution.
Y_hat_df: pd.DataFrame, base quantile forecasts with columns ds and the models to reconcile, indexed by unique_id.
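The kind of transformation described above can be sketched with NumPy and pandas, handling only the level argument (quantiles is not supported here). Several choices are assumptions of this sketch rather than confirmed details of samples_to_quantiles_df: the hypothetical name samples_to_quantiles_sketch, the level-to-quantile mapping, the sample median as point forecast, and the '<model>-lo-<level>' / '<model>-hi-<level>' column names.

```python
import numpy as np
import pandas as pd

def samples_to_quantiles_sketch(samples, unique_ids, dates, level, model_name='model'):
    """Hypothetical sketch: samples [n_series, n_samples, horizon] -> long
    data frame indexed by unique_id with point and interval columns."""
    n_series, _, horizon = samples.shape
    # Assumed mapping: a level-lv interval spans the (50 - lv/2)% and
    # (50 + lv/2)% quantiles, e.g. 80 -> 0.1 and 0.9
    los = [0.5 - lv / 200 for lv in level]
    his = [0.5 + lv / 200 for lv in level]
    quantiles = sorted(los + [0.5] + his)
    data = {
        # Rows are series-major: all horizons of series 0, then series 1, ...
        'unique_id': np.repeat(unique_ids, horizon),
        'ds': np.tile(dates, n_series),
        # Sample median over axis 1 (the samples axis) as the point forecast
        model_name: np.median(samples, axis=1).reshape(-1),
    }
    for lv, lo, hi in zip(level, los, his):
        data[f'{model_name}-lo-{lv}'] = np.quantile(samples, lo, axis=1).reshape(-1)
        data[f'{model_name}-hi-{lv}'] = np.quantile(samples, hi, axis=1).reshape(-1)
    return quantiles, pd.DataFrame(data).set_index('unique_id')

rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=1.0, size=(2, 1000, 3))
dates = pd.date_range('2021-01-01', periods=3, freq='MS')
quantiles, Y_hat_df = samples_to_quantiles_sketch(samples, ['A', 'B'], dates, level=[80])
print(quantiles)
print(Y_hat_df)
```

Reducing over axis 1 collapses the samples dimension, leaving one value per (series, horizon) pair, which is why every column reshapes to n_series * horizon rows.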