Aggregation/Visualization Utils
The HierarchicalForecast package contains utility functions to wrangle and visualize hierarchical series datasets. The aggregate function of the module creates a hierarchy from categorical variables representing the structure levels, and also returns the aggregation constraints matrix.
In addition, HierarchicalForecast ensures compatibility of its reconciliation methods with other popular machine-learning libraries via its external forecast adapters, which transform the base forecasts produced by external libraries into a compatible data frame format.
Aggregate Function
aggregate
aggregate (df:pandas.core.frame.DataFrame, spec:List[List[str]], is_balanced:bool=False, sparse_s:bool=False)
Utils Aggregation Function. Aggregates bottom level series contained in the pandas DataFrame df according to levels defined in the spec list.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Dataframe with columns ['ds', 'y'] and columns to aggregate. |
| spec | List | | List of levels. Each element of the list should contain a list of columns of df to aggregate. |
| is_balanced | bool | False | Deprecated. |
| sparse_s | bool | False | Return S_df as a sparse dataframe. |
| Returns | pandas DataFrame | | Hierarchically structured series. |
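To make the roles of df and spec concrete, here is a minimal pandas sketch of the kind of aggregation described above. It is not the library's implementation: the helper name `aggregate_sketch`, the toy columns `Country` and `Region`, the `/`-joined identifiers, and the assumption that the last entry of spec is the bottom level are all illustrative choices.

```python
import numpy as np
import pandas as pd

# Toy bottom-level data: two regions in one country, two dates each.
df = pd.DataFrame({
    "Country": ["AU", "AU", "AU", "AU"],
    "Region": ["NSW", "NSW", "VIC", "VIC"],
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01"] * 2),
    "y": [10.0, 12.0, 5.0, 7.0],
})
spec = [["Country"], ["Country", "Region"]]


def aggregate_sketch(df, spec):
    """Illustrative only: sum `y` per level and build a summing matrix."""
    frames, tags = [], {}
    for level in spec:
        # One series per distinct combination of the level's columns.
        ids = df[level].astype(str).apply("/".join, axis=1)
        agg = (
            df.assign(unique_id=ids)
            .groupby(["unique_id", "ds"], as_index=False)["y"]
            .sum()
        )
        frames.append(agg)
        tags["/".join(level)] = np.asarray(agg["unique_id"].unique())
    Y_df = pd.concat(frames, ignore_index=True)

    # Assumption: the last entry of `spec` is the bottom (most granular) level.
    bottom = df[spec[-1]].drop_duplicates().astype(str)
    bottom_ids = bottom.apply("/".join, axis=1).to_numpy()
    all_ids = np.concatenate(list(tags.values()))
    S = pd.DataFrame(0.0, index=all_ids, columns=bottom_ids)
    for level in spec:
        # Each bottom series contributes to exactly one series per level.
        parents = bottom[level].apply("/".join, axis=1).to_numpy()
        for parent, child in zip(parents, bottom_ids):
            S.loc[parent, child] = 1.0
    return Y_df, S, tags


Y_df, S, tags = aggregate_sketch(df, spec)
print(S)
```

In this toy case the row for `AU` in S has a one in both columns, because the country total is the sum of its two regions.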
Hierarchical Visualization
HierarchicalPlot
HierarchicalPlot (S:pandas.core.frame.DataFrame, tags:Dict[str,numpy.ndarray])
Hierarchical Plot
This class contains a collection of matplotlib visualization methods, suited for small to medium sized hierarchical series.
Parameters:
S: pd.DataFrame with summing matrix of size (base, bottom), see the aggregate function.
tags: Dict[str, np.ndarray], with hierarchical aggregation indexes, where each key is a level and its value contains the tags associated with that level.
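As a rough illustration of the two constructor arguments, the sketch below builds a hypothetical three-series hierarchy by hand. The series names and level names are invented for the example; only the shapes (a summing matrix with one row per series and one column per bottom series, and a dictionary mapping levels to identifiers) follow the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical hierarchy: one total and two bottom series.
S = pd.DataFrame(
    [[1.0, 1.0],   # total = total/A + total/B
     [1.0, 0.0],   # total/A
     [0.0, 1.0]],  # total/B
    index=["total", "total/A", "total/B"],
    columns=["total/A", "total/B"],
)
tags = {
    "Country": np.array(["total"]),
    "Country/Region": np.array(["total/A", "total/B"]),
}

# Multiplying S by the bottom-level values yields every series in the hierarchy.
y_bottom = np.array([3.0, 4.0])
y_all = S.to_numpy() @ y_bottom
print(dict(zip(S.index, y_all)))
```

Every identifier listed in tags appears exactly once in the index of S, which is what lets the plotting methods map a level name to rows of the matrix.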
plot_summing_matrix
plot_summing_matrix ()
Summation Constraints plot
This method simply plots the hierarchical aggregation constraints matrix.
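For readers who want to see what such a plot looks like without the library, the following is a minimal matplotlib sketch that draws a small summing matrix as a binary image. The matrix values and the use of `imshow` are assumptions made for illustration and may differ from how the method renders the matrix.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical hierarchy: 1 total, 2 groups, 4 bottom series.
S_values = np.vstack([
    np.ones((1, 4)),                      # total sums all bottom series
    np.kron(np.eye(2), np.ones((1, 2))),  # each group sums two bottom series
    np.eye(4),                            # bottom series map to themselves
])

fig, ax = plt.subplots(figsize=(4, 5))
# Dark cells mark the bottom series that contribute to each row.
ax.imshow(S_values, cmap="binary", aspect="auto")
ax.set_xlabel("bottom series")
ax.set_ylabel("all series")
ax.set_title("Summing matrix (sketch)")
```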
plot_series
plot_series (series:str, Y_df:Optional[pandas.core.frame.DataFrame]=None, models:Optional[List[str]]=None, level:Optional[List[int]]=None)
Single Series plot
Parameters:
series: str, string identifying the 'unique_id' of the series to plot, at any level.
Y_df: pd.DataFrame, hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'], and may also have model columns.
models: List[str], strings identifying the model columns to keep.
level: float list 0-100, confidence levels for prediction intervals available in Y_df.
Returns:
Single series plot with filtered models and prediction interval level.
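The filtering implied by series, models, and level can be sketched in plain pandas. The helper `select_series` is hypothetical, and the `{model}-lo-{level}` / `{model}-hi-{level}` column names are an assumption borrowed from the StatsForecast naming convention for prediction intervals; check the columns actually present in your Y_df.

```python
import pandas as pd

# Toy forecasts for two series with two models and a 90% interval for ETS.
Y_df = pd.DataFrame({
    "unique_id": ["total", "total", "total/A", "total/A"],
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01"] * 2),
    "y": [7.0, 9.0, 3.0, 4.0],
    "ETS": [6.5, 8.5, 3.2, 4.1],
    "ETS-lo-90": [5.0, 7.0, 2.5, 3.3],
    "ETS-hi-90": [8.0, 10.0, 3.9, 4.9],
    "Naive": [7.0, 7.0, 3.0, 3.0],
})


def select_series(Y_df, series, models=None, level=None):
    """Return the rows and columns a single-series plot would need."""
    rows = Y_df[Y_df["unique_id"] == series]
    cols = ["ds", "y"]
    for model in models or []:
        cols.append(model)
        for lv in level or []:
            # Assumed interval column naming, see the note above.
            cols += [f"{model}-lo-{lv}", f"{model}-hi-{lv}"]
    return rows[cols]


out = select_series(Y_df, "total/A", models=["ETS"], level=[90])
print(out)
```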
plot_hierarchically_linked_series
plot_hierarchically_linked_series (bottom_series:str, Y_df:Optional[pandas.core.frame.DataFrame]=None, models:Optional[List[str]]=None, level:Optional[List[int]]=None)
Hierarchically Linked Series plot
Parameters:
bottom_series: str, string identifying the 'unique_id' of the bottom-level series to plot.
Y_df: pd.DataFrame, hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and models.
models: List[str], strings identifying the model columns to keep.
level: float list 0-100, confidence levels for prediction intervals available in Y_df.
Returns:
Collection of hierarchically linked series plots associated with the bottom_series, with filtered models and prediction interval level.
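One way to think about which series are "linked" to a bottom series is to read them off the summing matrix: every row with a one in the column of bottom_series aggregates that series. The sketch below assumes this interpretation; the helper `linked_series` and the identifiers are hypothetical.

```python
import pandas as pd

bottom_ids = ["AU/NSW/Sydney", "AU/NSW/Newcastle", "AU/VIC/Melbourne"]
S = pd.DataFrame(
    [[1, 1, 1],   # AU
     [1, 1, 0],   # AU/NSW
     [0, 0, 1],   # AU/VIC
     [1, 0, 0],   # AU/NSW/Sydney
     [0, 1, 0],   # AU/NSW/Newcastle
     [0, 0, 1]],  # AU/VIC/Melbourne
    index=["AU", "AU/NSW", "AU/VIC"] + bottom_ids,
    columns=bottom_ids,
)


def linked_series(S, bottom_series):
    """Rows of S that include `bottom_series` in their sum, top to bottom."""
    return S.index[S[bottom_series] == 1].tolist()


print(linked_series(S, "AU/VIC/Melbourne"))
```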
plot_hierarchical_predictions_gap
plot_hierarchical_predictions_gap (Y_df:pandas.core.frame.DataFrame, models:Optional[List[str]]=None, xlabel:Optional=None, ylabel:Optional=None)
Hierarchical Predictions Gap plot
Parameters:
Y_df: pd.DataFrame, hierarchically structured series. It contains columns ['unique_id', 'ds', 'y'] and models.
models: List[str], strings identifying the model columns to keep.
xlabel: str, string for the plot's x axis label.
ylabel: str, string for the plot's y axis label.
Returns:
Plots of aggregated predictions at different levels of the hierarchical structure. The aggregation is performed according to the tag levels; see the aggregate function.
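The quantity behind this plot can be approximated by summing a model's predictions over all series of each level, per date. When base forecasts are not coherent, the per-level totals differ, and that difference is the gap. The sketch below is a plain pandas illustration under that reading; `level_totals`, the level names, and the numbers are made up for the example.

```python
import numpy as np
import pandas as pd

tags = {
    "Country": np.array(["AU"]),
    "Country/Region": np.array(["AU/NSW", "AU/VIC"]),
}
Y_hat_df = pd.DataFrame({
    "unique_id": ["AU", "AU", "AU/NSW", "AU/NSW", "AU/VIC", "AU/VIC"],
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01"] * 3),
    "ETS": [15.0, 19.0, 10.5, 12.5, 5.0, 7.5],
})


def level_totals(Y_hat_df, tags, model):
    """Sum one model's predictions across the series of each level, per date."""
    totals = {}
    for level, ids in tags.items():
        mask = Y_hat_df["unique_id"].isin(ids)
        totals[level] = Y_hat_df[mask].groupby("ds")[model].sum()
    return pd.DataFrame(totals)


totals = level_totals(Y_hat_df, tags, "ETS")
# A non-zero gap means the base forecasts do not add up across levels.
gap = totals["Country/Region"] - totals["Country"]
print(totals)
print(gap)
```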
import pandas as pd

from datasetsforecast.hierarchical import HierarchicalData
from hierarchicalforecast.utils import HierarchicalPlot
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, ETS, Naive

# Load the Labour dataset: series, summing matrix, and level tags
Y_df, S, tags = HierarchicalData.load('./data', 'Labour')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# Hold out the last 24 observations of each series for testing
Y_test_df = Y_df.groupby('unique_id').tail(24)
Y_train_df = Y_df.drop(Y_test_df.index)
Y_test_df = Y_test_df.set_index('unique_id')
Y_train_df = Y_train_df.set_index('unique_id')

# Fit base forecasts for every series in the hierarchy
fcst = StatsForecast(
    df=Y_train_df,
    #models=[AutoARIMA(season_length=12), Naive()],
    models=[ETS(season_length=12, model='AAZ')],
    freq='MS',
    n_jobs=-1
)
Y_hat_df = fcst.forecast(h=24)

# Plot prediction difference of different aggregation
# Levels Country, Country/Region, Country/Gender/Region ...
hplots = HierarchicalPlot(S=S, tags=tags)
hplots.plot_hierarchical_predictions_gap(
    Y_df=Y_hat_df, models=['ETS'],
    xlabel='Month', ylabel='Predictions',
)
External Forecast Adapters
samples_to_quantiles_df
samples_to_quantiles_df (samples:numpy.ndarray, unique_ids:Iterable[str], dates:Iterable, quantiles:Optional[Iterable[float]]=None, level:Optional[Iterable[int]]=None, model_name:Optional[str]='model')
Transform Random Samples into HierarchicalForecast input. Auxiliary function to create a compatible HierarchicalForecast input Y_hat_df dataframe.
Parameters:
samples: numpy array. Samples from forecast distribution of shape [n_series, n_samples, horizon].
unique_ids: string list. Unique identifiers for each time series.
dates: datetime list. List of forecast dates.
quantiles: float list in [0., 1.]. Alternative to level, quantiles to estimate from y distribution.
level: int list in [0,100]. Probability levels for prediction intervals.
model_name: string. Name of forecasting model.
Returns:
quantiles: float list in [0., 1.]. Quantiles to estimate from y distribution.
Y_hat_df: pd.DataFrame. Base quantile forecasts with columns ds and models to reconcile, indexed by unique_id.
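The transformation described above can be sketched with NumPy and pandas. This is an illustrative approximation rather than the library's code: the helper name `samples_to_quantiles_sketch`, the use of the sample mean as the point forecast, and the `-lo-`/`-hi-` column names are assumptions. Each level l is mapped to the quantile pair 0.5 - l/200 and 0.5 + l/200.

```python
import numpy as np
import pandas as pd


def samples_to_quantiles_sketch(samples, unique_ids, dates, level, model_name="model"):
    """Collapse samples of shape [n_series, n_samples, horizon] into a long frame."""
    n_series, _, horizon = samples.shape
    data = {
        # One row per (series, date) pair, series-major order.
        "unique_id": np.repeat(list(unique_ids), horizon),
        "ds": np.tile(np.asarray(dates), n_series),
        # Assumption: the point forecast is the mean over the sample axis.
        model_name: samples.mean(axis=1).reshape(-1),
    }
    quantiles = []
    for lv in sorted(level):
        lo, hi = 0.5 - lv / 200, 0.5 + lv / 200
        quantiles += [lo, hi]
        # np.quantile over the sample axis returns shape [n_series, horizon].
        data[f"{model_name}-lo-{lv}"] = np.quantile(samples, lo, axis=1).reshape(-1)
        data[f"{model_name}-hi-{lv}"] = np.quantile(samples, hi, axis=1).reshape(-1)
    return sorted(quantiles), pd.DataFrame(data).set_index("unique_id")


rng = np.random.default_rng(0)
samples = rng.normal(size=(2, 500, 3))  # 2 series, 500 samples, horizon 3
dates = pd.date_range("2024-01-01", periods=3, freq="MS")
quantiles, Y_hat_df = samples_to_quantiles_sketch(samples, ["A", "B"], dates, level=[80])
print(Y_hat_df)
```

The resulting frame has one row per series and forecast date, which matches the long format the reconciliation methods expect for Y_hat_df.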