Here we host a collection of datasets used in previous hierarchical forecasting research by Rangapuram et al. [2021], Olivares et al. [2023], and Kamarthi et al. [2022]. The benchmark datasets include:
  1. Australian Monthly Labour: Labour,
  2. SF Bay Area Daily Traffic: Traffic, OldTraffic,
  3. Quarterly Australian Tourism Visits: TourismSmall,
  4. Monthly Australian Tourism Visits: TourismLarge, OldTourismLarge,
  5. Daily Wikipedia Article Views: Wiki2.
The datasets prefixed with Old keep the original data with minimal preprocessing of the target variable (Rangapuram et al. [2021], Olivares et al. [2023]), while the remaining datasets follow the PROFHIT experimental settings.

Labour

Labour(freq='MS', horizon=8, papers_horizon=12, seasonality=12, test_size=125, tags_names=('Country', 'Country/Region', 'Country/Gender/Region', 'Country/Employment/Gender/Region'))

TourismLarge

TourismLarge(freq='MS', horizon=12, papers_horizon=12, seasonality=12, test_size=57, tags_names=('Country', 'Country/State', 'Country/State/Zone', 'Country/State/Zone/Region', 'Country/Purpose', 'Country/State/Purpose', 'Country/State/Zone/Purpose', 'Country/State/Zone/Region/Purpose'))

TourismSmall

TourismSmall(freq='Q', horizon=4, papers_horizon=4, seasonality=4, test_size=9, tags_names=('Country', 'Country/Purpose', 'Country/Purpose/State', 'Country/Purpose/State/CityNonCity'))

Traffic

Traffic(freq='D', horizon=14, papers_horizon=7, seasonality=7, test_size=91, tags_names=('Level1', 'Level2', 'Level3', 'Level4'))

Wiki2

Wiki2(freq='D', horizon=14, papers_horizon=7, seasonality=7, test_size=91, tags_names=('Views', 'Views/Country', 'Views/Country/Access', 'Views/Country/Access/Agent', 'Views/Country/Access/Agent/Topic'))

OldTraffic

OldTraffic(freq='D', horizon=1, papers_horizon=1, seasonality=7, test_size=91, tags_names=('Level1', 'Level2', 'Level3', 'Level4'))
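
The `tags_names` entries above are slash-separated paths, one per aggregation level. As a minimal sketch, the snippet below splits the TourismSmall names into their components. Reading each component as a grouping key for that level is an assumption made here for illustration, and the variable names are hypothetical, not part of the library.

```python
# tags_names copied from the TourismSmall entry above.
tags_names = (
    "Country",
    "Country/Purpose",
    "Country/Purpose/State",
    "Country/Purpose/State/CityNonCity",
)

# Split each level name into its path components.
# Assumption: each component names one grouping key of that level.
levels = [name.split("/") for name in tags_names]

for name, keys in zip(tags_names, levels):
    print(f"{name}: depth {len(keys)}, keys {keys}")

# The last entry is the deepest level of this hierarchy.
deepest = levels[-1]
```

For this dataset every level extends the previous one by a single component, so the depths run from 1 to 4. TourismLarge does not follow that pattern: its names mix geographic paths with purpose-of-travel paths.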

HierarchicalData

HierarchicalData.download

download(directory)
Download Hierarchical Datasets.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| directory | str | Directory path to download dataset. | required |

HierarchicalData.load

load(directory, group, cache=True)
Downloads hierarchical forecasting benchmark datasets.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| directory | str | Directory where data will be downloaded. | required |
| group | str | Group name. | required |
| cache | bool | If True saves and loads | True |
Returns:

| Type | Description |
| --- | --- |
| Tuple[DataFrame, DataFrame] | Target time series with columns ['unique_id', 'ds', 'y'], containing the base time series, and the summing matrix of size (hierarchies, bottom). |
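
To make the two returned objects concrete, here is a minimal sketch on a toy hierarchy rather than one of the benchmark datasets. It builds a long-format frame with the documented columns and a summing matrix of shape (hierarchies, bottom), then aggregates the bottom series through the matrix. The series names, dates, and values are invented for illustration, and the real frames may be laid out differently (for example in their index or column labels).

```python
import pandas as pd

# Toy hierarchy (hypothetical): total -> {A, B}.
bottom_ids = ["total/A", "total/B"]
all_ids = ["total"] + bottom_ids

# Summing matrix of shape (hierarchies, bottom) = (3, 2).
S_df = pd.DataFrame(
    [[1.0, 1.0],   # total = A + B
     [1.0, 0.0],   # A
     [0.0, 1.0]],  # B
    index=all_ids,
    columns=bottom_ids,
)

# Bottom-level series in long format with the documented column names.
dates = pd.date_range("2020-01-01", periods=3, freq="D")
Y_df = pd.DataFrame(
    {
        "unique_id": ["total/A"] * 3 + ["total/B"] * 3,
        "ds": list(dates) * 2,
        "y": [10.0, 12.0, 11.0, 5.0, 6.0, 7.0],
    }
)

# Pivot to one column per bottom series, ordered like the columns of S_df.
wide = Y_df.pivot(index="ds", columns="unique_id", values="y")[bottom_ids]

# Every level of the hierarchy is S @ bottom at each time step.
all_levels = pd.DataFrame(
    S_df.to_numpy() @ wide.to_numpy().T,
    index=S_df.index,
    columns=wide.index,
)
print(all_levels)
```

Each row of `all_levels` is one series in the hierarchy, and the `total` row equals the sum of the two bottom rows at every date. The same check could be applied to the frames returned by `load(directory, group)`, after confirming how their rows and columns are labelled.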