Understanding and debugging hierarchical forecast reconciliation
After reconciling hierarchical forecasts, practitioners often need to answer questions like:
- How incoherent were my base forecasts? Did they significantly violate the hierarchical constraints?
- How much did reconciliation change the forecasts? Which levels were adjusted the most?
- Did reconciliation introduce problems, such as negative values where they shouldn’t exist?
- Are the reconciled forecasts numerically coherent within an acceptable tolerance?
The HierarchicalReconciliation class provides an optional
diagnostics=True parameter that generates a report
answering these questions. This notebook demonstrates the diagnostics
feature through three practical use cases.
You can run these experiments using CPU or GPU with Google Colab.
Setup
Load Data
We’ll use the TourismSmall dataset, which has a 4-level hierarchy:
- Country (1 node)
- Country/Purpose (4 nodes)
- Country/Purpose/State (28 nodes)
- Country/Purpose/State/CityNonCity (56 nodes, bottom level)
Generate Base Forecasts
| | unique_id | ds | AutoARIMA | Naive |
|---|---|---|---|---|
| 0 | bus | 2006-03-31 | 8918.478516 | 11547.0 |
| 1 | bus | 2006-06-30 | 9581.925781 | 11547.0 |
| 2 | bus | 2006-09-30 | 11194.676758 | 11547.0 |
| 3 | bus | 2006-12-31 | 10678.958008 | 11547.0 |
| 4 | hol | 2006-03-31 | 42805.347656 | 26418.0 |
Use Case 1: Verifying Reconciliation Quality
Scenario: You’ve just run reconciliation and want to verify that it worked correctly - that base forecasts were indeed incoherent and reconciliation fixed them.
The diagnostics report answers:
- Were the base forecasts incoherent? (coherence residuals before > 0)
- Are the reconciled forecasts coherent? (coherence residuals after ≈ 0)
- Is numerical coherence satisfied within tolerance?

| | level | metric | AutoARIMA/BottomUp | Naive/BottomUp | AutoARIMA/MinTrace_method-ols | Naive/MinTrace_method-ols |
|---|---|---|---|---|---|---|
| 48 | Overall | coherence_residual_mae_before | 91.123692 | 0.0 | 91.123692 | 0.0 |
| 50 | Overall | coherence_residual_mae_after | 0.000000 | 0.0 | 0.000000 | 0.0 |
| 60 | Overall | is_coherent | 1.000000 | 1.0 | 1.000000 | 1.0 |
| 61 | Overall | coherence_max_violation | 0.000000 | 0.0 | 0.000000 | 0.0 |

- coherence_residual_mae_before > 0: Base forecasts violated hierarchical constraints
- coherence_residual_mae_after ≈ 0: Reconciliation fixed the incoherence
- is_coherent = 1.0: Reconciled forecasts satisfy constraints within tolerance
- coherence_max_violation: Maximum deviation from perfect coherence (should be tiny)
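
To make the coherence metrics concrete, here is a minimal sketch on a hypothetical three-node hierarchy (total = a + b). It assumes the coherence residual of a node is its forecast minus the sum of the bottom-level forecasts beneath it; the library's exact definition and tolerance may differ, and the names `S`, `coherence_residuals`, and `atol` are illustrative only.

```python
# Hypothetical toy hierarchy: total = a + b.
# Each row of S maps the bottom-level series to one node.
S = [
    [1, 1],  # total
    [1, 0],  # a (bottom)
    [0, 1],  # b (bottom)
]
N_BOTTOM = 2  # assumption: bottom-level series are the last rows


def aggregate(bottom):
    """Build forecasts for every node by summing bottom-level forecasts."""
    return [sum(s * y for s, y in zip(row, bottom)) for row in S]


def coherence_residuals(y):
    """Forecast at each node minus the value implied by the bottom level."""
    implied = aggregate(y[-N_BOTTOM:])
    return [yi - imp for yi, imp in zip(y, implied)]


def mae(values):
    return sum(abs(v) for v in values) / len(values)


base = [109.0, 60.0, 40.0]                # incoherent: 109 != 60 + 40
reconciled = aggregate(base[-N_BOTTOM:])  # bottom-up reconciliation

mae_before = mae(coherence_residuals(base))
mae_after = mae(coherence_residuals(reconciled))

atol = 1e-6  # illustrative tolerance, not necessarily the library default
max_violation = max(abs(r) for r in coherence_residuals(reconciled))
is_coherent = max_violation <= atol

print(mae_before, mae_after, is_coherent)
```

Only the top node carries a residual here, which mirrors the diagnostics output, where the bottom level always has a residual of 0.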

Coherence residual MAE by level, after and before reconciliation:

| level | AutoARIMA/BottomUp (after) | AutoARIMA/BottomUp (before) | Naive/BottomUp (after) | Naive/BottomUp (before) | AutoARIMA/MinTrace_method-ols (after) | AutoARIMA/MinTrace_method-ols (before) | Naive/MinTrace_method-ols (after) | Naive/MinTrace_method-ols (before) |
|---|---|---|---|---|---|---|---|---|
| Country | 0.0 | 1551.154858 | 0.0 | 0.0 | 0.0 | 1551.154858 | 0.0 | 0.0 |
| Country/Purpose | 0.0 | 996.859118 | 0.0 | 0.0 | 0.0 | 996.859118 | 0.0 | 0.0 |
| Country/Purpose/State | 0.0 | 91.836329 | 0.0 | 0.0 | 0.0 | 91.836329 | 0.0 | 0.0 |
| Country/Purpose/State/CityNonCity | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
| Overall | 0.0 | 91.123692 | 0.0 | 0.0 | 0.0 | 91.123692 | 0.0 | 0.0 |
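
As a quick arithmetic check on the per-level table, the Overall value appears to be the average of the per-level values weighted by the number of series at each level (1, 4, 28, and 56 in TourismSmall). This is inferred from the numbers shown rather than from the library's documentation, so treat it as a working assumption:

```python
# Number of series per level in TourismSmall (from the hierarchy described earlier).
n_series = {
    "Country": 1,
    "Country/Purpose": 4,
    "Country/Purpose/State": 28,
    "Country/Purpose/State/CityNonCity": 56,
}

# coherence_residual_mae_before for the AutoARIMA base forecasts,
# copied from the per-level table.
mae_before = {
    "Country": 1551.154858,
    "Country/Purpose": 996.859118,
    "Country/Purpose/State": 91.836329,
    "Country/Purpose/State/CityNonCity": 0.0,
}

total_series = sum(n_series.values())
overall = sum(n_series[lvl] * mae_before[lvl] for lvl in n_series) / total_series

# Compare with the reported Overall value of 91.123692.
print(total_series, round(overall, 4))
```

The weighted mean agrees with the reported Overall value up to rounding of the inputs, which also explains why the Overall figure sits close to the Country/Purpose/State value: the two lowest levels account for 84 of the 89 series.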
Use Case 2: Comparing Reconciliation Methods
Scenario: You want to understand how different reconciliation methods affect your forecasts differently. Which method makes smaller adjustments? Which levels are most impacted?
The diagnostics report helps compare:
- Adjustment magnitude (MAE, RMSE, max) across methods
- Which hierarchy levels each method adjusts the most

| | level | metric | AutoARIMA/BottomUp | Naive/BottomUp | AutoARIMA/TopDown_method-forecast_proportions | Naive/TopDown_method-forecast_proportions | AutoARIMA/MinTrace_method-ols | Naive/MinTrace_method-ols | AutoARIMA/MinTrace_method-wls_struct | Naive/MinTrace_method-wls_struct |
|---|---|---|---|---|---|---|---|---|---|---|
| 52 | Overall | adjustment_mae | 91.123692 | 0.0 | 152.381830 | 0.0 | 125.796357 | 7.790422e-13 | 92.567005 | 3.649316e-13 |
| 53 | Overall | adjustment_rmse | 361.699708 | 0.0 | 327.852747 | 0.0 | 235.618628 | 1.956331e-12 | 297.653444 | 7.211469e-13 |
| 54 | Overall | adjustment_max | 3563.736473 | 0.0 | 2354.425237 | 0.0 | 1367.921921 | 1.455192e-11 | 2621.788616 | 3.637979e-12 |
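
Per-level adjustment_mae for the AutoARIMA base forecasts (the Overall row matches the AutoARIMA adjustment_mae values in the table above):

| level | BottomUp | TopDown_method-forecast_proportions | MinTrace_method-ols | MinTrace_method-wls_struct |
|---|---|---|---|---|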
| Country | 1551.154858 | 0.000000 | 924.028186 | 1953.754301 |
| Country/Purpose | 996.859118 | 1106.796143 | 875.789096 | 666.870396 |
| Country/Purpose/State | 91.836329 | 151.248239 | 114.460983 | 61.695544 |
| Country/Purpose/State/CityNonCity | 0.000000 | 87.497279 | 63.638995 | 33.745576 |
| Overall | 91.123692 | 152.381830 | 125.796357 | 92.567005 |
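
The gap between MAE and RMSE/max in these tables can be reproduced on a hypothetical three-node hierarchy (total = a + b). The sketch below assumes an adjustment is the reconciled forecast minus the base forecast, and it uses the closed form of the OLS projection S(S'S)^-1 S'y for this specific hierarchy, which spreads the incoherence gap equally across the three nodes. The function names are illustrative, not library API.

```python
from math import sqrt


def bottom_up(total, a, b):
    """Keep the bottom forecasts and overwrite the total with their sum."""
    return [a + b, a, b]


def ols_projection(total, a, b):
    """Closed form of S (S'S)^-1 S' y for the hierarchy total = a + b."""
    gap = total - (a + b)  # incoherence of the base forecasts
    return [total - gap / 3, a + gap / 3, b + gap / 3]


def adjustment_metrics(base, reconciled):
    """Summaries of (reconciled - base), named after the diagnostics metrics."""
    diffs = [r - b for r, b in zip(reconciled, base)]
    n = len(diffs)
    return {
        "adjustment_mae": sum(abs(d) for d in diffs) / n,
        "adjustment_rmse": sqrt(sum(d * d for d in diffs) / n),
        "adjustment_max": max(abs(d) for d in diffs),
        "adjustment_mean": sum(diffs) / n,  # signed, so it can be negative
    }


base = [109.0, 60.0, 40.0]  # hypothetical base forecasts, gap = 9
bu = adjustment_metrics(base, bottom_up(*base))
ols = adjustment_metrics(base, ols_projection(*base))

for name in bu:
    print(f"{name}: BottomUp={bu[name]:.3f}  OLS={ols[name]:.3f}")
```

In this toy case both methods have the same adjustment_mae, but bottom-up concentrates the entire change in the top node, so its RMSE and max are larger. The Overall rows show a similar pattern: BottomUp has a lower adjustment_mae than MinTrace (ols) but a higher adjustment_rmse and adjustment_max.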
Use Case 3: Detecting Negative Value Issues
Scenario: Your forecasts represent quantities that cannot be negative (e.g., sales, visitors). You need to check if reconciliation introduced negative values.
The diagnostics report tracks:
- negative_count_before/after: Count of negative values before and after reconciliation
- negative_introduced: Negatives created by reconciliation
- negative_removed: Negatives fixed by reconciliation

| | level | metric | AutoARIMA/BottomUp | Naive/BottomUp | AutoARIMA/MinTrace_method-ols | Naive/MinTrace_method-ols | AutoARIMA/MinTrace_method-ols_nonnegative-True | Naive/MinTrace_method-ols_nonnegative-True |
|---|---|---|---|---|---|---|---|---|
| 56 | Overall | negative_count_before | 33.0 | 36.0 | 33.0 | 36.0 | 33.0 | 36.0 |
| 57 | Overall | negative_count_after | 55.0 | 60.0 | 3.0 | 4.0 | 0.0 | 0.0 |
| 58 | Overall | negative_introduced | 22.0 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 59 | Overall | negative_removed | 0.0 | 0.0 | 30.0 | 32.0 | 33.0 | 36.0 |

- negative_count_before: Negatives in base forecasts
- negative_count_after: Negatives after reconciliation
- negative_introduced: New negatives created by reconciliation (bad!)
- negative_removed: Negatives fixed by reconciliation (good!)

Notice how MinTrace with nonnegative=True eliminates all negative values.
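
A minimal sketch of how such counts could be derived from paired base and reconciled values. It assumes an element-wise definition: a negative is "introduced" when a non-negative base value becomes negative, and "removed" in the opposite case. The library may compute these differently, and `negative_diagnostics` is a hypothetical helper.

```python
def negative_diagnostics(base, reconciled):
    """Count negatives before/after and how many were introduced or removed."""
    pairs = list(zip(base, reconciled))
    return {
        "negative_count_before": sum(b < 0 for b, _ in pairs),
        "negative_count_after": sum(r < 0 for _, r in pairs),
        "negative_introduced": sum(b >= 0 and r < 0 for b, r in pairs),
        "negative_removed": sum(b < 0 and r >= 0 for b, r in pairs),
    }


# Hypothetical values for five series at one forecast step.
base = [5.0, -1.0, 2.0, 0.5, -3.0]
reconciled = [4.0, 0.2, -0.5, 0.1, -2.0]

report = negative_diagnostics(base, reconciled)
print(report)

# Under this definition: after = before + introduced - removed.
balance = (
    report["negative_count_before"]
    + report["negative_introduced"]
    - report["negative_removed"]
)
print(balance == report["negative_count_after"])
```

The Overall rows in the table satisfy the same identity, for example 33 + 22 - 0 = 55 for AutoARIMA/BottomUp and 33 + 0 - 30 = 3 for AutoARIMA/MinTrace_method-ols.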

Negative counts by level, after and before reconciliation:

| level | AutoARIMA/BottomUp (after) | AutoARIMA/BottomUp (before) | Naive/BottomUp (after) | Naive/BottomUp (before) | AutoARIMA/MinTrace_method-ols (after) | AutoARIMA/MinTrace_method-ols (before) | Naive/MinTrace_method-ols (after) | Naive/MinTrace_method-ols (before) | AutoARIMA/MinTrace_method-ols_nonnegative-True (after) | AutoARIMA/MinTrace_method-ols_nonnegative-True (before) | Naive/MinTrace_method-ols_nonnegative-True (after) | Naive/MinTrace_method-ols_nonnegative-True (before) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Country | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Country/Purpose | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Country/Purpose/State | 7.0 | 0.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Country/Purpose/State/CityNonCity | 15.0 | 15.0 | 16.0 | 16.0 | 0.0 | 15.0 | 0.0 | 16.0 | 0.0 | 15.0 | 0.0 | 16.0 |
| Overall | 22.0 | 15.0 | 24.0 | 16.0 | 0.0 | 15.0 | 0.0 | 16.0 | 0.0 | 15.0 | 0.0 | 16.0 |
Exporting Diagnostics
The diagnostics DataFrame can be exported to CSV for CI pipelines, benchmarks, or sharing with stakeholders.

| | level | metric | AutoARIMA/BottomUp | Naive/BottomUp | AutoARIMA/MinTrace_method-ols | Naive/MinTrace_method-ols |
|---|---|---|---|---|---|---|
| 48 | Overall | coherence_residual_mae_before | 91.123692 | 0.0 | 91.123692 | 0.000000e+00 |
| 49 | Overall | coherence_residual_rmse_before | 361.699708 | 0.0 | 361.699708 | 0.000000e+00 |
| 50 | Overall | coherence_residual_mae_after | 0.000000 | 0.0 | 0.000000 | 0.000000e+00 |
| 51 | Overall | coherence_residual_rmse_after | 0.000000 | 0.0 | 0.000000 | 0.000000e+00 |
| 52 | Overall | adjustment_mae | 91.123692 | 0.0 | 125.796357 | 7.790422e-13 |
| 53 | Overall | adjustment_rmse | 361.699708 | 0.0 | 235.618628 | 1.956331e-12 |
| 54 | Overall | adjustment_max | 3563.736473 | 0.0 | 1367.921921 | 1.455192e-11 |
| 55 | Overall | adjustment_mean | 29.283713 | 0.0 | 46.279825 | -5.114311e-13 |
| 56 | Overall | negative_count_before | 0.000000 | 0.0 | 0.000000 | 0.000000e+00 |
| 57 | Overall | negative_count_after | 0.000000 | 0.0 | 2.000000 | 0.000000e+00 |
| 58 | Overall | negative_introduced | 0.000000 | 0.0 | 2.000000 | 0.000000e+00 |
| 59 | Overall | negative_removed | 0.000000 | 0.0 | 0.000000 | 0.000000e+00 |
| 60 | Overall | is_coherent | 1.000000 | 1.0 | 1.000000 | 1.000000e+00 |
| 61 | Overall | coherence_max_violation | 0.000000 | 0.0 | 0.000000 | 0.000000e+00 |
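
In a CI pipeline, the exported file can be turned into a simple pass/fail gate. The sketch below assumes the CSV was written without the index column and keeps the layout shown above (`level`, `metric`, then one column per base-model/reconciler pair). It reads an in-memory string with a few Overall rows copied from the table in place of a real file; `failing_methods` is a hypothetical helper, not part of the library.

```python
import csv
import io

# Stand-in for the exported file; values copied from the Overall rows above.
CSV_TEXT = """level,metric,AutoARIMA/BottomUp,Naive/BottomUp,AutoARIMA/MinTrace_method-ols,Naive/MinTrace_method-ols
Overall,negative_introduced,0.0,0.0,2.0,0.0
Overall,is_coherent,1.0,1.0,1.0,1.0
Overall,coherence_max_violation,0.0,0.0,0.0,0.0
"""


def failing_methods(csv_file):
    """Return {method: [reasons]} for methods that break a basic quality gate."""
    failures = {}
    for row in csv.DictReader(csv_file):
        if row["level"] != "Overall":
            continue
        for method, raw in row.items():
            if method in ("level", "metric"):
                continue
            value = float(raw)
            if row["metric"] == "negative_introduced" and value > 0:
                failures.setdefault(method, []).append("introduced negatives")
            if row["metric"] == "is_coherent" and value != 1.0:
                failures.setdefault(method, []).append("not coherent")
    return failures


failures = failing_methods(io.StringIO(CSV_TEXT))
print(failures)
```

With the values above, only AutoARIMA/MinTrace_method-ols is flagged, because it introduced two negative values. A CI job could exit with a non-zero status whenever the returned dictionary is non-empty.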
Summary of Diagnostic Metrics
| Metric | Description | Interpretation |
|---|---|---|
| coherence_residual_mae_before | Mean absolute incoherence before reconciliation | Higher = more incoherent base forecasts |
| coherence_residual_mae_after | Mean absolute incoherence after reconciliation | Should be ~0 |
| coherence_residual_rmse_before/after | RMSE variant of above | More sensitive to large violations |
| adjustment_mae | Mean absolute change made by reconciliation | Higher = more forecast modification |
| adjustment_rmse | RMSE of adjustments | More sensitive to large changes |
| adjustment_max | Maximum absolute adjustment | Identifies extreme changes |
| adjustment_mean | Mean adjustment (signed) | Shows directional bias |
| negative_count_before | Count of negatives in base forecasts | - |
| negative_count_after | Count of negatives after reconciliation | Should be 0 for non-negative data |
| negative_introduced | Negatives created by reconciliation | Warning sign if > 0 |
| negative_removed | Negatives fixed by reconciliation | Good if > 0 |
| is_coherent | Whether forecasts satisfy constraints (Overall only) | 1.0 = coherent |
| coherence_max_violation | Maximum coherence violation (Overall only) | Should be < tolerance |
References
- Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.), Chapter 11: Forecasting hierarchical and grouped series. OTexts.
- Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804-819.

