Probabilistic forecasting quantifies the uncertainty of a target variable's future values. The task requires modeling the following conditional predictive distribution:

$$\mathbb{P}(\mathbf{y}_{t+1:t+H} \;|\; \mathbf{y}_{:t})$$

We will show you how to tackle the task with NeuralForecast by combining a classic Long Short-Term Memory network (LSTM) and the Neural Hierarchical Interpolation for Time Series (NHITS) model with the multi-quantile loss function (MQLoss):

$$\mathrm{MQLoss}(y_{\tau}, [\hat{y}^{(q_1)}_{\tau}, \hat{y}^{(q_2)}_{\tau}, \dots, \hat{y}^{(q_Q)}_{\tau}]) = \frac{1}{Q} \sum_{q} \mathrm{QL}(y_{\tau}, \hat{y}^{(q)}_{\tau})$$
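
For intuition, here is a minimal NumPy sketch of the quantile (pinball) loss QL that MQLoss averages over quantiles. This is an illustration of the formula above, not NeuralForecast's internal implementation:

import numpy as np

def quantile_loss(y, y_hat, q):
    # Pinball loss: under-predictions are weighted by q,
    # over-predictions by (1 - q).
    diff = y - y_hat
    return np.maximum(q * diff, (q - 1) * diff).mean()

def mq_loss(y, y_hat_per_quantile, quantiles):
    # Average the pinball loss across all predicted quantiles.
    return np.mean([quantile_loss(y, y_hat, q)
                    for y_hat, q in zip(y_hat_per_quantile, quantiles)])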

In this notebook we will:
1. Install the NeuralForecast library.
2. Explore the M4-Hourly data.
3. Train the LSTM and NHITS models.
4. Visualize the LSTM/NHITS prediction intervals.

You can run these experiments on a GPU using Google Colab.

1. Installing NeuralForecast

!pip install neuralforecast

Useful functions

The plot_grid auxiliary function defined below will be useful for plotting different time series and different models' forecasts.

import random
import warnings
warnings.filterwarnings("ignore")
from itertools import product
import matplotlib.pyplot as plt

def plot_grid(df_train, df_test=None, plot_random=True, model=None, level=None):
    fig, axes = plt.subplots(4, 2, figsize=(24, 14))

    unique_ids = df_train['unique_id'].unique()

    assert len(unique_ids) >= 8, "Must provide at least 8 time series"
    
    # Select 8 series to plot, either at random or the first 8.
    if plot_random:
        unique_ids = random.sample(list(unique_ids), k=8)
    else:
        unique_ids = unique_ids[:8]

    for uid, (idx, idy) in zip(unique_ids, product(range(4), range(2))):
        train_uid = df_train.query('unique_id == @uid')
        axes[idx, idy].plot(train_uid['ds'], train_uid['y'], label='y_train')
        if df_test is not None:
            max_ds = train_uid['ds'].max()
            test_uid = df_test.query('unique_id == @uid')
            for col in ['y', f'{model}-median', 'y_test']:
                if col in test_uid:
                    axes[idx, idy].plot(test_uid['ds'], test_uid[col], label=col)
            if level is not None:
                for l, alpha in zip(sorted(level), [0.5, 0.4, 0.35, 0.2]):
                    axes[idx, idy].fill_between(
                        test_uid['ds'], 
                        test_uid[f'{model}-lo-{l}'], 
                        test_uid[f'{model}-hi-{l}'],
                        alpha=alpha,
                        color='orange',
                        label=f'{model}_level_{l}',
                    )
        axes[idx, idy].set_title(f'M4 Hourly: {uid}')
        axes[idx, idy].set_xlabel('Timestamp [t]')
        axes[idx, idy].set_ylabel('Target')
        axes[idx, idy].legend(loc='upper left')
        axes[idx, idy].xaxis.set_major_locator(plt.MaxNLocator(20))
        axes[idx, idy].grid()
    fig.subplots_adjust(hspace=0.5)
    plt.show()
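
For example, plot_grid(Y_train_df) plots eight randomly chosen series from the training set; additionally passing df_test, model, and level overlays the test values, the median forecast, and the shaded prediction intervals.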

2. Loading M4 Data

For testing purposes, we will use the Hourly dataset from the M4 competition.

!wget https://auto-arima-results.s3.amazonaws.com/M4-Hourly.csv
!wget https://auto-arima-results.s3.amazonaws.com/M4-Hourly-test.csv
import pandas as pd
Y_train_df = pd.read_csv('M4-Hourly.csv')
Y_test_df = pd.read_csv('M4-Hourly-test.csv').rename(columns={'y': 'y_test'})
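
NeuralForecast expects long-format data: one row per (series, timestamp) pair, with a unique_id column identifying each series, a ds timestamp column, and a y target column. A quick sanity check:

# Long-format panel data: columns unique_id, ds, y.
Y_train_df.head()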

In this example we will use only a subset of the data to keep the runtime short. You can modify the number of series if you want.

n_series = 8
uids = Y_train_df['unique_id'].unique()[:n_series]
Y_train_df = Y_train_df.query('unique_id in @uids')
Y_test_df = Y_test_df.query('unique_id in @uids')
plot_grid(Y_train_df, Y_test_df)

3. Model Training

The core.NeuralForecast class provides a high-level interface to our collection of PyTorch models. NeuralForecast is instantiated with a list of models=[LSTM(...), NHITS(...)], each configured for the forecasting task.

  • The horizon parameter h controls the number of steps ahead to predict, in this example 48 hours ahead (2 days).
  • The MQLoss with level=[80, 90] specializes the network’s output into the 80% and 90% prediction intervals (see the quantile mapping sketch after this list).
  • The max_steps parameter controls the duration of the network’s training (1000 steps for the LSTM and 2000 for the NHITS).
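
As a side note, each interval level implies a pair of tail quantiles around the median. A minimal sketch of that mapping (for illustration only, not part of the NeuralForecast API):

levels = [80, 90]
# A level-l interval spans the (1 - l/100)/2 and 1 - (1 - l/100)/2 quantiles.
quantiles = sorted({0.5}
                   | {round((1 - l / 100) / 2, 3) for l in levels}
                   | {round(1 - (1 - l / 100) / 2, 3) for l in levels})
print(quantiles)  # [0.05, 0.1, 0.5, 0.9, 0.95]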

For more details on instantiating each network, check their documentation.

from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import MQLoss
from neuralforecast.models import LSTM, NHITS
horizon = 48
levels = [80, 90]
models = [LSTM(input_size=3*horizon, h=horizon,
               loss=MQLoss(level=levels), max_steps=1000),
          NHITS(input_size=7*horizon, h=horizon,
                n_freq_downsample=[24, 12, 1],
                loss=MQLoss(level=levels), max_steps=2000),]
nf = NeuralForecast(models=models, freq='H')

All the models in the library are global, meaning that all time series in Y_train_df are used during a shared optimization to train a single model with shared parameters. This is the most common practice in the deep learning forecasting literature, and it is known as “cross-learning”.

nf.fit(df=Y_train_df)
Y_hat_df = nf.predict()
Y_hat_df = Y_hat_df.reset_index()
Y_hat_df.head()
|   | unique_id | ds  | LSTM-median | LSTM-lo-90 | LSTM-lo-80 | LSTM-hi-80 | LSTM-hi-90 | NHITS-median | NHITS-lo-90 | NHITS-lo-80 | NHITS-hi-80 | NHITS-hi-90 |
|---|-----------|-----|-------------|------------|------------|------------|------------|--------------|-------------|-------------|-------------|-------------|
| 0 | H1        | 701 | 603.491211  | 526.534119 | 544.686646 | 650.893799 | 673.805603 | 611.634888   | 575.999146  | 582.778687  | 677.277039  | 674.705872  |
| 1 | H1        | 702 | 548.415710  | 438.868591 | 472.805237 | 608.017822 | 639.063293 | 569.997803   | 513.014282  | 518.707153  | 598.849609  | 616.793457  |
| 2 | H1        | 703 | 502.010681  | 382.608643 | 411.710419 | 570.315308 | 608.669250 | 510.787628   | 454.184448  | 465.425232  | 538.964172  | 554.563354  |
| 3 | H1        | 704 | 460.870483  | 339.368988 | 370.636719 | 544.232666 | 579.824402 | 478.482330   | 429.657104  | 452.395508  | 500.892090  | 502.507141  |
| 4 | H1        | 705 | 436.451843  | 313.868744 | 343.514191 | 520.812988 | 559.734741 | 463.763611   | 432.906342  | 427.853577  | 486.854492  | 487.539062  |

Y_test_df = Y_test_df.merge(Y_hat_df, how='left', on=['unique_id', 'ds'])

4. Plotting Predictions

Here we finalize our analysis by plotting the prediction intervals and visually verifying that both the LSTM and NHITS produce reasonable forecasts with well-calibrated intervals.

Consider the output pair [NHITS-lo-90, NHITS-hi-90], which represents the 90% prediction interval of the NHITS network; its lower limit gives the 5th percentile (or 0.05 quantile) while its upper limit gives the 95th percentile (or 0.95 quantile). For a well-calibrated model we expect the target values to lie within the interval 90% of the time.
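
A quick way to check this empirically is to measure the fraction of test observations falling inside the interval. A minimal sketch using the merged Y_test_df from above:

# Empirical coverage: share of test targets inside each model's 90% interval.
# Columns follow the <model>-lo-<level> / <model>-hi-<level> naming convention.
for model in ['LSTM', 'NHITS']:
    covered = Y_test_df['y_test'].between(
        Y_test_df[f'{model}-lo-90'], Y_test_df[f'{model}-hi-90'])
    print(f'{model} 90% interval empirical coverage: {covered.mean():.1%}')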

LSTM

plot_grid(Y_train_df, Y_test_df, level=levels, model='LSTM')

NHITS

plot_grid(Y_train_df, Y_test_df, level=levels, model='NHITS')
