Transformer models, originally proposed for natural language processing, have seen increasing adoption in time series forecasting. Their strength lies in the self-attention mechanism, which lets the model weigh different parts of the input sequence when making predictions and capture long-range dependencies in the data. In time series forecasting, Transformer models use self-attention to identify relevant information across different periods of the series, which makes them well suited to predicting future values of complex and noisy sequences.
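The self-attention idea can be sketched in a few lines of NumPy. This is a toy, single-head version with random projection weights (the function and variable names are illustrative assumptions), not the NeuralForecast implementation of any of these models. Note the (seq_len, seq_len) weight matrix: it holds one score per pair of time steps, which is why the cost of vanilla self-attention grows quadratically with the input length.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model) input, one row per time step.
    w_q, w_k, w_v: (d_model, d_k) projection matrices.
    Returns the attended outputs (seq_len, d_k) and the weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity of every step to every other step
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 96, 8, 4                    # e.g. 96 time steps embedded in 8 dimensions
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (96, 4) (96, 96)
```

Each output row is a weighted average of all value vectors, so a prediction at one step can draw directly on any other step in the input window.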

Long-horizon forecasting consists of predicting a large number of future timestamps. It is challenging because of the volatility of the predictions and the computational complexity. To address this problem, recent studies have proposed a variety of Transformer-based models.

The NeuralForecast library includes implementations of the following popular recent models: Informer (Zhou, H. et al. 2021), Autoformer (Wu et al. 2021), FEDformer (Zhou, T. et al. 2022), and PatchTST (Nie et al. 2023).

Our implementations of all these models are univariate, meaning that each feature is forecast using only its own past (autoregressive) values. We observed that these univariate models are more accurate and faster than their multivariate counterparts.
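To make "univariate" concrete, here is a minimal sketch of how training pairs can be cut from each series using only that series' own history. The helper make_windows and the toy data are hypothetical illustrations, not NeuralForecast's internal windowing code.

```python
import numpy as np
import pandas as pd

def make_windows(y, input_size, h):
    """Slice one series into (input, target) pairs using only its own past values."""
    X, Y = [], []
    for t in range(input_size, len(y) - h + 1):
        X.append(y[t - input_size:t])  # the last `input_size` observations
        Y.append(y[t:t + h])           # the next `h` observations to predict
    return np.array(X), np.array(Y)

# Toy long-format data with two series; each one is windowed independently,
# so values from one series never enter the inputs of another.
df = pd.DataFrame({
    "unique_id": ["A"] * 6 + ["B"] * 6,
    "y": np.arange(12, dtype=float),
})
windows = {uid: make_windows(g["y"].to_numpy(), input_size=3, h=2)
           for uid, g in df.groupby("unique_id")}
X_a, Y_a = windows["A"]
print(X_a.shape, Y_a.shape)  # (2, 3) (2, 2)
```

A multivariate model would instead stack all series into each input window; the univariate design keeps the inputs small and lets one model be shared across series.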

In this notebook we will show how to:
* Load the ETTm2 benchmark dataset, used in the academic literature.
* Train models.
* Forecast the test set.

The results achieved in this notebook outperform the self-reported results in the respective original papers, at a fraction of the computational cost. Additionally, all models are trained with the default recommended parameters; results can be improved further using our Auto models with automatic hyperparameter selection.

You can run these experiments on a GPU with Google Colab.


1. Installing libraries

!pip install neuralforecast datasetsforecast

2. Load ETTm2 Data

The LongHorizon class will automatically download the complete ETTm2 dataset and process it.

It returns three DataFrames: Y_df contains the values of the target variables, X_df contains exogenous calendar features, and S_df contains static features for each time series (none for ETTm2). For this example we will only use Y_df.

If you want to use your own data, just replace Y_df. Be sure to use the long format and a structure similar to our dataset.
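If your data is stored in a wide table (one column per variable), a sketch of converting it to the long format, with one row per (unique_id, ds) pair and the value in y, might look like this. The column names HUFL and OT and the values are made up for illustration; pandas.DataFrame.melt does the reshaping.

```python
import pandas as pd

# Hypothetical wide table: one column per variable, one row per timestamp.
wide = pd.DataFrame({
    "ds": pd.date_range("2016-07-01", periods=3, freq="15min"),
    "HUFL": [0.1, 0.2, 0.3],
    "OT": [1.0, 0.9, 0.8],
})

# Long format: one row per (series, timestamp) with columns unique_id, ds, y.
Y_df_own = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
Y_df_own = (Y_df_own[["unique_id", "ds", "y"]]
            .sort_values(["unique_id", "ds"])
            .reset_index(drop=True))
print(Y_df_own)
```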

import pandas as pd

from datasetsforecast.long_horizon import LongHorizon
# Change this to your own data to try the model
Y_df, _, _ = LongHorizon.load(directory='./', group='ETTm2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)

        unique_id  ds                   y
0       HUFL       2016-07-01 00:00:00  -0.041413
1       HUFL       2016-07-01 00:15:00  -0.185467
57600   HULL       2016-07-01 00:00:00   0.040104
57601   HULL       2016-07-01 00:15:00  -0.214450
115200  LUFL       2016-07-01 00:00:00   0.695804
115201  LUFL       2016-07-01 00:15:00   0.434685
172800  LULL       2016-07-01 00:00:00   0.434430
172801  LULL       2016-07-01 00:15:00   0.428168
230400  MUFL       2016-07-01 00:00:00  -0.599211
230401  MUFL       2016-07-01 00:15:00  -0.658068
288000  MULL       2016-07-01 00:00:00  -0.393536
288001  MULL       2016-07-01 00:15:00  -0.659338
345600  OT         2016-07-01 00:00:00   1.018032
345601  OT         2016-07-01 00:15:00   0.980124
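As a worked example of the split sizes, assume 57,600 timestamps per series, which is consistent with the row offsets (0, 57600, 115200, ...) in the preview of Y_df above. With 15-minute data (96 steps per day), each 20% split covers 120 days.

```python
# Assumes n_time = 57,600 timestamps per series (inferred from the Y_df preview).
n_time = 57_600
val_size = int(.2 * n_time)    # 11,520 steps = 120 days of 15-minute data
test_size = int(.2 * n_time)   # 11,520 steps
train_size = n_time - val_size - test_size
print(train_size, val_size, test_size)  # 34560 11520 11520
```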

3. Train models

We will train models using the cross_validation method, which allows users to automatically simulate multiple historic forecasts (in the test set).

The cross_validation method will use the validation set for hyperparameter selection and early stopping, and will then produce the forecasts for the test set.
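A simplified sketch of rolling-origin evaluation helps show what "multiple historic forecasts" means. It assumes consecutive cutoffs one step apart (step_size = 1) and requires every forecast window to fit inside the test set; rolling_cutoffs is a hypothetical helper for illustration, not part of NeuralForecast.

```python
def rolling_cutoffs(n_time, test_size, h, step_size=1):
    """Index of the last observed time step for each forecast window in the test set.

    A window made at `cutoff` predicts indices cutoff + 1 ... cutoff + h,
    so every window stays inside the final `test_size` steps of the series.
    """
    first = n_time - test_size - 1  # last step before the test set begins
    last = n_time - h - 1           # last cutoff whose window still fits
    return list(range(first, last + 1, step_size))

cutoffs = rolling_cutoffs(n_time=20, test_size=6, h=3)
print(cutoffs)       # [13, 14, 15, 16]
print(len(cutoffs))  # 4, i.e. test_size - h + 1
```

Under these assumptions, n_time = 57,600, test_size = 11,520 and h = 96 give 11,425 windows per series, and the first cutoff is the last timestamp before the test set (2017-10-23 23:45:00 for ETTm2).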

First, instantiate each model in the models list, specifying the horizon, input_size, and training iterations.

(NOTE: The FEDformer model was excluded due to extremely long training times.)

from neuralforecast.core import NeuralForecast
from neuralforecast.models import Informer, Autoformer, FEDformer, PatchTST
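horizon = 96 # 24 hours = 96 steps of 15 minutes
models = [Informer(h=horizon,                 # Forecasting horizon
                input_size=horizon,           # Input size
                max_steps=1000,               # Number of training iterations
                val_check_steps=100,          # Compute validation loss every 100 steps
                early_stop_patience_steps=3), # Stop after 3 validation checks without improvement
          Autoformer(h=horizon, input_size=horizon, max_steps=1000,
                     val_check_steps=100, early_stop_patience_steps=3),
          PatchTST(h=horizon, input_size=horizon, max_steps=1000,
                   val_check_steps=100, early_stop_patience_steps=3)]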


Check our auto models for automatic hyperparameter optimization.

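Instantiate a NeuralForecast object with the following required parameters:
* models: a list of models.
* freq: a string indicating the frequency of the data ('15min' for ETTm2).

nf = NeuralForecast(
    models=models,
    freq='15min')

Second, use the cross_validation method, specifying the dataset (Y_df), validation size and test size.

Y_hat_df = nf.cross_validation(df=Y_df,
                               val_size=val_size,
                               test_size=test_size,
                               n_windows=None)  # n_windows must be None when test_size is given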

The cross_validation method will return the forecasts for each model on the test set.

   unique_id  ds                   cutoff               Informer   Autoformer  PatchTST   y
0  HUFL       2017-10-24 00:00:00  2017-10-23 23:45:00  -1.055062  -0.861487   -0.860189  -0.977673
1  HUFL       2017-10-24 00:15:00  2017-10-23 23:45:00  -1.021247  -0.873399   -0.865730  -0.865620
2  HUFL       2017-10-24 00:30:00  2017-10-23 23:45:00  -1.057297  -0.900345   -0.944296  -0.961624
3  HUFL       2017-10-24 00:45:00  2017-10-23 23:45:00  -0.886652  -0.867466   -0.974849  -1.049700
4  HUFL       2017-10-24 01:00:00  2017-10-23 23:45:00  -1.000431  -0.887454   -1.008530  -0.953600

4. Evaluate Results

Next, we plot the forecasts on the test set for the OT variable for all models.

import matplotlib.pyplot as plt

Y_plot = Y_hat_df[Y_hat_df['unique_id']=='OT'] # OT dataset
cutoffs = Y_hat_df['cutoff'].unique()[::horizon] # every horizon-th cutoff, so plotted windows do not overlap
Y_plot = Y_plot[Y_plot['cutoff'].isin(cutoffs)] # build the mask from Y_plot itself to avoid index misalignment

plt.plot(Y_plot['ds'], Y_plot['y'], label='True')
plt.plot(Y_plot['ds'], Y_plot['Informer'], label='Informer')
plt.plot(Y_plot['ds'], Y_plot['Autoformer'], label='Autoformer')
plt.plot(Y_plot['ds'], Y_plot['PatchTST'], label='PatchTST')
plt.xlabel('Datestamp')
plt.ylabel('OT')
plt.grid()
plt.legend()
plt.show()
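Why take every horizon-th cutoff? Consecutive cutoffs are one step apart, so their forecast windows overlap heavily. The toy frame below (made-up cutoffs and timestamps as integers, one series, h = 3) sketches how the slicing keeps windows that tile the test period end to end.

```python
import pandas as pd

h = 3
# Toy cross-validation output for one series: 4 consecutive cutoffs, h rows each.
cv = pd.DataFrame({
    "cutoff": [c for c in range(13, 17) for _ in range(h)],
    "ds": [c + i for c in range(13, 17) for i in range(1, h + 1)],
})

# Keep every h-th cutoff so the selected windows do not overlap.
keep = cv["cutoff"].unique()[::h]
plot_df = cv[cv["cutoff"].isin(keep)]  # mask built from the same frame being filtered
print(keep.tolist())                   # [13, 16]
print(plot_df["ds"].tolist())          # [14, 15, 16, 17, 18, 19]
```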

Finally, we compute the test errors using the Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{1}{\text{Windows} \cdot \text{Horizon}} \sum_{\tau} \left| y_{\tau} - \hat{y}_{\tau} \right|$$
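A small worked example of the formula, with made-up values for 2 windows and a horizon of 3: the sum of absolute errors divided by Windows · Horizon is simply the mean absolute error over all forecast entries, which is what the mae call on the flattened columns computes (there, averaged over every series as well).

```python
import numpy as np

# Two toy windows (Windows = 2) with a horizon of 3 steps (Horizon = 3).
y = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y_hat = np.array([[1.5, 2.0, 2.0],
                  [4.0, 6.0, 6.5]])

# Sum of absolute errors divided by Windows * Horizon.
mae_manual = np.abs(y - y_hat).sum() / (y.shape[0] * y.shape[1])
print(mae_manual)  # 0.5
```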

from neuralforecast.losses.numpy import mae
mae_informer = mae(Y_hat_df['y'], Y_hat_df['Informer'])
mae_autoformer = mae(Y_hat_df['y'], Y_hat_df['Autoformer'])
mae_patchtst = mae(Y_hat_df['y'], Y_hat_df['PatchTST'])

print(f'Informer: {mae_informer:.3f}')
print(f'Autoformer: {mae_autoformer:.3f}')
print(f'PatchTST: {mae_patchtst:.3f}')
Informer: 0.339
Autoformer: 0.316
PatchTST: 0.251

For reference, these results can be compared with the self-reported performance in the respective papers.


Next steps

We proposed an alternative model for long-horizon forecasting, NHITS, based on feed-forward networks (Challu et al. 2023). It achieves performance on par with PatchTST at a fraction of the computational cost. The NHITS tutorial is available here.


Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2021, May). Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 12, pp. 11106-11115).

Wu, H., Xu, J., Wang, J., & Long, M. (2021). Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34, 22419-22430.

Zhou, T., Ma, Z., Wen, Q., Wang, X., Sun, L., & Jin, R. (2022, June). FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning (pp. 27268-27286). PMLR.

Nie, Y., Nguyen, N. H., Sinthong, P., & Kalagnanam, J. (2023). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations.

Challu, C., Olivares, K. G., Oreshkin, B. N., Garza, F., Mergenthaler-Canseco, M., & Dubrawski, A. (2023). NHITS: Neural Hierarchical Interpolation for Time Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence.