Vanilla Transformer, following the implementation from the Informer paper,
used as a baseline.
The architecture has three distinctive features:
- Full-attention mechanism with O(L^2) time and memory complexity.
- Classic encoder-decoder proposed by Vaswani et al. (2017) with a multi-head attention mechanism.
- An MLP multi-step decoder that predicts long time-series sequences in a single forward operation rather than step-by-step.
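To make the O(L^2) cost of full attention concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is an illustration only, not the library's implementation: the names `softmax` and `full_attention` and the toy input are hypothetical. The point is that every query is scored against every key, so the score matrix has L x L entries.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def full_attention(queries, keys, values):
    """Single-head scaled dot-product attention on lists of shape (L, d).

    Hypothetical helper for illustration; not part of neuralforecast.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    # One score per (query, key) pair: L * L entries in total.
    scores = [
        [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        for q in queries
    ]
    weights = [softmax(row) for row in scores]
    d_v = len(values[0])
    # Each output row is a weighted average of all value rows.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, values)) for j in range(d_v)]
        for w_row in weights
    ]
    return output, weights

# Toy sequence of length L=4 with d=2 features (made-up values).
L, d = 4, 2
x = [[math.sin(t + j) for j in range(d)] for t in range(L)]
out, weights = full_attention(x, x, x)
n_scores = sum(len(row) for row in weights)  # grows as L * L
```

Doubling the sequence length quadruples `n_scores`, which is the quadratic time and memory growth noted in the first bullet.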
The Vanilla Transformer model builds its embedding from three components:
- Encoded autoregressive features obtained from a convolution network.
- Window-relative positional embeddings derived from harmonic functions.
- Absolute positional embeddings obtained from calendar features.
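As an illustration of the second component, the sketch below builds a window-relative positional embedding from harmonic (sine and cosine) functions, using the standard formulation from Vaswani et al. (2017). The function name `harmonic_positional_embedding` is hypothetical, and the base constant 10000 is the textbook choice; whether the library uses exactly these values is an assumption.

```python
import math

def harmonic_positional_embedding(window_length, hidden_size):
    """Sinusoidal position table of shape (window_length, hidden_size).

    Hypothetical helper for illustration; assumes an even hidden_size
    and the textbook base of 10000.
    """
    if hidden_size % 2 != 0:
        raise ValueError("hidden_size must be even")
    table = []
    for pos in range(window_length):
        row = []
        for i in range(0, hidden_size, 2):
            # Lower dimensions oscillate quickly, higher ones slowly.
            angle = pos / (10000 ** (i / hidden_size))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table

# Positions are relative to the start of the input window,
# here matching input_size=24 and hidden_size=16.
pe = harmonic_positional_embedding(window_length=24, hidden_size=16)
```

Because the positions are counted from the start of each window, the same table can be reused for every window, independent of the calendar date; the calendar-based component carries the absolute position instead.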
References
Figure 1. Transformer Architecture.
Usage Example
import pandas as pd
import matplotlib.pyplot as plt
from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import MAE
from neuralforecast.models import VanillaTransformer
from neuralforecast.utils import AirPassengersPanel, AirPassengersStatic
Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]] # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test
model = VanillaTransformer(h=12,
                           input_size=24,
                           hidden_size=16,
                           conv_hidden_size=32,
                           n_head=2,
                           loss=MAE(),
                           scaler_type='robust',
                           learning_rate=1e-3,
                           max_steps=500,
                           val_check_steps=50,
                           early_stop_patience_steps=2)
nf = NeuralForecast(
    models=[model],
    freq='ME'
)
nf.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
forecasts = nf.predict(futr_df=Y_test_df)
Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])
if model.loss.is_distribution_output:
    plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
    plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
    plt.plot(plot_df['ds'], plot_df['VanillaTransformer-median'], c='blue', label='median')
    plt.fill_between(x=plot_df['ds'][-12:],
                     y1=plot_df['VanillaTransformer-lo-90'][-12:].values,
                     y2=plot_df['VanillaTransformer-hi-90'][-12:].values,
                     alpha=0.4, label='level 90')
    plt.grid()
    plt.legend()
    plt.plot()
else:
    plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
    plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
    plt.plot(plot_df['ds'], plot_df['VanillaTransformer'], c='blue', label='Forecast')
    plt.legend()
    plt.grid()