PatchTST
The PatchTST model is an efficient Transformer-based model for multivariate time series forecasting.
It is based on two key components: segmentation of the time series into windows (patches), which serve as input tokens to the Transformer, and channel-independence, where each channel contains a single univariate time series.
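To make the patching step concrete, here is a minimal sketch (illustrative only, not the library's internal code) of how a univariate window can be cut into overlapping patches with Tensor.unfold; the sizes are hypothetical:

import torch

input_size, patch_len, stride = 104, 16, 8   # illustrative sizes
y = torch.randn(1, input_size)               # one univariate channel

# Each sliding window of length patch_len becomes one input token.
patches = y.unfold(-1, patch_len, stride)    # (1, patch_num, patch_len)
patch_num = (input_size - patch_len) // stride + 1
assert patches.shape == (1, patch_num, patch_len)  # here patch_num = 12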
1. Backbone
Auxiliary Functions
get_activation_fn
get_activation_fn (activation)
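A small usage sketch, assuming the helper maps the strings 'gelu' and 'relu' to the corresponding torch activations (the exact return type is an assumption):

import torch

act = get_activation_fn('gelu')  # assumed to return a GELU callable/module
x = torch.randn(4)
print(act(x).shape)              # torch.Size([4])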
Transpose
Transpose (*dims, contiguous=False)
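A usage sketch, assuming Transpose(*dims) applies Tensor.transpose(*dims) in forward so the swap can live inside nn.Sequential; this mirrors its typical role around BatchNorm1d, which expects channels in dim 1:

import torch
import torch.nn as nn

norm = nn.Sequential(Transpose(1, 2), nn.BatchNorm1d(128), Transpose(1, 2))
x = torch.randn(32, 42, 128)  # (batch, seq_len, hidden_size)
print(norm(x).shape)          # torch.Size([32, 42, 128])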
A thin nn.Module wrapper around Tensor.transpose, so that a dimension swap can be used inside nn.Sequential; with contiguous=True the result is also made contiguous in memory.
Positional Encoding
positional_encoding
positional_encoding (pe, learn_pe, q_len, hidden_size)
Coord1dPosEncoding
Coord1dPosEncoding (q_len, exponential=False, normalize=True)
Coord2dPosEncoding
Coord2dPosEncoding (q_len, hidden_size, exponential=False, normalize=True, eps=0.001)
PositionalEncoding
PositionalEncoding (q_len, hidden_size, normalize=True)
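With the model defaults pe='zeros' and learn_pe=True (dispatched by positional_encoding above), the positional encoding is a learnable table of shape (q_len, hidden_size) added to the patch embeddings. A minimal sketch of that behavior (assumed initialization, not necessarily the exact library code):

import torch
import torch.nn as nn

q_len, hidden_size = 12, 128                  # q_len = number of patches (illustrative)
W_pos = nn.Parameter(torch.empty(q_len, hidden_size))
nn.init.uniform_(W_pos, -0.02, 0.02)          # near-zero start, learned during training

tokens = torch.randn(32, q_len, hidden_size)  # (batch * channels, patch_num, hidden_size)
tokens = tokens + W_pos                       # broadcasts over the first dimension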
RevIN
RevIN
RevIN (num_features:int, eps=1e-05, affine=True, subtract_last=False)
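The idea behind RevIN in a minimal sketch (assumed behavior: per-instance statistics are computed over the time axis, and subtract_last would replace the mean with the window's last value):

import torch

x = torch.randn(32, 104, 7)  # (batch, time, num_features)

# 'norm' step: standardize each instance with its own statistics.
mean = x.mean(dim=1, keepdim=True)
std = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + 1e-5)
x_norm = (x - mean) / std

y_norm = x_norm[:, -12:, :]  # stand-in for the model's forecast in normalized space

# 'denorm' step: restore the original level and scale on the outputs.
y = y_norm * std + mean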
Reversible Instance Normalization (RevIN). Each input window is standardized with its own statistics (optionally with learnable affine parameters), and the transform is reversed on the model's outputs; with subtract_last=True the window's last value is subtracted instead of the mean.
Encoder
TSTEncoderLayer
TSTEncoderLayer (q_len, hidden_size, n_heads, d_k=None, d_v=None, linear_hidden_size=256, store_attn=False, norm='BatchNorm', attn_dropout=0, dropout=0.0, bias=True, activation='gelu', res_attention=False, pre_norm=False)
A single Transformer encoder layer: multi-head self-attention followed by a position-wise feed-forward network of width linear_hidden_size, each wrapped with dropout, a residual connection, and BatchNorm or LayerNorm (applied before the sublayer when pre_norm=True).
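When res_attention=True, the layer follows the RealFormer-style residual attention idea: the raw attention scores of the previous layer are added to the current layer's scores before the softmax. A minimal sketch of the mechanism (illustrative, not the exact library code):

import torch
import torch.nn.functional as F

def residual_attention(q, k, v, prev_scores=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    if prev_scores is not None:
        scores = scores + prev_scores   # residual over attention scores
    attn = F.softmax(scores, dim=-1)
    return attn @ v, scores             # scores are forwarded to the next layer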
TSTEncoder
TSTEncoder (q_len, hidden_size, n_heads, d_k=None, d_v=None, linear_hidden_size=None, norm='BatchNorm', attn_dropout=0.0, dropout=0.0, activation='gelu', res_attention=False, n_layers=1, pre_norm=False, store_attn=False)
A stack of n_layers TSTEncoderLayer modules; when res_attention=True, raw attention scores are passed from each layer to the next.
TSTiEncoder
TSTiEncoder (c_in, patch_num, patch_len, max_seq_len=1024, n_layers=3, hidden_size=128, n_heads=16, d_k=None, d_v=None, linear_hidden_size=256, norm='BatchNorm', attn_dropout=0.0, dropout=0.0, act='gelu', store_attn=False, key_padding_mask='auto', padding_var=None, attn_mask=None, res_attention=True, pre_norm=False, pe='zeros', learn_pe=True)
Channel-independent Transformer encoder: each patch is linearly projected to hidden_size, a positional encoding is added, and the TSTEncoder is applied to every channel separately.
Flatten_Head
Flatten_Head (individual, n_vars, nf, h, c_out, head_dropout=0)
Forecasting head: flattens the encoder output of each variable and maps it to the horizon with a linear layer and dropout; individual=True uses a separate head per variable.
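A sketch of the shared head (individual=False), under the assumption that nf = hidden_size * patch_num:

import torch
import torch.nn as nn

B, n_vars, hidden_size, patch_num, h = 32, 7, 128, 12, 24
z = torch.randn(B, n_vars, hidden_size, patch_num)  # encoder output per variable

head = nn.Sequential(nn.Flatten(start_dim=-2),
                     nn.Linear(hidden_size * patch_num, h))
print(head(z).shape)  # torch.Size([32, 7, 24]) -> one h-step forecast per variable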
PatchTST_backbone
PatchTST_backbone (c_in:int, c_out:int, input_size:int, h:int, patch_len:int, stride:int, max_seq_len:Optional[int]=1024, n_layers:int=3, hidden_size=128, n_heads=16, d_k:Optional[int]=None, d_v:Optional[int]=None, linear_hidden_size:int=256, norm:str='BatchNorm', attn_dropout:float=0.0, dropout:float=0.0, act:str='gelu', key_padding_mask:str='auto', padding_var:Optional[int]=None, attn_mask:Optional[torch.Tensor]=None, res_attention:bool=True, pre_norm:bool=False, store_attn:bool=False, pe:str='zeros', learn_pe:bool=True, fc_dropout:float=0.0, head_dropout=0, padding_patch=None, pretrain_head:bool=False, head_type='flatten', individual=False, revin=True, affine=True, subtract_last=False)
The full backbone: optional RevIN normalization, segmentation of the input into patches, channel-independent Transformer encoding (TSTiEncoder), and a head (by default Flatten_Head) that produces the h-step forecast; RevIN denormalization is applied to the output.
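To make the tensor flow concrete, a hypothetical shape walkthrough with the settings of the usage example below (input_size=104, patch_len=24, stride=24, h=12, padding_patch=None); the patch count follows the standard sliding-window formula:

# x:        (B, c_in, 104)         input windows (after RevIN if revin=True)
# patching: (B, c_in, 4, 24)       patch_num = (104 - 24) // 24 + 1 = 4
# embed:    (B * c_in, 4, hidden)  each patch projected to hidden_size + pos. encoding
# encoder:  (B * c_in, 4, hidden)  Transformer over patches, channels independent
# head:     (B, c_in, 12)          Flatten_Head maps hidden * 4 features to h = 12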
2. Model
PatchTST
PatchTST (h, input_size, stat_exog_list=None, hist_exog_list=None, futr_exog_list=None, exclude_insample_y=False, encoder_layers:int=3, n_heads:int=16, hidden_size:int=128, linear_hidden_size:int=256, dropout:float=0.2, fc_dropout:float=0.2, head_dropout:float=0.0, attn_dropout:float=0.0, patch_len:int=16, stride:int=8, revin:bool=True, revin_affine:bool=False, revin_subtract_last:bool=True, activation:str='gelu', res_attention:bool=True, batch_normalization:bool=False, learn_pos_embed:bool=True, loss=MAE(), valid_loss=None, max_steps:int=5000, learning_rate:float=0.0001, num_lr_decays:int=-1, early_stop_patience_steps:int=-1, val_check_steps:int=100, batch_size:int=32, valid_batch_size:Optional[int]=None, windows_batch_size=1024, inference_windows_batch_size:int=1024, start_padding_enabled=False, step_size:int=1, scaler_type:str='identity', random_seed:int=1, num_workers_loader:int=0, drop_last_loader:bool=False, optimizer=None, optimizer_kwargs=None, **trainer_kwargs)
PatchTST

The PatchTST model is an efficient Transformer-based model for multivariate time series forecasting.
It is based on two key components: segmentation of the time series into windows (patches), which serve as input tokens to the Transformer, and channel-independence, where each channel contains a single univariate time series.
Parameters:
h: int, Forecast horizon.
input_size: int, autoregressive inputs size, y=[1,2,3,4] input_size=2 -> y_[t-2:t]=[1,2].
stat_exog_list: str list, static exogenous columns.
hist_exog_list: str list, historic exogenous columns.
futr_exog_list: str list, future exogenous columns.
exclude_insample_y: bool=False, if True the model skips the autoregressive features y[t-input_size:t].
encoder_layers: int=3, number of layers for the encoder.
n_heads: int=16, number of multi-head attention heads.
hidden_size: int=128, units of embeddings and encoders.
linear_hidden_size: int=256, units of the linear layer.
dropout: float=0.2, dropout rate for the residual connection.
fc_dropout: float=0.2, dropout rate for the linear layer.
head_dropout: float=0.0, dropout rate for the Flatten head layer.
attn_dropout: float=0.0, dropout rate for the attention layer.
patch_len: int=16, length of each patch. Note: patch_len = min(patch_len, input_size + stride).
stride: int=8, stride between patches.
revin: bool=True, whether to use RevIN.
revin_affine: bool=False, whether to use affine parameters in RevIN.
revin_subtract_last: bool=True, whether to subtract the last value in RevIN.
activation: str='gelu', activation from ['gelu', 'relu'].
res_attention: bool=True, whether to use residual attention.
batch_normalization: bool=False, whether to use batch normalization.
learn_pos_embed: bool=True, whether to learn the positional embedding.
loss: PyTorch module, instantiated train loss class from the losses collection.
valid_loss: PyTorch module=loss, instantiated valid loss class from the losses collection.
max_steps: int=5000, maximum number of training steps.
learning_rate: float=1e-4, learning rate between (0, 1).
num_lr_decays: int=-1, number of learning rate decays, evenly distributed across max_steps.
early_stop_patience_steps: int=-1, number of validation iterations before early stopping.
val_check_steps: int=100, number of training steps between every validation loss check.
batch_size: int=32, number of different series in each batch.
valid_batch_size: int=None, number of different series in each validation and test batch; if None, uses batch_size.
windows_batch_size: int=1024, number of windows to sample in each training batch.
inference_windows_batch_size: int=1024, number of windows to sample in each inference batch.
start_padding_enabled: bool=False, if True, the model pads the time series with zeros at the beginning, by input size.
step_size: int=1, step size between each window of temporal data.
scaler_type: str='identity', type of scaler for temporal inputs normalization; see temporal scalers.
random_seed: int=1, random seed for pytorch initializers and numpy generators.
num_workers_loader: int=0, workers to be used by TimeSeriesDataLoader.
drop_last_loader: bool=False, if True, TimeSeriesDataLoader drops the last non-full batch.
alias: str, optional, custom name of the model.
optimizer: subclass of 'torch.optim.Optimizer', optional, user-specified optimizer instead of the default choice (Adam).
optimizer_kwargs: dict, optional, parameters used by the user-specified optimizer.
**trainer_kwargs: keyword trainer arguments inherited from PyTorch Lightning's trainer.
PatchTST.fit
PatchTST.fit (dataset, val_size=0, test_size=0, random_seed=None, distributed_config=None)
Fit.
The fit method optimizes the neural network's weights using the initialization parameters (learning_rate, windows_batch_size, ...) and the loss function defined during initialization. Within fit we use a PyTorch Lightning Trainer that inherits the initialization's self.trainer_kwargs to customize its inputs; see PL's trainer arguments.
The method is designed to be compatible with SKLearn-like classes, and in particular with the StatsForecast library.
By default, the model does not save training checkpoints to protect disk memory; to enable them, set enable_checkpointing=True in __init__.
Parameters:
dataset: NeuralForecast's TimeSeriesDataset; see documentation.
val_size: int, validation size for temporal cross-validation.
test_size: int, test size for temporal cross-validation.
random_seed: int=None, random seed for pytorch initializers and numpy generators; overwrites model.__init__'s.
PatchTST.predict
PatchTST.predict (dataset, test_size=None, step_size=1, random_seed=None, **data_module_kwargs)
Predict.
Neural network prediction with PL's Trainer execution of predict_step.
Parameters:
dataset: NeuralForecast's TimeSeriesDataset; see documentation.
test_size: int=None, test size for temporal cross-validation.
step_size: int=1, step size between each window.
random_seed: int=None, random seed for pytorch initializers and numpy generators; overwrites model.__init__'s.
**data_module_kwargs: PL's TimeSeriesDataModule args; see documentation.
Usage example
import numpy as np
import pandas as pd
import pytorch_lightning as pl
import matplotlib.pyplot as plt
from neuralforecast import NeuralForecast
from neuralforecast.models import PatchTST
from neuralforecast.losses.pytorch import MAE, MQLoss, DistributionLoss
from neuralforecast.tsdataset import TimeSeriesDataset
from neuralforecast.utils import AirPassengers, AirPassengersPanel, AirPassengersStatic, augment_calendar_df
AirPassengersPanel, calendar_cols = augment_calendar_df(df=AirPassengersPanel, freq='M')
Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]] # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test
model = PatchTST(h=12,
                 input_size=104,
                 patch_len=24,
                 stride=24,
                 revin=False,
                 hidden_size=16,
                 n_heads=4,
                 scaler_type='robust',
                 loss=DistributionLoss(distribution='StudentT', level=[80, 90]),
                 # loss=MAE(),
                 learning_rate=1e-3,
                 max_steps=500,
                 val_check_steps=50,
                 early_stop_patience_steps=2)

nf = NeuralForecast(
    models=[model],
    freq='M'
)
nf.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
forecasts = nf.predict(futr_df=Y_test_df)
Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

if model.loss.is_distribution_output:
    plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
    plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
    plt.plot(plot_df['ds'], plot_df['PatchTST-median'], c='blue', label='median')
    plt.fill_between(x=plot_df['ds'][-12:],
                     y1=plot_df['PatchTST-lo-90'][-12:].values,
                     y2=plot_df['PatchTST-hi-90'][-12:].values,
                     alpha=0.4, label='level 90')
    plt.grid()
    plt.legend()
    plt.show()
else:
    plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
    plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
    plt.plot(plot_df['ds'], plot_df['PatchTST'], c='blue', label='Forecast')
    plt.legend()
    plt.grid()
    plt.show()
Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

if model.loss.is_distribution_output:
    plot_df = plot_df[plot_df.unique_id=='Airline2'].drop('unique_id', axis=1)
    plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
    plt.plot(plot_df['ds'], plot_df['PatchTST-median'], c='blue', label='median')
    plt.fill_between(x=plot_df['ds'][-12:],
                     y1=plot_df['PatchTST-lo-90'][-12:].values,
                     y2=plot_df['PatchTST-hi-90'][-12:].values,
                     alpha=0.4, label='level 90')
    plt.grid()
    plt.legend()
    plt.show()
else:
    plot_df = plot_df[plot_df.unique_id=='Airline2'].drop('unique_id', axis=1)
    plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
    plt.plot(plot_df['ds'], plot_df['PatchTST'], c='blue', label='Forecast')
    plt.legend()
    plt.grid()
    plt.show()