module neuralforecast.models.vanillatransformer
class VanillaTransformer
Vanilla Transformer, following the implementation of the Informer paper, used as a baseline.
The architecture has five distinctive features:
- A full-attention mechanism with O(L^2) time and memory complexity.
- An MLP multi-step decoder that predicts long time-series sequences in a single forward operation rather than step-by-step.
- Encoded autoregressive features obtained from a convolutional network.
- Window-relative positional embeddings derived from harmonic functions.
- Absolute positional embeddings obtained from calendar features.
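To make the full-attention and harmonic-embedding bullets concrete, here is a minimal pure-Python sketch. It assumes the standard sine/cosine scheme from the original Transformer paper for the window-relative embeddings (position 0 is the first step of the input window). The helper names are hypothetical, and the library's own embedding and attention modules may differ in scaling, batching, and masking.

```python
import math
from typing import List

Matrix = List[List[float]]


def harmonic_positional_embedding(seq_len: int, d_model: int) -> Matrix:
    """Sine/cosine embeddings for positions 0..seq_len-1 inside one input window."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Dimensions i and i+1 share one frequency, 1 / 10000^(i / d_model).
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def full_attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Row-wise softmax(q k^T / sqrt(d)): one weight for every (query, key) pair."""
    d = len(q[0])
    weights = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


pe = harmonic_positional_embedding(seq_len=8, d_model=4)
w = full_attention_weights(pe, pe)
print(len(w), len(w[0]))  # 8 8
```

Because every query scores every key, the weight matrix for a window of length L has L x L entries, which is where the O(L^2) time and memory cost comes from.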
Parameters:
- h (int): forecast horizon.
- input_size (int): maximum sequence length for truncated train backpropagation.
- stat_exog_list (str list): static exogenous columns.
- hist_exog_list (str list): historic exogenous columns.
- futr_exog_list (str list): future exogenous columns.
- exclude_insample_y (bool): whether to exclude the target variable from the input.
- decoder_input_size_multiplier (float):
- hidden_size (int): units of embeddings and encoders.
- dropout (float): dropout applied throughout the Informer architecture.
- n_head (int): number of attention heads in multi-head attention.
- conv_hidden_size (int): channels of the convolutional encoder.
- activation (str): activation from ['ReLU', 'Softplus', 'Tanh', 'SELU', 'LeakyReLU', 'PReLU', 'Sigmoid', 'GELU'].
- encoder_layers (int): number of layers for the TCN encoder.
- decoder_layers (int): number of layers for the MLP decoder.
- loss (PyTorch module): instantiated train loss class from the losses collection.
- valid_loss (PyTorch module): instantiated validation loss class from the losses collection.
- max_steps (int): maximum number of training steps.
- learning_rate (float): learning rate in the interval (0, 1).
- num_lr_decays (int): number of learning rate decays, evenly distributed across max_steps.
- early_stop_patience_steps (int): number of validation iterations before early stopping.
- val_check_steps (int): number of training steps between every validation loss check.
- batch_size (int): number of different series in each batch.
- valid_batch_size (int): number of different series in each validation and test batch; if None, uses batch_size.
- windows_batch_size (int): number of windows to sample in each training batch; the default uses all windows.
- inference_windows_batch_size (int): number of windows to sample in each inference batch.
- start_padding_enabled (bool): if True, the model pads the beginning of each time series with input_size zeros.
- training_data_availability_threshold (Union[float, List[float]]): minimum fraction of valid data points required for training windows. A single float applies to both the insample and outsample parts; a list of two floats specifies [insample_fraction, outsample_fraction]. The default of 0.0 allows windows with only one valid data point (the current behavior).
- step_size (int): step size between each window of temporal data.
- scaler_type (str): type of scaler for temporal input normalization; see temporal scalers.
- random_seed (int): random seed for the PyTorch initializer and NumPy generators.
- drop_last_loader (bool): if True, TimeSeriesDataLoader drops the last non-full batch.
- alias (str): optional, custom name of the model.
- optimizer (subclass of torch.optim.Optimizer): optional, user-specified optimizer instead of the default choice (Adam).
- optimizer_kwargs (dict): optional, parameters passed to the user-specified optimizer.
- lr_scheduler (subclass of torch.optim.lr_scheduler.LRScheduler): optional, user-specified lr_scheduler instead of the default choice (StepLR).
- lr_scheduler_kwargs (dict): optional, parameters passed to the user-specified lr_scheduler.
- dataloader_kwargs (dict): optional, parameters passed into the PyTorch Lightning dataloader by the TimeSeriesDataLoader.
- **trainer_kwargs: keyword trainer arguments inherited from PyTorch Lightning's trainer.
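The two accepted forms of training_data_availability_threshold can be illustrated with a small filter. This is a hypothetical sketch, not the library's code: it assumes each window carries a binary availability mask per time step, and, consistent with the description of the 0.0 default, it assumes each part of a window also needs at least one valid point. The function names are made up for illustration.

```python
from typing import List, Sequence, Tuple, Union


def normalize_threshold(threshold: Union[float, List[float]]) -> Tuple[float, float]:
    """Map the documented forms to (insample_fraction, outsample_fraction)."""
    if isinstance(threshold, (int, float)):
        return float(threshold), float(threshold)  # one float applies to both parts
    if len(threshold) != 2:
        raise ValueError("expected [insample_fraction, outsample_fraction]")
    return float(threshold[0]), float(threshold[1])


def part_is_available(mask: Sequence[int], min_fraction: float) -> bool:
    """Assumed rule: at least one valid point and a large enough valid fraction."""
    n_valid = sum(mask)
    return n_valid >= 1 and n_valid / len(mask) >= min_fraction


def keep_training_window(
    insample_mask: Sequence[int],
    outsample_mask: Sequence[int],
    threshold: Union[float, List[float]] = 0.0,
) -> bool:
    in_frac, out_frac = normalize_threshold(threshold)
    return part_is_available(insample_mask, in_frac) and part_is_available(outsample_mask, out_frac)


# 1 = observed value, 0 = missing value
print(keep_training_window([1, 0, 0, 0], [1, 0]))              # True with the 0.0 default
print(keep_training_window([1, 0, 0, 0], [1, 0], 0.5))         # False: 25% insample < 50%
print(keep_training_window([1, 1, 0, 0], [0, 1], [0.5, 0.5]))  # True
```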
method __init__
property automatic_optimization
If set to False, you are responsible for calling .backward(), .step(), and .zero_grad().
property current_epoch
The current epoch in the Trainer, or 0 if not attached.
property device
property device_mesh
Strategies like ModelParallelStrategy will create a device mesh that can be accessed in the configure_model hook (pytorch_lightning.core.hooks.ModelHooks.configure_model) to parallelize the LightningModule.
property dtype
property example_input_array
The example input array is a specification of what the module can consume in the forward method. The return type is interpreted as follows:
- Single tensor: it is assumed the model takes a single argument, i.e., model.forward(model.example_input_array).
- Tuple: the input array should be interpreted as a sequence of positional arguments, i.e., model.forward(*model.example_input_array).
- Dict: the input array represents named keyword arguments, i.e., model.forward(**model.example_input_array).
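The three interpretation rules amount to a type-based dispatch. The helper below is a hypothetical sketch of that dispatch only; Lightning's internal handling may do more (for example, moving tensors to the right device), and a plain Python value stands in for the single-tensor case.

```python
from typing import Any, Callable, Mapping


def call_with_example(forward: Callable[..., Any], example: Any) -> Any:
    """Call forward with an example input following the three documented rules."""
    if isinstance(example, tuple):
        return forward(*example)   # tuple -> positional arguments
    if isinstance(example, Mapping):
        return forward(**example)  # dict -> keyword arguments
    return forward(example)        # anything else -> a single argument


def forward(x, scale=1):
    return x * scale


print(call_with_example(forward, 3))                     # 3
print(call_with_example(forward, (3, 2)))                # 6
print(call_with_example(forward, {"x": 3, "scale": 4}))  # 12
```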
property fabric
property global_rank
The index of the current process across all nodes and devices.
property global_step
Total training batches seen across all epochs. If no Trainer is attached, this property is 0.
property hparams
The collection of hyperparameters saved with save_hyperparameters(). It is mutable by the user. For the frozen set of initial hyperparameters, use hparams_initial.
Returns:
Mutable hyperparameters dictionary
property hparams_initial
The collection of hyperparameters saved with save_hyperparameters(). These contents are read-only. Manual updates to the saved hyperparameters can instead be performed through hparams.
Returns:
AttributeDict: immutable initial hyperparameters
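The split between hparams and hparams_initial can be pictured with a toy class. This is a hypothetical stand-in, not Lightning's implementation: it assumes the initial snapshot is taken once when the hyperparameters are saved, and it uses types.MappingProxyType to model the read-only view.

```python
import copy
from types import MappingProxyType


class HParamsHolder:
    """Hypothetical illustration of a mutable hparams and a frozen hparams_initial."""

    def __init__(self, **kwargs):
        self._hparams = dict(kwargs)                          # mutable working copy
        self._hparams_initial = copy.deepcopy(self._hparams)  # snapshot taken once

    @property
    def hparams(self):
        return self._hparams

    @property
    def hparams_initial(self):
        # Read-only view over the snapshot: later edits to hparams never reach it.
        return MappingProxyType(self._hparams_initial)


m = HParamsHolder(h=12, hidden_size=128)
m.hparams["hidden_size"] = 64
print(m.hparams["hidden_size"], m.hparams_initial["hidden_size"])  # 64 128
```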
property local_rank
The index of the current process within a single node.
property logger
Reference to the logger object in the Trainer.
property loggers
Reference to the list of loggers in the Trainer.
property on_gpu
Returns True if this model is currently located on a GPU.
Useful to set flags around the LightningModule for different CPU vs GPU behavior.
property strict_loading
Determines how Lightning loads this model using .load_state_dict(..., strict=model.strict_loading).
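What strict changes can be sketched without PyTorch by comparing parameter names. The helper is hypothetical; it mirrors the documented behavior of torch.nn.Module.load_state_dict, which reports missing and unexpected keys and raises a RuntimeError when strict=True and either list is non-empty.

```python
from typing import Iterable, List, NamedTuple


class KeyCheck(NamedTuple):
    missing_keys: List[str]
    unexpected_keys: List[str]


def check_state_dict_keys(
    model_keys: Iterable[str], loaded_keys: Iterable[str], strict: bool = True
) -> KeyCheck:
    """Compare expected and loaded parameter names as strict loading would."""
    model_set, loaded_set = set(model_keys), set(loaded_keys)
    result = KeyCheck(
        missing_keys=sorted(model_set - loaded_set),     # expected but not provided
        unexpected_keys=sorted(loaded_set - model_set),  # provided but not expected
    )
    if strict and (result.missing_keys or result.unexpected_keys):
        raise RuntimeError(f"Error(s) in loading state_dict: {result}")
    return result


print(check_state_dict_keys(["w", "b"], ["w", "b", "extra"], strict=False))
# KeyCheck(missing_keys=[], unexpected_keys=['extra'])
```

With strict=False, mismatches are only reported, which is useful when loading a checkpoint into a model whose architecture changed slightly.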

