TemporalNorm
Temporal normalization has proven to be essential in neural forecasting tasks, as it keeps inputs on a scale where the network's non-linearities can express themselves. Scaling methods for forecasting focus on the temporal dimension, where most of the variance dwells, in contrast to other deep learning techniques such as BatchNorm, which normalizes across the batch and temporal dimensions, and LayerNorm, which normalizes across the feature dimension. Currently we support the following techniques: std, median, norm, norm1, invariant, revin.
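To make the contrast concrete, here is a minimal PyTorch sketch that assumes inputs laid out as [batch, time, channels], the layout documented for TemporalNorm.transform. It only compares which dimensions each family of techniques reduces over when computing a mean; the real torch.nn.BatchNorm1d and torch.nn.LayerNorm layers also track running statistics or learn affine parameters, which are omitted here.

```python
import torch

# Toy batch laid out as [B=2 series, T=4 time steps, C=1 channel].
x = torch.tensor([[[1.0], [2.0], [3.0], [4.0]],
                  [[10.0], [20.0], [30.0], [40.0]]])

# BatchNorm-style statistic: pooled over batch and time, one value per channel.
batch_mean = x.mean(dim=(0, 1), keepdim=True)   # shape [1, 1, C]

# LayerNorm-style statistic: computed over the feature (channel) dimension.
layer_mean = x.mean(dim=-1, keepdim=True)       # shape [B, T, 1]

# Temporal statistic: computed over time, separately per series and channel.
temporal_mean = x.mean(dim=1, keepdim=True)     # shape [B, 1, C]
```

The temporal means (2.5 and 25.0) keep each series on its own scale, whereas the pooled BatchNorm-style mean (13.75) is dominated by the larger series.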
References
- Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski (2023). “HINT: Hierarchical Mixture Networks For Coherent Probabilistic Forecasting”. Submitted to Neural Information Processing Systems. Working paper version available at arXiv.
- Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, Jaegul Choo (2022). “Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift”. International Conference on Learning Representations (ICLR).
- David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski (2020). “DeepAR: Probabilistic forecasting with autoregressive recurrent networks”. International Journal of Forecasting.
1. Auxiliary Functions
masked_median
*Masked Median
Compute the median of tensor x along dim, ignoring values where mask is False. x and mask need to be broadcastable.
Parameters:
x (torch.Tensor): Tensor to compute the median of along dimension dim.
mask (torch.Tensor, bool): Same shape as x (or broadcastable to it); True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs.
dim (int, optional): Dimension to take the median of. Defaults to -1.
keepdim (bool, optional): Whether to keep the reduced dimension of x. Defaults to True.
Returns:
x_median (torch.Tensor): Median of the valid values of x along dim.*
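A minimal sketch of the masked median, assuming PyTorch 1.8 or later for torch.nanmedian: masked entries are replaced with NaN so that the NaN-aware reduction skips them. It illustrates the idea and is not necessarily how the library implements masked_median.

```python
import torch


def masked_median(x: torch.Tensor, mask: torch.Tensor, dim: int = -1,
                  keepdim: bool = True) -> torch.Tensor:
    """Median of x along dim, ignoring entries where mask is False (sketch)."""
    # Replace masked-out entries with NaN so torch.nanmedian skips them.
    x_nan = x.masked_fill(~mask.bool(), float("nan"))
    # torch.nanmedian returns (values, indices) when dim is given.
    x_median, _ = torch.nanmedian(x_nan, dim=dim, keepdim=keepdim)
    return x_median


x = torch.tensor([[1.0, 2.0, 100.0, 3.0]])
mask = torch.tensor([[True, True, False, True]])
x_median = masked_median(x, mask)  # median of the valid values [1, 2, 3]
```

For an even number of valid entries, torch.nanmedian returns the lower of the two middle values rather than their average.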
masked_mean
*Masked Mean
Compute the mean of tensor x along dim, ignoring values where mask is False. x and mask need to be broadcastable.
Parameters:
x (torch.Tensor): Tensor to compute the mean of along dimension dim.
mask (torch.Tensor, bool): Same shape as x (or broadcastable to it); True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
dim (int, optional): Dimension to take the mean of. Defaults to -1.
keepdim (bool, optional): Whether to keep the reduced dimension of x. Defaults to True.
Returns:
x_mean (torch.Tensor): Mean of the valid values of x along dim.*
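The masked mean can be sketched as a sum over the valid entries divided by their count. This is an illustrative version, not the library's code, and it assumes PyTorch 1.8 or later for torch.broadcast_to. A fully masked column yields 0/0, which is exactly the NaN the docstring warns about.

```python
import torch


def masked_mean(x: torch.Tensor, mask: torch.Tensor, dim: int = -1,
                keepdim: bool = True) -> torch.Tensor:
    """Mean of x along dim, counting only entries where mask is True (sketch)."""
    mb = torch.broadcast_to(mask.bool(), x.shape)
    # Zero out masked entries with torch.where, so even NaNs there are ignored.
    total = torch.where(mb, x, torch.zeros_like(x)).sum(dim=dim, keepdim=keepdim)
    count = mb.to(x.dtype).sum(dim=dim, keepdim=keepdim)  # number of valid entries
    return total / count  # 0 / 0 -> NaN when a column is fully masked


x = torch.tensor([[1.0, 2.0, 100.0, 3.0]])
mask = torch.tensor([[True, True, False, True]])
x_mean = masked_mean(x, mask)  # (1 + 2 + 3) / 3
```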
2. Scalers
minmax_statistics
*MinMax Scaler
Standardizes temporal features by ensuring their range lies within [0, 1]. This transformation is often used as an alternative to the standard scaler. The scaled features are obtained as:
$$\mathbf{z} = \frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})}$$
where the minimum and maximum are taken over the valid entries along dim.
Parameters:
x (torch.Tensor): Input tensor.
mask (torch.Tensor, bool): Same dimension as x; True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
dim (int, optional): Dimension over which to compute min and max. Defaults to -1.
Returns:
z (torch.Tensor): Same shape as x, except scaled.*
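A minimal sketch of min-max scaling with a mask, assuming masked entries should be excluded from the minimum and maximum. minmax_scale is a hypothetical name, and unlike the documented minmax_statistics it returns the scaled tensor together with its statistics. Masked entries are pushed to plus or minus infinity so they can never be selected; giving constant series a range of 1 is an assumption of this sketch.

```python
import torch


def minmax_scale(x: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6, dim: int = -1):
    """Scale the valid entries of x to [0, 1] along dim (illustrative sketch)."""
    mask = torch.broadcast_to(mask.bool(), x.shape)
    inf = torch.tensor(float("inf"), dtype=x.dtype)
    # Masked entries become -inf for the max and +inf for the min, so they never win.
    x_max = torch.where(mask, x, -inf).amax(dim=dim, keepdim=True)
    x_min = torch.where(mask, x, inf).amin(dim=dim, keepdim=True)
    x_range = x_max - x_min
    # Guard constant series (zero range), then add eps as in the docstring.
    x_range = torch.where(x_range == 0, torch.ones_like(x_range), x_range) + eps
    z = (x - x_min) / x_range
    return z, x_min, x_range


# The last entry is masked out, so min = 2 and max = 6.
x = torch.tensor([[2.0, 4.0, 6.0, 1000.0]])
mask = torch.tensor([[True, True, True, False]])
z, x_min, x_range = minmax_scale(x, mask)
```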
minmax1_statistics
*MinMax1 Scaler
Standardizes temporal features by ensuring their range lies within [-1, 1]. This transformation is often used as an alternative to the standard scaler or the classic MinMax scaler. The scaled features are obtained as:
$$\mathbf{z} = 2\,\frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})} - 1$$
where the minimum and maximum are taken over the valid entries along dim.
Parameters:
x (torch.Tensor): Input tensor.
mask (torch.Tensor, bool): Same dimension as x; True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
dim (int, optional): Dimension over which to compute min and max. Defaults to -1.
Returns:
z (torch.Tensor): Same shape as x, except scaled.*
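The [-1, 1] variant only adds a stretch and a shift after the [0, 1] mapping. Another hypothetical sketch (minmax1_scale is not a library function), with the same masking and constant-series assumptions as a plain masked min-max scaler:

```python
import torch


def minmax1_scale(x: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6, dim: int = -1):
    """Scale the valid entries of x to [-1, 1] along dim (illustrative sketch)."""
    mask = torch.broadcast_to(mask.bool(), x.shape)
    inf = torch.tensor(float("inf"), dtype=x.dtype)
    # Exclude masked entries from the max and min.
    x_max = torch.where(mask, x, -inf).amax(dim=dim, keepdim=True)
    x_min = torch.where(mask, x, inf).amin(dim=dim, keepdim=True)
    x_range = x_max - x_min
    x_range = torch.where(x_range == 0, torch.ones_like(x_range), x_range) + eps
    # Map to [0, 1], then stretch to [0, 2] and shift down to [-1, 1].
    return 2.0 * (x - x_min) / x_range - 1.0


x = torch.tensor([[2.0, 4.0, 6.0]])
z = minmax1_scale(x, torch.ones_like(x, dtype=torch.bool))
```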
std_statistics
*Standard Scaler
Standardizes features by removing the mean and scaling to unit variance along the dim dimension.
For example, for base_windows models, the scaled features are obtained as (with dim=1):
$$\mathbf{z} = \frac{\mathbf{x}_{[B,T,C]} - \bar{\mathbf{x}}_{[B,1,C]}}{\hat{\sigma}_{[B,1,C]}}$$
where B, T and C are the batch, time and channel sizes, and the mean and standard deviation are computed over the valid time steps of each series and channel.
Parameters:
x (torch.Tensor): Input tensor.
mask (torch.Tensor, bool): Same dimension as x; True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
dim (int, optional): Dimension over which to compute mean and std. Defaults to -1.
Returns:
z (torch.Tensor): Same shape as x, except scaled.*
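A sketch of the masked standard scaler for the base_windows layout [batch, time, channels] with dim=1. The helper std_scale is hypothetical; it uses the population standard deviation (dividing by the number of valid entries), and replacing a zero standard deviation by 1 is an assumption of this sketch.

```python
import torch


def std_scale(x: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6, dim: int = 1):
    """Masked standard scaling along dim (illustrative sketch)."""
    m = torch.broadcast_to(mask, x.shape).to(x.dtype)
    count = m.sum(dim=dim, keepdim=True)
    # Mean and population variance over valid entries only.
    mean = (x * m).sum(dim=dim, keepdim=True) / count
    var = (((x - mean) * m) ** 2).sum(dim=dim, keepdim=True) / count
    std = torch.sqrt(var)
    # Constant series: fall back to a scale of 1, then add eps.
    std = torch.where(std == 0, torch.ones_like(std), std) + eps
    return (x - mean) / std, mean, std


# base_windows-style input [batch=1, time=4, channels=1], scaled over time (dim=1).
x = torch.tensor([[[1.0], [2.0], [3.0], [4.0]]])
z, mean, std = std_scale(x, torch.ones_like(x, dtype=torch.bool))
```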
robust_statistics
*Robust Median Scaler
Standardizes features by removing the median and scaling with the mean absolute deviation (mad), a robust estimator of dispersion. This scaler is particularly useful with noisy data, where outliers can heavily distort the sample mean and variance. In these scenarios the median and mad give better results.
For example, for base_windows models, the scaled features are obtained as (with dim=1):
$$\mathbf{z} = \frac{\mathbf{x}_{[B,T,C]} - \mathrm{median}(\mathbf{x})_{[B,1,C]}}{\mathrm{mad}(\mathbf{x})_{[B,1,C]}}, \qquad \mathrm{mad}(\mathbf{x}) = \frac{1}{N}\sum_{t}\left|x_{t} - \mathrm{median}(\mathbf{x})\right|$$
where the sum runs over the N valid time steps of each series and channel.
Parameters:
x (torch.Tensor): Input tensor.
mask (torch.Tensor, bool): Same dimension as x; True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
dim (int, optional): Dimension over which to compute the median and mad. Defaults to -1.
Returns:
z (torch.Tensor): Same shape as x, except scaled.*
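The following hypothetical robust_scale follows the description above literally: the median as the shift and the mean absolute deviation around the median as the scale, both computed over valid entries (PyTorch 1.10 or later for torch.nanmean). The median absolute deviation, the median of |x - median|, is even less sensitive to outliers; replacing the nanmean call with torch.nanmedian(...).values gives that variant.

```python
import torch


def robust_scale(x: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6, dim: int = 1):
    """Median / mean-absolute-deviation scaling along dim (illustrative sketch)."""
    mask = torch.broadcast_to(mask.bool(), x.shape)
    # NaN out masked entries so the NaN-aware reductions ignore them.
    x_nan = x.masked_fill(~mask, float("nan"))
    median = torch.nanmedian(x_nan, dim=dim, keepdim=True).values
    mad = torch.nanmean((x_nan - median).abs(), dim=dim, keepdim=True)
    # Constant series: fall back to a scale of 1, then add eps.
    mad = torch.where(mad == 0, torch.ones_like(mad), mad) + eps
    return (x - median) / mad, median, mad


x = torch.tensor([[[1.0], [2.0], [3.0], [4.0], [100.0]]])
z, median, mad = robust_scale(x, torch.ones_like(x, dtype=torch.bool))
```

In this example the outlier 100 leaves the median at 3 but inflates the mean absolute deviation to 20.2, which is why the median-based variant is sometimes preferred.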
invariant_statistics
*Invariant Median Scaler
Standardizes features by removing the median and scaling with the mean absolute deviation (mad), a robust estimator of dispersion. Additionally, it applies the arcsinh transformation to the result.
For example, for base_windows models, the scaled features are obtained as (with dim=1):
$$\mathbf{z} = \mathrm{arcsinh}\left(\frac{\mathbf{x}_{[B,T,C]} - \mathrm{median}(\mathbf{x})_{[B,1,C]}}{\mathrm{mad}(\mathbf{x})_{[B,1,C]}}\right)$$
Parameters:
x (torch.Tensor): Input tensor.
mask (torch.Tensor, bool): Same dimension as x; True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
dim (int, optional): Dimension over which to compute the median and mad. Defaults to -1.
Returns:
z (torch.Tensor): Same shape as x, except scaled.*
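Because arcsinh is invertible, the invariant transformation can be undone exactly with sinh. A hypothetical sketch of both directions (invariant_scale and invariant_inverse are illustrative names), using the median and mean absolute deviation statistics described for the robust scaler:

```python
import torch


def invariant_scale(x: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6, dim: int = 1):
    """Robust scaling followed by arcsinh (illustrative sketch)."""
    mask = torch.broadcast_to(mask.bool(), x.shape)
    x_nan = x.masked_fill(~mask, float("nan"))
    median = torch.nanmedian(x_nan, dim=dim, keepdim=True).values
    mad = torch.nanmean((x_nan - median).abs(), dim=dim, keepdim=True)
    mad = torch.where(mad == 0, torch.ones_like(mad), mad) + eps
    # arcsinh grows like a log for large |values| but is defined at 0 and for negatives.
    z = torch.asinh((x - median) / mad)
    return z, median, mad


def invariant_inverse(z: torch.Tensor, median: torch.Tensor, mad: torch.Tensor) -> torch.Tensor:
    """Undo invariant_scale: sinh is the exact inverse of arcsinh."""
    return torch.sinh(z) * mad + median


x = torch.tensor([[[1.0], [2.0], [3.0], [4.0], [100.0]]])
z, median, mad = invariant_scale(x, torch.ones_like(x, dtype=torch.bool))
x_back = invariant_inverse(z, median, mad)
```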
identity_statistics
*Identity Scaler
A placeholder identity scaler that ignores its arguments and returns x unchanged.
Parameters:
x (torch.Tensor): Input tensor.
mask (torch.Tensor, bool): Same dimension as x; True where x is valid and False where x should be masked. Accepted for interface compatibility but not used.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
dim (int, optional): Dimension over which statistics would be computed. Defaults to -1.
Returns:
x (torch.Tensor): The original tensor x.*
3. TemporalNorm Module
TemporalNorm
*Temporal Normalization
Standardization of the features is a common requirement for many machine learning estimators, and it is commonly achieved by removing the level and scaling the variance. The TemporalNorm module applies temporal normalization over the batch of inputs as defined by the type of scaler.
If scaler_type is revin, learnable normalization parameters are added on top of the usual normalization technique; these parameters are learned through scale-decoupled global skip connections. The technique is available for both point and probabilistic outputs.
Parameters:
scaler_type (str): Type of scaler used by TemporalNorm. Available: identity, standard, robust, minmax, minmax1, invariant, revin.
dim (int, optional): Dimension over which to compute scale and shift. Defaults to -1.
eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
num_features (int, optional): Number of features, used to initialize the RevIN-like learnable affine parameters. Defaults to None.*
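The learnable part of RevIN can be sketched as a per-channel affine map applied after instance normalization and undone before de-normalization, following Kim et al. (ICLR 2022). RevINAffine is a hypothetical class, not TemporalNorm's implementation: it ignores the mask, normalizes over dim=1, and uses num_features only to size the weight and bias. Because these are nn.Parameter objects, they are trained by backpropagation together with the forecasting model.

```python
import torch
import torch.nn as nn


class RevINAffine(nn.Module):
    """Hypothetical RevIN-style normalizer: per-series instance norm over time
    plus a learnable per-channel affine map (Kim et al., ICLR 2022)."""

    def __init__(self, num_features: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # One scale (gamma) and one shift (beta) per channel; identity at init.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def transform(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, time, channels]; statistics are taken over time (dim=1).
        self.mean = x.mean(dim=1, keepdim=True).detach()
        var = ((x - self.mean) ** 2).mean(dim=1, keepdim=True)
        self.std = torch.sqrt(var + self.eps).detach()
        z = (x - self.mean) / self.std
        return z * self.weight + self.bias

    def inverse_transform(self, z: torch.Tensor) -> torch.Tensor:
        # Undo the affine map first, then restore the original level and scale.
        z = (z - self.bias) / (self.weight + self.eps)
        return z * self.std + self.mean


torch.manual_seed(0)
x = torch.randn(2, 8, 3) * 5 + 10
norm = RevINAffine(num_features=3)
z = norm.transform(x)
x_back = norm.inverse_transform(z)
```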
TemporalNorm.transform
*Center and scale the data.
Parameters:
x (torch.Tensor): Input data, shape [batch, time, channels].
mask (torch.Tensor, bool): Shape [batch, time]; True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
Returns:
z (torch.Tensor): Same shape as x, except scaled.*
TemporalNorm.inverse_transform
*Scale back the data to the original representation.
Parameters:
z (torch.Tensor): Scaled data, shape [batch, time, channels].
Returns:
x (torch.Tensor): Data in the original representation.*
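A hypothetical stand-in for the transform / inverse_transform pair, assuming (as the signature of inverse_transform suggests, since it receives only z) that the shift and scale from the most recent transform call are cached on the object. MiniTemporalNorm implements only a masked standard scaler and accepts a mask of shape [batch, time], broadcasting it over channels.

```python
import torch


class MiniTemporalNorm:
    """Hypothetical stand-in for TemporalNorm with a masked standard scaler."""

    def __init__(self, dim: int = 1, eps: float = 1e-6):
        self.dim = dim
        self.eps = eps

    def transform(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Accept a [batch, time] mask by broadcasting it over channels.
        if mask.dim() == x.dim() - 1:
            mask = mask.unsqueeze(-1)
        m = torch.broadcast_to(mask, x.shape).to(x.dtype)
        n = m.sum(dim=self.dim, keepdim=True)
        # Cache shift and scale so inverse_transform can reuse them.
        self.shift = (x * m).sum(dim=self.dim, keepdim=True) / n
        var = (((x - self.shift) * m) ** 2).sum(dim=self.dim, keepdim=True) / n
        self.scale = torch.sqrt(var) + self.eps
        return (x - self.shift) / self.scale

    def inverse_transform(self, z: torch.Tensor) -> torch.Tensor:
        return z * self.scale + self.shift


torch.manual_seed(0)
x = torch.randn(4, 24, 2) * 3 + 5
mask = torch.ones(4, 24, dtype=torch.bool)
mask[:, :6] = False  # treat the first 6 steps of every series as missing
norm = MiniTemporalNorm()
z = norm.transform(x, mask)
x_back = norm.inverse_transform(z)
```

Caching the statistics lets a network operate on normalized values while its outputs are mapped back with the statistics of the window that was normalized.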