References
- Kin G. Olivares, David Luo, Cristian Challu, Stefania La Vattiata, Max Mergenthaler, Artur Dubrawski (2023). "HINT: Hierarchical Mixture Networks For Coherent Probabilistic Forecasting". Neural Information Processing Systems, submitted. Working paper version available at arXiv.
- Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, Jaegul Choo (2022). "Reversible Instance Normalization for Accurate Time-Series Forecasting against Distribution Shift". ICLR 2022.
- David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski (2020). "DeepAR: Probabilistic forecasting with autoregressive recurrent networks". International Journal of Forecasting.
1. Auxiliary Functions
masked_median
*Masked Median. Compute the median of tensor x along dim, ignoring values where mask is False. x and mask need to be broadcastable.
Parameters:
- x: torch.Tensor to compute the median of along the dim dimension.
- mask: torch.Tensor of bool with the same shape as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- dim (int, optional): Dimension to take the median of. Defaults to -1.
- keepdim (bool, optional): Whether to keep the reduced dimension of x. Defaults to True.
Returns:
- x_median: torch.Tensor with the masked median values.*
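A minimal sketch of the masked median described above, assuming PyTorch is available. The name masked_median_sketch is hypothetical and this is not the library's implementation. It relies on torch.nanmedian, which returns the lower of the two middle values when the number of valid entries is even.

```python
import torch


def masked_median_sketch(x, mask, dim=-1, keepdim=True):
    """Median of x along dim, skipping entries where mask is False (illustrative only)."""
    # Overwrite masked-out entries with NaN so that nanmedian ignores them.
    x_nan = x.masked_fill(~mask, float("nan"))
    # With a dim argument, nanmedian returns a (values, indices) pair.
    x_median, _ = torch.nanmedian(x_nan, dim=dim, keepdim=keepdim)
    return x_median


x = torch.tensor([[1.0, 2.0, 100.0, 3.0]])
mask = torch.tensor([[True, True, False, True]])
# The outlier 100.0 is masked, so the median is taken over 1, 2, 3.
print(masked_median_sketch(x, mask))
```

If every entry along dim is masked, the result in this sketch is NaN, which matches the warning about all-False masks.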
masked_mean
*Masked Mean. Compute the mean of tensor x along dim, ignoring values where mask is False. x and mask need to be broadcastable.
Parameters:
- x: torch.Tensor to compute the mean of along the dim dimension.
- mask: torch.Tensor of bool with the same shape as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- dim (int, optional): Dimension to take the mean of. Defaults to -1.
- keepdim (bool, optional): Whether to keep the reduced dimension of x. Defaults to True.
Returns:
- x_mean: torch.Tensor with the masked mean values.*
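The masked mean can be sketched as a masked sum divided by the count of valid entries. The name masked_mean_sketch is hypothetical, and the code is an illustration under the assumption that mask is a bool tensor broadcastable to x, not the library's implementation.

```python
import torch


def masked_mean_sketch(x, mask, dim=-1, keepdim=True):
    """Mean of x along dim over the entries where mask is True (illustrative only)."""
    # Zero out masked entries so they do not contribute to the sum.
    x_valid = torch.where(mask, x, torch.zeros_like(x))
    # Count the valid entries; the mask is broadcast to x's shape first.
    count = mask.expand_as(x).sum(dim=dim, keepdim=keepdim)
    # A count of zero (all-False mask) yields NaN from the 0 / 0 division.
    return x_valid.sum(dim=dim, keepdim=keepdim) / count


x = torch.tensor([[1.0, 2.0, 100.0, 3.0]])
mask = torch.tensor([[True, True, False, True]])
# The outlier 100.0 is masked, so the mean is taken over 1, 2, 3.
print(masked_mean_sketch(x, mask))
```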
2. Scalers
minmax_statistics
*MinMax Scaler. Standardizes temporal features by ensuring their range lies within [0,1]. This transformation is often used as an alternative to the standard scaler. The scaled features are obtained as:
$$\mathbf{z} = \frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})}$$
where the minimum and maximum are taken along dim.
Parameters:
- x: torch.Tensor input tensor.
- mask: torch.Tensor of bool, same dimension as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- dim (int, optional): Dimension over which to compute min and max. Defaults to -1.
Returns:
- z: torch.Tensor with the same shape as x, scaled.*
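As an illustration of min-max scaling with a mask, the following sketch pushes masked entries to infinity before taking the minimum and maximum. The name minmax_scale_sketch and the placement of eps in the denominator are assumptions made for this example, not the library's code.

```python
import torch


def minmax_scale_sketch(x, mask, eps=1e-6, dim=-1):
    """Scale the valid entries of x into [0, 1] along dim (illustrative only)."""
    # Push masked entries to +inf / -inf so they can never be the min / max.
    x_min = x.masked_fill(~mask, float("inf")).amin(dim=dim, keepdim=True)
    x_max = x.masked_fill(~mask, float("-inf")).amax(dim=dim, keepdim=True)
    # eps keeps the denominator away from zero for constant series (assumed placement).
    return (x - x_min) / (x_max - x_min + eps)


x = torch.tensor([[0.0, 5.0, 10.0, 999.0]])
mask = torch.tensor([[True, True, True, False]])
z = minmax_scale_sketch(x, mask)
# Valid entries land in [0, 1]; the value at the masked position is not meaningful.
print(z[mask])
```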
minmax1_statistics
*MinMax1 Scaler. Standardizes temporal features by ensuring their range lies within [-1,1]. This transformation is often used as an alternative to the standard scaler or the classic MinMax scaler. The scaled features are obtained as:
$$\mathbf{z} = 2\,\frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})} - 1$$
where the minimum and maximum are taken along dim.
Parameters:
- x: torch.Tensor input tensor.
- mask: torch.Tensor of bool, same dimension as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- dim (int, optional): Dimension over which to compute min and max. Defaults to -1.
Returns:
- z: torch.Tensor with the same shape as x, scaled.*
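A short sketch of the [-1, 1] variant, which rescales the [0, 1] min-max result by a factor of two and shifts it by one. The name minmax1_scale_sketch and the placement of eps are assumptions for illustration only.

```python
import torch


def minmax1_scale_sketch(x, mask, eps=1e-6, dim=-1):
    """Scale the valid entries of x into [-1, 1] along dim (illustrative only)."""
    # Masked entries are replaced by +inf / -inf so they never win min / max.
    x_min = x.masked_fill(~mask, float("inf")).amin(dim=dim, keepdim=True)
    x_max = x.masked_fill(~mask, float("-inf")).amax(dim=dim, keepdim=True)
    # First map to [0, 1], then stretch and shift to [-1, 1].
    z01 = (x - x_min) / (x_max - x_min + eps)
    return 2.0 * z01 - 1.0


x = torch.tensor([[0.0, 5.0, 10.0]])
mask = torch.ones_like(x, dtype=torch.bool)
z = minmax1_scale_sketch(x, mask)
# The minimum maps to about -1, the midpoint to about 0, the maximum to about 1.
print(z)
```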
std_statistics
*Standard Scaler. Standardizes features by removing the mean and scaling to unit variance along the dim dimension. For example, for base_windows models, the scaled features are obtained as (with dim=1):
$$\mathbf{z} = \frac{\mathbf{x} - \bar{\mathbf{x}}}{\hat{\sigma}}$$
where $\bar{\mathbf{x}}$ and $\hat{\sigma}$ are the mean and standard deviation of x along dim.
Parameters:
- x: torch.Tensor input tensor.
- mask: torch.Tensor of bool, same dimension as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- dim (int, optional): Dimension over which to compute mean and std. Defaults to -1.
Returns:
- z: torch.Tensor with the same shape as x, scaled.*
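The standardization above can be sketched with masked moments computed over the time dimension (dim=1). The name standard_scale_sketch is hypothetical. The use of the population variance (dividing by the number of valid entries) and the placement of eps are assumptions of this example.

```python
import torch


def standard_scale_sketch(x, mask, eps=1e-6, dim=-1):
    """Remove the masked mean and divide by the masked std along dim (illustrative only)."""
    m = mask.to(x.dtype)
    n = m.sum(dim=dim, keepdim=True)  # number of valid entries
    mean = (x * m).sum(dim=dim, keepdim=True) / n
    # Population variance over the valid entries (an assumption of this sketch).
    var = (((x - mean) ** 2) * m).sum(dim=dim, keepdim=True) / n
    return (x - mean) / (var.sqrt() + eps)


# One series, four time steps, one channel: shape [batch, time, channels].
x = torch.tensor([1.0, 2.0, 3.0, 4.0]).reshape(1, 4, 1)
mask = torch.ones_like(x, dtype=torch.bool)
z = standard_scale_sketch(x, mask, dim=1)
# After scaling, the series has mean close to 0 and variance close to 1 along time.
print(z.squeeze())
```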
robust_statistics
*Robust Median Scaler. Standardizes features by removing the median and scaling with the mean absolute deviation (mad), a robust estimator of variance. This scaler is particularly useful with noisy data, where outliers can heavily distort the sample mean and variance. In these scenarios the median and mad give better results. For example, for base_windows models, the scaled features are obtained as (with dim=1):
$$\mathbf{z} = \frac{\mathbf{x} - \operatorname{median}(\mathbf{x})}{\operatorname{mad}(\mathbf{x})}, \qquad \operatorname{mad}(\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \operatorname{median}(\mathbf{x}) \right|$$
Parameters:
- x: torch.Tensor input tensor.
- mask: torch.Tensor of bool, same dimension as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- dim (int, optional): Dimension over which to compute median and mad. Defaults to -1.
Returns:
- z: torch.Tensor with the same shape as x, scaled.*
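A sketch of the robust scaler as described in the text, using the median as the shift and the mean absolute deviation around the median as the scale. The name robust_scale_sketch is hypothetical and the placement of eps is an assumption. An actual implementation may compute the deviation differently.

```python
import torch


def robust_scale_sketch(x, mask, eps=1e-6, dim=-1):
    """Remove the masked median and divide by the masked mad along dim (illustrative only)."""
    # NaN marks masked entries so that the nan-aware reductions skip them.
    x_nan = x.masked_fill(~mask, float("nan"))
    median = torch.nanmedian(x_nan, dim=dim, keepdim=True).values
    # Mean absolute deviation around the median, over valid entries only.
    mad = (x_nan - median).abs().nanmean(dim=dim, keepdim=True)
    return (x - median) / (mad + eps)


x = torch.tensor([[1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]])
mask = torch.tensor([[True, True, True, True, True, False]])
z = robust_scale_sketch(x, mask)
# The masked outlier does not affect the statistics: median 3, mad 1.2.
print(z[mask])
```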
invariant_statistics
*Invariant Median Scaler. Standardizes features by removing the median and scaling with the mean absolute deviation (mad), a robust estimator of variance. Additionally, it complements the transformation with the arcsinh transformation. For example, for base_windows models, the scaled features are obtained as (with dim=1):
$$\mathbf{z} = \operatorname{arcsinh}\left(\frac{\mathbf{x} - \operatorname{median}(\mathbf{x})}{\operatorname{mad}(\mathbf{x})}\right)$$
Parameters:
- x: torch.Tensor input tensor.
- mask: torch.Tensor of bool, same dimension as x, True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- dim (int, optional): Dimension over which to compute median and mad. Defaults to -1.
Returns:
- z: torch.Tensor with the same shape as x, scaled.*
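To illustrate the extra arcsinh step, the sketch below applies torch.asinh on top of a robust scaling. Both function names are hypothetical, and the mad here follows the mean-absolute-deviation wording of the text. This is an example under those assumptions rather than the library's code.

```python
import torch


def robust_scale_sketch(x, mask, eps=1e-6, dim=-1):
    """Median / mad scaling over valid entries (illustrative only)."""
    x_nan = x.masked_fill(~mask, float("nan"))
    median = torch.nanmedian(x_nan, dim=dim, keepdim=True).values
    mad = (x_nan - median).abs().nanmean(dim=dim, keepdim=True)
    return (x - median) / (mad + eps)


def invariant_scale_sketch(x, mask, eps=1e-6, dim=-1):
    """Robust scaling followed by arcsinh, which compresses large magnitudes."""
    return torch.asinh(robust_scale_sketch(x, mask, eps=eps, dim=dim))


x = torch.tensor([[1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]])
mask = torch.tensor([[True, True, True, True, True, False]])
z_robust = robust_scale_sketch(x, mask)
z_invariant = invariant_scale_sketch(x, mask)
# arcsinh is close to linear near zero and logarithmic for large values,
# so the extreme entry shrinks far more than the entries near the median.
print(z_robust[0, -1], z_invariant[0, -1])
```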
identity_statistics
*Identity Scaler. A placeholder identity scaler that ignores its arguments and returns x unchanged. The parameters below are accepted so that the signature matches the other scalers.
Parameters:
- x: torch.Tensor input tensor.
- mask: torch.Tensor of bool, same dimension as x, True where x is valid and False where x should be masked.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- dim (int, optional): Dimension over which statistics would be computed. Defaults to -1.
Returns:
- x: the original torch.Tensor x.*
3. TemporalNorm Module
TemporalNorm
*Temporal Normalization. Standardization of the features is a common requirement for many machine learning estimators, and it is commonly achieved by removing the level and scaling the variance. The TemporalNorm module applies temporal normalization over the batch of inputs as defined by the type of scaler.
If scaler_type is revin, learnable normalization parameters are added on top of the usual normalization technique; the parameters are learned through scale-decoupled global skip connections. The technique is available for point and probabilistic outputs.
Parameters:
- scaler_type: str, defines the type of scaler used by TemporalNorm. Available: [identity, standard, robust, minmax, minmax1, invariant, revin].
- dim (int, optional): Dimension over which to compute scale and shift. Defaults to -1.
- eps (float, optional): Small value to avoid division by zero. Defaults to 1e-6.
- num_features (int, optional): Number of features, used to initialize the RevIN-like learnable affine parameters. Defaults to None.
Reference: Olivares et al. (2023), "HINT: Hierarchical Mixture Networks For Coherent Probabilistic Forecasting".*
TemporalNorm.transform
*Center and scale the data.
Parameters:
- x: torch.Tensor of shape [batch, time, channels].
- mask: torch.Tensor of bool, shape [batch, time], True where x is valid and False where x should be masked. Mask should not be all False in any column of dimension dim, to avoid NaNs from zero division.
Returns:
- z: torch.Tensor with the same shape as x, scaled.*
TemporalNorm.inverse_transform
*Scale back the data to the original representation.
Parameters:
- z: torch.Tensor of shape [batch, time, channels], scaled.
Returns:
- x: torch.Tensor with the original data.*
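The transform and inverse_transform pair can be pictured as a module that stores the shift and scale computed during transform and reuses them to undo the scaling. The class below, TemporalNormSketch, is a hypothetical stand-in with only a standard scaler. Its attribute names and its handling of a [batch, time] mask are assumptions for illustration, not the TemporalNorm implementation.

```python
import torch
import torch.nn as nn


class TemporalNormSketch(nn.Module):
    """Standard-scaler-only illustration of a transform / inverse_transform pair."""

    def __init__(self, dim=1, eps=1e-6):
        super().__init__()
        self.dim = dim
        self.eps = eps
        self.x_shift = None  # filled in by transform
        self.x_scale = None  # filled in by transform

    def transform(self, x, mask):
        # x: [batch, time, channels]; mask: [batch, time] -> [batch, time, 1].
        m = mask.unsqueeze(-1).to(x.dtype)
        n = m.sum(dim=self.dim, keepdim=True)
        self.x_shift = (x * m).sum(dim=self.dim, keepdim=True) / n
        var = (((x - self.x_shift) ** 2) * m).sum(dim=self.dim, keepdim=True) / n
        self.x_scale = var.sqrt() + self.eps
        return (x - self.x_shift) / self.x_scale

    def inverse_transform(self, z):
        # Undo the scaling with the statistics stored by the last transform call.
        return z * self.x_scale + self.x_shift


torch.manual_seed(0)
x = torch.randn(2, 5, 3)                   # [batch, time, channels]
mask = torch.ones(2, 5, dtype=torch.bool)  # [batch, time]
mask[0, -1] = False                        # last step of the first series is masked

norm = TemporalNormSketch(dim=1)
z = norm.transform(x, mask)
x_rec = norm.inverse_transform(z)
print(torch.allclose(x_rec, x, atol=1e-5))
```

Storing the statistics on the module is what lets inverse_transform take only z as input, which mirrors the signatures documented above.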

