PyTorch Losses
NeuralForecast contains a collection of PyTorch loss classes intended for use during the models’ optimization.
The most important training signal is the forecast error, the difference between the observed value $y_{\tau}$ and the prediction $\hat{y}_{\tau}$ at time $\tau$:

$$e_{\tau} = y_{\tau} - \hat{y}_{\tau}$$

The training loss summarizes these forecast errors into different optimization objectives.
All the losses are `torch.nn.Module`s, which lets PyTorch Lightning move them automatically across CPU/GPU/TPU devices.
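As a quick orientation, here is a minimal sketch of how any of these losses is typically plugged into a model (assuming standard NeuralForecast usage; the `NHITS` hyperparameters are illustrative):

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import MAE

# Any loss in this module is passed through the model's `loss` argument.
model = NHITS(h=12, input_size=24, loss=MAE())
nf = NeuralForecast(models=[model], freq="D")
# nf.fit(df)        # df with columns ['unique_id', 'ds', 'y']
# preds = nf.predict()
```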
source
BasePointLoss
Base class for point loss functions.

Parameters:
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.
- `outputsize_multiplier`: Multiplier for the output size.
- `output_names`: Names of the outputs.
1. Scale-dependent Errors
These metrics are on the same scale as the data.
Mean Absolute Error (MAE)
source
MAE.__init__
Mean Absolute Error

Calculates Mean Absolute Error between `y` and `y_hat`. MAE measures the relative prediction accuracy of a forecasting method by calculating the deviation of the prediction from the true value at a given time and averaging these deviations over the length of the series.

Parameters:
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.
source
MAE.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies datapoints to consider in loss.

Returns:
- `mae`: tensor (single value).
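A minimal sketch of calling the loss directly on tensors (assuming the keyword signature documented above and a mask of ones, i.e. every point counts equally):

```python
import torch
from neuralforecast.losses.pytorch import MAE

y     = torch.tensor([[2.0, 3.0, 4.0]])   # actual values, shape (batch, h)
y_hat = torch.tensor([[2.5, 2.0, 4.0]])   # predictions
mask  = torch.ones_like(y)                # consider every timestamp

mae = MAE()
loss = mae(y=y, y_hat=y_hat, mask=mask)
print(loss)  # expected tensor(0.5000): mean of |0.5|, |1.0|, |0.0|
```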
Mean Squared Error (MSE)
source
MSE.__init__
Mean Squared Error

Calculates Mean Squared Error between `y` and `y_hat`. MSE measures the relative prediction accuracy of a forecasting method by calculating the squared deviation of the prediction from the true value at a given time and averaging these deviations over the length of the series.

Parameters:
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.
source
MSE.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies datapoints to consider in loss.

Returns:
- `mse`: tensor (single value).
Root Mean Squared Error (RMSE)
source
RMSE.__init__
Root Mean Squared Error

Calculates Root Mean Squared Error between `y` and `y_hat`. RMSE measures the relative prediction accuracy of a forecasting method by calculating the squared deviation of the prediction from the observed value at a given time and averaging these deviations over the length of the series. The RMSE remains on the same scale as the original time series, so comparison with other series is only possible if they share a common scale. RMSE has a direct connection to the L2 norm.

Parameters:
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.
source
RMSE.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies datapoints to consider in loss.

Returns:
- `rmse`: tensor (single value).
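For reference, the standard formulations of these three losses over a forecast horizon of length $H$, consistent with the descriptions above:

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} |y_{\tau} - \hat{y}_{\tau}|, \qquad \mathrm{MSE}(y, \hat{y}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} (y_{\tau} - \hat{y}_{\tau})^{2}, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$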
2. Percentage Errors
These metrics are unit-free, suitable for comparisons across series.
Mean Absolute Percentage Error (MAPE)
source
MAPE.__init__
Mean Absolute Percentage Error

Calculates Mean Absolute Percentage Error between `y` and `y_hat`. MAPE measures the relative prediction accuracy of a forecasting method by calculating the percentage deviation of the prediction from the observed value at a given time and averaging these deviations over the length of the series. The closer an observed value is to zero, the higher the penalty MAPE assigns to the corresponding error.

Parameters:
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Makridakis S., “Accuracy measures: theoretical and practical concerns”.
source
MAPE.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `mape`: tensor (single value).
Symmetric MAPE (sMAPE)
source
SMAPE.__init__
Symmetric Mean Absolute Percentage Error

Calculates Symmetric Mean Absolute Percentage Error between `y` and `y_hat`. SMAPE measures the relative prediction accuracy of a forecasting method by calculating the relative deviation of the prediction from the observed value, scaled by the sum of the absolute values of the prediction and observed value at a given time, then averaging these deviations over the length of the series. This bounds SMAPE between 0% and 200%, which is desirable compared to the standard MAPE, which may be undefined when the target is zero.

Parameters:
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Makridakis S., “Accuracy measures: theoretical and practical concerns”.
source
SMAPE.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `smape`: tensor (single value).
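For reference, one common formulation of the two percentage errors, consistent with the descriptions above (the factor of 2 in SMAPE gives the 0%–200% bounds mentioned earlier):

$$\mathrm{MAPE}(y, \hat{y}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \frac{|y_{\tau} - \hat{y}_{\tau}|}{|y_{\tau}|}, \qquad \mathrm{SMAPE}(y, \hat{y}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \frac{2\,|y_{\tau} - \hat{y}_{\tau}|}{|y_{\tau}| + |\hat{y}_{\tau}|}$$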
3. Scale-independent Errors
These metrics measure the relative improvements versus baselines.
Mean Absolute Scaled Error (MASE)
source
MASE.__init__
Mean Absolute Scaled Error

Calculates the Mean Absolute Scaled Error between `y` and `y_hat`. MASE measures the relative prediction accuracy of a forecasting method by comparing the mean absolute error of the prediction against the mean absolute error of the seasonal naive model. MASE is a component of the Overall Weighted Average (OWA) used in the M4 Competition.

Parameters:
- `seasonality`: int. Main frequency of the time series; Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Rob J. Hyndman and Koehler, A. B., “Another look at measures of forecast accuracy”.
- Spyros Makridakis, Evangelos Spiliotis, Vassilios Assimakopoulos, “The M4 Competition: 100,000 time series and 61 forecasting methods”.
source
MASE.__call__
Parameters:
- `y`: tensor (batch_size, output_size), Actual values.
- `y_hat`: tensor (batch_size, output_size), Predicted values.
- `y_insample`: tensor (batch_size, input_size), Actual in-sample values used for the Seasonal Naive predictions.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `mase`: tensor (single value).
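A minimal sketch of the call (assuming the keyword signature documented above; shapes follow the parameter descriptions, with the in-sample window used to compute the seasonal naive errors):

```python
import torch
from neuralforecast.losses.pytorch import MASE

mase = MASE(seasonality=7)          # daily data with weekly seasonality
y          = torch.rand(4, 14)      # (batch_size, output_size)
y_hat      = torch.rand(4, 14)      # (batch_size, output_size)
y_insample = torch.rand(4, 28)      # (batch_size, input_size)
mask       = torch.ones_like(y)

print(mase(y=y, y_hat=y_hat, y_insample=y_insample, mask=mask))
```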
Relative Mean Squared Error (relMSE)
source
relMSE.__init__
Relative Mean Squared Error

Computes the Relative Mean Squared Error (relMSE), proposed by Hyndman & Koehler (2006) as an alternative to percentage errors that avoids measurement instability.

Parameters:
- `y_train`: numpy array, Training values.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Hyndman, R. J. and Koehler, A. B. (2006). “Another look at measures of forecast accuracy”, International Journal of Forecasting, Volume 22, Issue 4.
- Kin G. Olivares, O. Nganba Meetei, Ruijun Ma, Rohan Reddy, Mengfei Cao, Lee Dicker. “Probabilistic Hierarchical Forecasting with Deep Poisson Mixtures”. Submitted to the International Journal of Forecasting, working paper available at arXiv.
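A formulation consistent with Hyndman & Koehler’s relative measures, with a naive forecast $\hat{y}^{naive}$ as the benchmark:

$$\mathrm{relMSE}(y, \hat{y}, \hat{y}^{naive}) = \frac{\mathrm{MSE}(y, \hat{y})}{\mathrm{MSE}(y, \hat{y}^{naive})}$$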
source
relMSE.__call__
Parameters:
- `y`: tensor (batch_size, output_size), Actual values.
- `y_hat`: tensor (batch_size, output_size), Predicted values.
- `y_insample`: tensor (batch_size, input_size), Actual in-sample values used for the Seasonal Naive predictions.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `relMSE`: tensor (single value).
4. Probabilistic Errors
These methods use statistical approaches to estimate unknown probability distributions from observed data.
Maximum likelihood estimation finds the parameter values that maximize the likelihood function, which measures the probability of obtaining the observed data given the parameter values. MLE has good theoretical properties and is efficient under certain assumptions.
In the non-parametric approach, quantile regression measures deviations asymmetrically, penalizing under- and over-estimation differently.
Quantile Loss
source
QuantileLoss.__init__
Quantile Loss

Computes the quantile loss between `y` and `y_hat`. QL measures the deviation of a quantile forecast. By weighting the absolute deviation asymmetrically, the loss pays more attention to under- or over-estimation. A common value for q is 0.5, the deviation from the median (Pinball loss).

Parameters:
- `q`: float, between 0 and 1. The slope of the quantile loss; in the context of quantile regression, q determines the conditional quantile level.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Roger Koenker and Gilbert Bassett, Jr., “Regression Quantiles”.
source
QuantileLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies datapoints to consider in loss.

Returns:
- `quantile_loss`: tensor (single value).
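A minimal sketch illustrating the asymmetry (assuming the keyword signature documented above, with `mask` assumed to default to using every point):

```python
import torch
from neuralforecast.losses.pytorch import QuantileLoss

# q=0.9: under-prediction (y > y_hat) is penalized far more than over-prediction.
qloss = QuantileLoss(q=0.9)
y     = torch.tensor([[10.0, 12.0]])
y_hat = torch.tensor([[11.0,  9.0]])  # first point over-predicts, second under-predicts
print(qloss(y=y, y_hat=y_hat))        # dominated by the under-prediction at the second point
```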
Multi Quantile Loss (MQLoss)
source
MQLoss.__init__
Multi-Quantile Loss

Calculates the Multi-Quantile loss (MQL) between `y` and `y_hat`. MQL averages the quantile loss over a given set of quantiles, based on the absolute difference between predicted quantiles and observed values.

In the limit, MQL measures the accuracy of a full predictive distribution via the continuous ranked probability score (CRPS). This is achieved through a numerical integration technique that discretizes the quantiles and treats the CRPS integral with a left Riemann approximation, averaging over uniformly spaced quantiles.

Parameters:
- `level`: int list [0,100]. Probability levels for prediction intervals (defaults to median).
- `quantiles`: float list [0., 1.]. Alternative to level; quantiles to estimate from the y distribution.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Roger Koenker and Gilbert Bassett, Jr., “Regression Quantiles”.
- James E. Matheson and Robert L. Winkler, “Scoring Rules for Continuous Probability Distributions”.
source
MQLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `mqloss`: tensor (single value).
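A minimal sketch of MQLoss inside a model (assuming standard NeuralForecast usage; hyperparameters are illustrative). With `level=[80, 90]` the loss trains the model on the quantiles [0.05, 0.1, 0.5, 0.9, 0.95], and predictions typically carry names such as `NHITS-median`, `NHITS-lo-90`, `NHITS-hi-90`:

```python
from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import MQLoss

# Train on the median plus the 80% and 90% prediction-interval quantiles.
model = NHITS(h=12, input_size=24, loss=MQLoss(level=[80, 90]))
```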
Implicit Quantile Loss (IQLoss)
source
QuantileLayer
Implicit Quantile Layer from the paper “IQN for Distributional Reinforcement Learning” (https://arxiv.org/abs/1806.06923) by Dabney et al. (2018).

Code from GluonTS: https://github.com/awslabs/gluonts/blob/61133ef6e2d88177b32ace4afc6843ab9a7bc8cd/src/gluonts/torch/distributions/implicit_quantile_network.py
source
IQLoss.__init__
Implicit Quantile Loss

Computes the quantile loss between `y` and `y_hat`, with the quantile `q` provided as an input to the network. IQL measures the deviation of a quantile forecast. By weighting the absolute deviation asymmetrically, the loss pays more attention to under- or over-estimation.

Parameters:
- `quantile_sampling`: str, default=‘uniform’, sampling distribution used to sample the quantiles during training. Choose from [‘uniform’, ‘beta’].
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.
source
IQLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies datapoints to consider in loss.

Returns:
- `quantile_loss`: tensor (single value).
DistributionLoss
source
DistributionLoss.__init__
DistributionLoss

This PyTorch module wraps the `torch.distributions` classes, allowing them to interact with NeuralForecast models modularly. It uses the negative log-likelihood as the optimization objective and a sample method to generate empirically the quantiles defined by the `level` list.

Additionally, it implements a distribution transformation that factorizes the scale-dependent likelihood parameters into a base scale and a multiplier that is efficiently learnable within the network’s non-linearities operating ranges.

Available distributions:
- Poisson
- Normal
- StudentT
- NegativeBinomial
- Tweedie
- Bernoulli (Temporal Classifiers)
- ISQF (Incremental Spline Quantile Function)

Parameters:
- `distribution`: str, identifier of a torch.distributions.Distribution class.
- `level`: float list [0,100], confidence levels for prediction intervals.
- `quantiles`: float list [0,1], alternative to level list; target quantiles.
- `num_samples`: int=500, number of samples for the empirical quantiles.
- `return_params`: bool=False, whether or not to return the Distribution parameters.

References:
- PyTorch Probability Distributions Package: StudentT.
- David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski (2020). “DeepAR: Probabilistic forecasting with autoregressive recurrent networks”. International Journal of Forecasting.
- Park, Youngsuk, Danielle Maddix, François-Xavier Aubet, Kelvin Kan, Jan Gasthaus, and Yuyang Wang (2022). “Learning Quantile Functions without Quantile Crossing for Distribution-free Time Series Forecasting”.
source
DistributionLoss.sample
Construct the empirical quantiles from the estimated Distribution, sampling from it `num_samples` times independently.

Parameters:
- `distr_args`: Constructor arguments for the underlying Distribution type.
- `num_samples`: int=500, overwrites the number of samples for the empirical quantiles.

Returns:
- `samples`: tensor, shape [B, H, `num_samples`].
- `quantiles`: tensor, empirical quantiles defined by `levels`.
source
DistributionLoss.__call__
Computes the negative log-likelihood objective function to estimate the predictive distribution:

$$-\log P(\mathbf{y}_{\tau} \mid \theta)$$

where $\theta$ represents the distribution parameters. It additionally summarizes the objective signal using a weighted average with the `mask` tensor.

Parameters:
- `y`: tensor, Actual values.
- `distr_args`: Constructor arguments for the underlying Distribution type.
- `loc`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `scale`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `loss`: scalar, weighted loss function against which backpropagation will be performed.
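A minimal sketch of attaching a DistributionLoss to a model (assuming standard NeuralForecast usage; hyperparameters are illustrative). The model then outputs the distribution parameters, and prediction intervals are derived from the empirical quantiles of `num_samples` draws:

```python
from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import DistributionLoss

# Negative log-likelihood training under a StudentT observation model.
loss = DistributionLoss(distribution="StudentT", level=[80, 90], num_samples=500)
model = NHITS(h=12, input_size=24, loss=loss)
```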
Poisson Mixture Mesh (PMM)
source
PMM.__init__
Poisson Mixture Mesh

This Poisson Mixture statistical model assumes independence across groups of data and estimates relationships within each group.

Parameters:
- `n_components`: int=10, the number of mixture components.
- `level`: float list [0,100], confidence levels for prediction intervals.
- `quantiles`: float list [0,1], alternative to level list; target quantiles.
- `return_params`: bool=False, whether or not to return the Distribution parameters.
- `batch_correlation`: bool=False, whether or not to model batch correlations.
- `horizon_correlation`: bool=False, whether or not to model horizon correlations.
source
PMM.sample
Construct the empirical quantiles from the estimated Distribution, sampling from it `num_samples` times independently.

Parameters:
- `distr_args`: Constructor arguments for the underlying Distribution type.
- `loc`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `scale`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `num_samples`: int=500, overwrites the number of samples for the empirical quantiles.

Returns:
- `samples`: tensor, shape [B, H, `num_samples`].
- `quantiles`: tensor, empirical quantiles defined by `levels`.
source
PMM.__call__
Call self as a function.
Gaussian Mixture Mesh (GMM)
source
GMM.__init__
Gaussian Mixture Mesh

This Gaussian Mixture statistical model assumes independence across groups of data and estimates relationships within each group.

Parameters:
- `n_components`: int=10, the number of mixture components.
- `level`: float list [0,100], confidence levels for prediction intervals.
- `quantiles`: float list [0,1], alternative to level list; target quantiles.
- `return_params`: bool=False, whether or not to return the Distribution parameters.
- `batch_correlation`: bool=False, whether or not to model batch correlations.
- `horizon_correlation`: bool=False, whether or not to model horizon correlations.
source
GMM.sample
Construct the empirical quantiles from the estimated Distribution, sampling from it `num_samples` times independently.

Parameters:
- `distr_args`: Constructor arguments for the underlying Distribution type.
- `loc`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `scale`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `num_samples`: int=500, number of samples for the empirical quantiles.

Returns:
- `samples`: tensor, shape [B, H, `num_samples`].
- `quantiles`: tensor, empirical quantiles defined by `levels`.
source
GMM.__call__
Call self as a function.
Negative Binomial Mixture Mesh (NBMM)
source
NBMM.__init__
Negative Binomial Mixture Mesh

This Negative Binomial Mixture statistical model assumes independence across groups of data and estimates relationships within each group.

Parameters:
- `n_components`: int=10, the number of mixture components.
- `level`: float list [0,100], confidence levels for prediction intervals.
- `quantiles`: float list [0,1], alternative to level list; target quantiles.
- `return_params`: bool=False, whether or not to return the Distribution parameters.
source
NBMM.sample
Construct the empirical quantiles from the estimated Distribution, sampling from it `num_samples` times independently.

Parameters:
- `distr_args`: Constructor arguments for the underlying Distribution type.
- `loc`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `scale`: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
- `num_samples`: int=500, number of samples for the empirical quantiles.

Returns:
- `samples`: tensor, shape [B, H, `num_samples`].
- `quantiles`: tensor, empirical quantiles defined by `levels`.
source
NBMM.__call__
Call self as a function.
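The three mixture losses share the same interface, so swapping them is a one-line change. A minimal sketch (assuming standard NeuralForecast usage; hyperparameters are illustrative):

```python
from neuralforecast.models import NHITS
from neuralforecast.losses.pytorch import PMM, GMM, NBMM

# A 5-component Gaussian mixture; PMM(n_components=5) or NBMM(n_components=5)
# would slot into the same place for count-valued targets.
model = NHITS(h=12, input_size=24, loss=GMM(n_components=5, level=[80, 90]))
```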
5. Robustified Errors
These errors, drawn from robust statistics, focus on methods that are resistant to outliers and violations of assumptions, providing reliable estimates and inferences. Robust estimators reduce the impact of outliers, offering more stable results.
Huber Loss
source
HuberLoss.__init__
Huber Loss

The Huber loss, employed in robust regression, is a loss function that is less sensitive to outliers in the data than the squared error loss. This function is also referred to as SmoothL1.

The Huber loss is quadratic for small errors and linear for large errors, with equal values and slopes of the two sections at the points where $|y_{\tau} - \hat{y}_{\tau}| = \delta$:

$$L_{\delta}(y_{\tau}, \hat{y}_{\tau}) = \begin{cases} \frac{1}{2}(y_{\tau} - \hat{y}_{\tau})^{2} & \text{if } |y_{\tau} - \hat{y}_{\tau}| \le \delta \\ \delta \left(|y_{\tau} - \hat{y}_{\tau}| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$$

where $\delta$ is a threshold parameter that determines the point at which the loss transitions from quadratic to linear, and can be tuned to control the trade-off between robustness and accuracy in the predictions.

Parameters:
- `delta`: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Huber, Peter J. (1964). “Robust Estimation of a Location Parameter”. Annals of Mathematical Statistics.
source
HuberLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `huber_loss`: tensor (single value).
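A minimal sketch of how `delta` trades off the quadratic and linear regimes in the presence of an outlier (assuming the keyword signature documented above):

```python
import torch
from neuralforecast.losses.pytorch import HuberLoss

y     = torch.tensor([[1.0, 2.0, 100.0]])  # last point is an outlier
y_hat = torch.tensor([[1.0, 2.0,   3.0]])

# Smaller delta -> earlier switch to the linear regime -> less outlier influence.
for delta in (0.5, 1.0, 5.0):
    print(delta, HuberLoss(delta=delta)(y=y, y_hat=y_hat).item())
```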
Tukey Loss
source
TukeyLoss.__init__
Tukey Loss

The Tukey loss function, also known as Tukey’s biweight function, is a robust loss function used in robust statistics. Tukey’s loss is quadratic near the origin, like the Huber loss; however, it is even more robust to outliers because the loss for large residuals remains constant instead of scaling linearly.

The parameter $c$ in Tukey’s loss determines the ‘saturation’ point of the function: higher values of $c$ enhance sensitivity, while lower values increase resistance to outliers.

Please note that the Tukey loss assumes the data is stationary or normalized beforehand. If the error values are excessively large, the optimization may struggle to converge, so it is advisable to employ small learning rates.

Parameters:
- `c`: float=4.685, Specifies the Tukey loss’ threshold beyond which residuals are no longer considered.
- `normalize`: bool=True, Whether normalization is performed within the Tukey loss’ computation.
source
TukeyLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `tukey_loss`: tensor (single value).
Huberized Quantile Loss
source
HuberQLoss.__init__
Huberized Quantile Loss

The Huberized quantile loss is a modified version of the quantile loss that combines the advantages of the quantile loss and the Huber loss. It is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails.

The Huberized quantile loss between `y` and `y_hat` measures the Huber loss asymmetrically. The loss pays more attention to under- or over-estimation depending on the quantile parameter $q$, and controls the trade-off between robustness and accuracy in the predictions with the parameter $\delta$.

Parameters:
- `delta`: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.
- `q`: float, between 0 and 1. The slope of the quantile loss; in the context of quantile regression, q determines the conditional quantile level.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Huber, Peter J. (1964). “Robust Estimation of a Location Parameter”. Annals of Mathematical Statistics.
- Roger Koenker and Gilbert Bassett, Jr., “Regression Quantiles”.
source
HuberQLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies datapoints to consider in loss.

Returns:
- `huber_qloss`: tensor (single value).
Huberized MQLoss
source
HuberMQLoss.__init__
Huberized Multi-Quantile Loss

The Huberized Multi-Quantile loss (HuberMQL) is a modified version of the multi-quantile loss that combines the advantages of the quantile loss and the Huber loss. HuberMQL is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails. The loss pays more attention to under- or over-estimation depending on the quantile list parameter, and controls the trade-off between robustness and prediction accuracy with the parameter $\delta$.

Parameters:
- `level`: int list [0,100]. Probability levels for prediction intervals (defaults to median).
- `quantiles`: float list [0., 1.]. Alternative to level; quantiles to estimate from the y distribution.
- `delta`: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.
- `horizon_weight`: Tensor of size h, weight for each timestamp of the forecasting window.

References:
- Huber, Peter J. (1964). “Robust Estimation of a Location Parameter”. Annals of Mathematical Statistics.
- Roger Koenker and Gilbert Bassett, Jr., “Regression Quantiles”.
source
HuberMQLoss.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `hmqloss`: tensor (single value).
6. Others
Accuracy
source
Accuracy.__init__
Accuracy

Computes the accuracy between categorical `y` and `y_hat`. This metric is only meant for evaluation, as it is not differentiable.
source
Accuracy.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `accuracy`: tensor (single value).
Scaled Continuous Ranked Probability Score (sCRPS)
source
sCRPS.__init__
Scaled Continuous Ranked Probability Score

Calculates a scaled variation of the CRPS, as proposed by Rangapuram (2021), to measure the accuracy of predicted quantiles `y_hat` compared to the observation `y`.

This metric averages percentage-weighted absolute deviations as defined by the quantile losses:

$$\mathrm{sCRPS}(\hat{F}_{\tau}, \mathbf{y}_{\tau}) = \frac{2}{N} \sum_{i} \int_{0}^{1} \frac{\mathrm{QL}(\hat{F}_{i,\tau}, y_{i,\tau})_{q}}{\sum_{i} |y_{i,\tau}|} \, dq$$

where $\hat{F}_{\tau}$ is the estimated quantile distribution and $\mathbf{y}_{\tau}$ are the target variable realizations.

Parameters:
- `level`: int list [0,100]. Probability levels for prediction intervals (defaults to median).
- `quantiles`: float list [0., 1.]. Alternative to level; quantiles to estimate from the y distribution.

References:
- Gneiting, Tilmann. (2011). “Quantiles as optimal point forecasts”. International Journal of Forecasting.
- Spyros Makridakis, Evangelos Spiliotis, Vassilios Assimakopoulos, Zhi Chen, Anil Gaba, Ilia Tsetlin, Robert L. Winkler. (2022). “The M5 uncertainty competition: Results, findings and conclusions”. International Journal of Forecasting.
- Syama Sundar Rangapuram, Lucien D Werner, Konstantinos Benidis, Pedro Mercado, Jan Gasthaus, Tim Januschowski. (2021). “End-to-End Learning of Coherent Probabilistic Forecasts for Hierarchical Time Series”. Proceedings of the 38th International Conference on Machine Learning (ICML).
source
sCRPS.__call__
Parameters:
- `y`: tensor, Actual values.
- `y_hat`: tensor, Predicted values.
- `mask`: tensor, Specifies timestamps per series to consider in loss.

Returns:
- `scrps`: tensor (single value).