NeuralForecast contains a collection of PyTorch loss classes aimed to be used during the models' optimization. All losses are torch.nn.modules, which helps to automatically move them across CPU/GPU/TPU devices with PyTorch Lightning.
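As a quick orientation, the sketch below shows the typical way these losses are used: instantiate a loss and pass it to a model through its loss argument. The model choices and hyperparameters (NBEATS, NHITS, h, input_size, max_steps, freq) are illustrative, not a recommendation.

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, NHITS
from neuralforecast.losses.pytorch import MAE, MQLoss
from neuralforecast.utils import AirPassengersDF

# Point forecasting with MAE; quantile forecasting with MQLoss.
# Hyperparameters below are illustrative, not tuned.
models = [
    NBEATS(h=12, input_size=24, loss=MAE(), max_steps=100),
    NHITS(h=12, input_size=24, loss=MQLoss(level=[80, 90]), max_steps=100),
]
nf = NeuralForecast(models=models, freq="M")
nf.fit(df=AirPassengersDF)
forecasts = nf.predict()
```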
*Base class for point loss functions.
Parameters:
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.
outputsize_multiplier: Multiplier for the output size.
output_names: Names of the outputs.*

*Mean Absolute Error
Calculates the Mean Absolute Error between y and y_hat. MAE measures the relative prediction accuracy of a forecasting method by calculating the deviation of the prediction from the true value at a given time and averaging these deviations over the length of the series.
Parameters:
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
mae: tensor (single value).*
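A minimal sketch of calling the loss directly on tensors; the call signature (y, y_hat, mask) follows the parameters listed above, and the tensor shapes are illustrative.

```python
import torch
from neuralforecast.losses.pytorch import MAE

batch_size, horizon = 4, 12
y = torch.randn(batch_size, horizon)
y_hat = torch.randn(batch_size, horizon)
mask = torch.ones_like(y)  # consider every datapoint

mae = MAE()
loss = mae(y=y, y_hat=y_hat, mask=mask)

# Same quantity computed with plain PyTorch, for reference.
reference = (y - y_hat).abs().mean()
print(loss.item(), reference.item())
```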
*Mean Squared Error
Calculates the Mean Squared Error between y and y_hat. MSE measures the relative prediction accuracy of a forecasting method by calculating the squared deviation of the prediction from the true value at a given time and averaging these deviations over the length of the series.
Parameters:
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
mse: tensor (single value).*

*Root Mean Squared Error
Calculates the Root Mean Squared Error between y and y_hat. RMSE measures the relative prediction accuracy of a forecasting method by calculating the squared deviation of the prediction from the observed value at a given time and averaging these deviations over the length of the series. The RMSE is on the same scale as the original time series, so comparison with other series is possible only if they share a common scale. RMSE has a direct connection to the L2 norm.
Parameters:
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
rmse: tensor (single value).*
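Since RMSE is the square root of MSE, a quick sanity check relating the two losses on the same tensors (shapes illustrative):

```python
import torch
from neuralforecast.losses.pytorch import MSE, RMSE

y = torch.randn(4, 12)
y_hat = torch.randn(4, 12)
mask = torch.ones_like(y)

mse = MSE()(y=y, y_hat=y_hat, mask=mask)
rmse = RMSE()(y=y, y_hat=y_hat, mask=mask)

# With a uniform mask and no horizon_weight, rmse should equal sqrt(mse).
print(torch.isclose(rmse, mse.sqrt(), atol=1e-6))
```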
*Mean Absolute Percentage Error
Calculates the Mean Absolute Percentage Error between y and y_hat. MAPE measures the relative prediction accuracy of a forecasting method by calculating the percentage deviation of the prediction from the observed value at a given time and averaging these deviations over the length of the series. The closer an observed value is to zero, the higher the penalty MAPE assigns to the corresponding error.
Parameters:
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
mape: tensor (single value).*

*Symmetric Mean Absolute Percentage Error
Calculates the Symmetric Mean Absolute Percentage Error between y and y_hat. SMAPE measures the relative prediction accuracy of a forecasting method by calculating the relative deviation of the prediction from the observed value, scaled by the sum of the absolute values of the prediction and the observed value at a given time, and then averaging these deviations over the length of the series. This bounds SMAPE between 0% and 200%, which is desirable compared to the standard MAPE, which is undefined when the target is zero.
Parameters:
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
smape: tensor (single value).*
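A small sketch contrasting the two percentage errors; the values are made up to illustrate behavior when targets approach zero.

```python
import torch
from neuralforecast.losses.pytorch import MAPE, SMAPE

# One series where the target gets very close to zero at some timestamps.
y = torch.tensor([[100.0, 50.0, 0.01, 0.02]])
y_hat = torch.tensor([[110.0, 45.0, 1.00, 1.00]])
mask = torch.ones_like(y)

mape = MAPE()(y=y, y_hat=y_hat, mask=mask)
smape = SMAPE()(y=y, y_hat=y_hat, mask=mask)

# MAPE can become arbitrarily large for near-zero targets,
# while SMAPE stays bounded (0%-200%).
print(mape.item(), smape.item())
```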
*Mean Absolute Scaled Error
Calculates the Mean Absolute Scaled Error between y and y_hat. MASE measures the relative prediction accuracy of a forecasting method by comparing the mean absolute errors of the prediction and the observed value against the mean absolute errors of the seasonal naive model. MASE partially composes the Overall Weighted Average (OWA) metric used in the M4 Competition.
Parameters:
seasonality: int. Main frequency of the time series; Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor (batch_size, output_size), Actual values.
y_hat: tensor (batch_size, output_size), Predicted values.
y_insample: tensor (batch_size, input_size), Actual insample values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
mase: tensor (single value).*
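A sketch of the call, which unlike the previous point losses also needs the insample history to compute the seasonal naive scaling; the shapes and monthly seasonality are illustrative.

```python
import torch
from neuralforecast.losses.pytorch import MASE

batch_size, input_size, horizon = 4, 36, 12
y_insample = torch.randn(batch_size, input_size)  # training history per series
y = torch.randn(batch_size, horizon)              # observed future values
y_hat = torch.randn(batch_size, horizon)          # forecasts
mask = torch.ones_like(y)

mase = MASE(seasonality=12)  # monthly data
loss = mase(y=y, y_hat=y_hat, y_insample=y_insample, mask=mask)
print(loss.item())
```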
*Relative Mean Squared Error
Computes the Relative Mean Squared Error (relMSE), proposed by Hyndman & Koehler (2006) as an alternative to percentage errors, to avoid measure instability.
Parameters:
y_train: numpy array, deprecated.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor (batch_size, output_size), Actual values.
y_hat: tensor (batch_size, output_size), Predicted values.
y_benchmark: tensor (batch_size, output_size), Benchmark predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
relMSE: tensor (single value).*
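A sketch of calling relMSE against a benchmark forecast; here the benchmark is a naive repeat of the last observed value, which is an illustrative choice, not a library requirement.

```python
import torch
from neuralforecast.losses.pytorch import relMSE

batch_size, horizon = 4, 12
y = torch.randn(batch_size, horizon)
y_hat = torch.randn(batch_size, horizon)
# Illustrative benchmark: repeat the last observed value across the horizon.
y_benchmark = torch.randn(batch_size, 1).repeat(1, horizon)
mask = torch.ones_like(y)

rel_mse = relMSE(y_train=None)  # y_train is deprecated per the docs above
loss = rel_mse(y=y, y_hat=y_hat, y_benchmark=y_benchmark, mask=mask)
print(loss.item())  # values below 1 mean the model beats the benchmark in MSE terms
```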
*Quantile Loss
Computes the quantile loss between y and y_hat. QL measures the deviation of a quantile forecast. By weighting the absolute deviation in a non-symmetric way, the loss pays more attention to under- or over-estimation. A common value for q is 0.5, which corresponds to the deviation from the median (Pinball loss).
Parameters:
q: float, between 0 and 1. The slope of the quantile loss; in the context of quantile regression, q determines the conditional quantile level.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
quantile_loss: tensor (single value).*
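A minimal sketch of the pinball loss, both via the library class and as plain PyTorch to make the asymmetric weighting explicit (shapes illustrative):

```python
import torch
from neuralforecast.losses.pytorch import QuantileLoss

q = 0.8  # penalize under-estimation more heavily than over-estimation
y = torch.randn(4, 12)
y_hat = torch.randn(4, 12)
mask = torch.ones_like(y)

ql = QuantileLoss(q=q)
loss = ql(y=y, y_hat=y_hat, mask=mask)

# Plain-PyTorch pinball loss for reference:
# q * (y - y_hat) when under-forecasting, (1 - q) * (y_hat - y) otherwise.
error = y - y_hat
reference = torch.maximum(q * error, (q - 1) * error).mean()
print(loss.item(), reference.item())
```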
*Multi-Quantile Loss
Calculates the Multi-Quantile loss (MQL) between y and y_hat. MQL computes the average multi-quantile loss for a given set of quantiles, based on the absolute difference between predicted quantiles and observed values.
The limit behavior of MQL allows measuring the accuracy of a full predictive distribution with the continuous ranked probability score (CRPS). This can be achieved through a numerical integration technique that discretizes the quantiles and treats the CRPS integral with a left Riemann approximation, averaging over uniformly distanced quantiles.
Parameters:
level: int list [0,100]. Probability levels for prediction intervals (defaults to median).
quantiles: float list [0., 1.]. Alternative to level, quantiles to estimate from the y distribution.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
mqloss: tensor (single value).*
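A sketch of MQLoss called directly on tensors. When MQLoss is used as a training loss, the model emits one output per target quantile, so y_hat carries a trailing quantile dimension; the quantile set and shapes below are illustrative.

```python
import torch
from neuralforecast.losses.pytorch import MQLoss

quantiles = [0.1, 0.5, 0.9]
batch_size, horizon = 4, 12
y = torch.randn(batch_size, horizon)
# One predicted value per quantile: trailing dimension of size len(quantiles).
y_hat = torch.randn(batch_size, horizon, len(quantiles))
mask = torch.ones_like(y)

mqloss = MQLoss(quantiles=quantiles)
loss = mqloss(y=y, y_hat=y_hat, mask=mask)
print(loss.item())
```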
*Implicit Quantile Layer, from the paper "IQN for Distributional Reinforcement Learning" (https://arxiv.org/abs/1806.06923) by Dabney et al. (2018).
Code from GluonTS:
https://github.com/awslabs/gluonts/blob/61133ef6e2d88177b32ace4afc6843ab9a7bc8cd/src/gluonts/torch/distributions/implicit_quantile_network.py*
*Implicit Quantile Loss
Computes the quantile loss between y and y_hat, with the quantile q provided as an input to the network. IQL measures the deviation of a quantile forecast. By weighting the absolute deviation in a non-symmetric way, the loss pays more attention to under- or over-estimation.
Parameters:
quantile_sampling: str, default='uniform', sampling distribution used to sample the quantiles during training. Choose from ['uniform', 'beta'].
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
quantile_loss: tensor (single value).*
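Conceptually, IQLoss trains a single network to produce any quantile by sampling a fresh q at each training step and scoring it with the pinball loss for that q. The plain-PyTorch fragment below sketches that idea only; it is not the library's implementation, which also feeds q into the network through the implicit quantile layer above, and the y_hat_fn callable is a hypothetical stand-in for a model conditioned on q.

```python
import torch

def sampled_pinball_step(y, y_hat_fn):
    """One illustrative training step: draw q ~ Uniform(0, 1),
    ask the model for that quantile, and score it with the pinball loss."""
    q = torch.rand(1)        # quantile sampled per step ('uniform' sampling)
    y_hat = y_hat_fn(q)      # model output conditioned on q (assumed callable)
    error = y - y_hat
    return torch.maximum(q * error, (q - 1) * error).mean()

# Illustrative stand-in for a network conditioned on q.
y = torch.randn(4, 12)
loss = sampled_pinball_step(y, y_hat_fn=lambda q: torch.randn(4, 12))
print(loss.item())
```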
*DistributionLoss
This PyTorch module wraps the torch.distributions classes, allowing them to interact with NeuralForecast models modularly. It provides the negative log-likelihood as the optimization objective and a sample method to empirically generate the quantiles defined by the level list.
Additionally, it implements a distribution transformation that factorizes the scale-dependent likelihood parameters into a base scale and a multiplier that is efficiently learnable within the network's non-linearities operating ranges.
Parameters:
distribution: str, identifier of a torch.distributions.Distribution class.
level: float list [0,100], confidence levels for prediction intervals.
quantiles: float list [0,1], alternative to level list, target quantiles.
num_samples: int=500, number of samples for the empirical quantiles.
return_params: bool=False, whether or not to return the Distribution parameters.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Constructs the empirical quantiles from the estimated Distribution, sampling num_samples times independently from it.
Parameters:
distr_args: Constructor arguments for the underlying Distribution type.
num_samples: int, overwrites the number of samples for the empirical quantiles.
Returns:
samples: tensor, shape [B, H, num_samples].
quantiles: tensor, empirical quantiles defined by levels.*

*Computes the negative log-likelihood objective function to estimate the predictive distribution P(y_τ | θ), where θ represents the distribution parameters. It additionally summarizes the objective signal using a weighted average with the mask tensor.
Parameters:
y: tensor, Actual values.
distr_args: Constructor arguments for the underlying Distribution type.
loc: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
scale: Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
loss: scalar, weighted loss function against which backpropagation will be performed.*
*Poisson Mixture Mesh
This Poisson Mixture statistical model assumes independence across groups of data and estimates relationships within each group.
Parameters:
n_components: int=10, the number of mixture components.
level: float list [0,100], confidence levels for prediction intervals.
quantiles: float list [0,1], alternative to level list, target quantiles.
return_params: bool=False, whether or not to return the Distribution parameters.
batch_correlation: bool=False, whether or not to model batch correlations.
horizon_correlation: bool=False, whether or not to model horizon correlations.*

*Constructs the empirical quantiles from the estimated Distribution, sampling num_samples times independently from it.
Parameters:
distr_args: Constructor arguments for the underlying Distribution type.
num_samples: int, overwrites the number of samples for the empirical quantiles.
Returns:
samples: tensor, shape [B, H, num_samples].
quantiles: tensor, empirical quantiles defined by levels.*

*Computes the negative log-likelihood objective function to estimate the predictive distribution P(y_τ | θ), where θ represents the distribution parameters. It additionally summarizes the objective signal using a weighted average with the mask tensor.
Parameters:
y: tensor, Actual values.
distr_args: Constructor arguments for the underlying Distribution type.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
loss: scalar, weighted loss function against which backpropagation will be performed.*
*Gaussian Mixture Mesh
This Gaussian Mixture statistical model assumes independence across groups of data and estimates relationships within each group.
Parameters:
n_components: int=10, the number of mixture components.
level: float list [0,100], confidence levels for prediction intervals.
quantiles: float list [0,1], alternative to level list, target quantiles.
return_params: bool=False, whether or not to return the Distribution parameters.
batch_correlation: bool=False, whether or not to model batch correlations.
horizon_correlation: bool=False, whether or not to model horizon correlations.*

*Constructs the empirical quantiles from the estimated Distribution, sampling num_samples times independently from it.
Parameters:
distr_args: Constructor arguments for the underlying Distribution type.
num_samples: int, overwrites the number of samples for the empirical quantiles.
Returns:
samples: tensor, shape [B, H, num_samples].
quantiles: tensor, empirical quantiles defined by levels.*

*Computes the negative log-likelihood objective function to estimate the predictive distribution P(y_τ | θ), where θ represents the distribution parameters. It additionally summarizes the objective signal using a weighted average with the mask tensor.
Parameters:
y: tensor, Actual values.
distr_args: Constructor arguments for the underlying Distribution type.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
loss: scalar, weighted loss function against which backpropagation will be performed.*
*Negative Binomial Mixture Mesh
This Negative Binomial Mixture statistical model assumes independence across groups of data and estimates relationships within each group.
Parameters:
n_components: int=10, the number of mixture components.
level: float list [0,100], confidence levels for prediction intervals.
quantiles: float list [0,1], alternative to level list, target quantiles.
return_params: bool=False, whether or not to return the Distribution parameters.*

*Constructs the empirical quantiles from the estimated Distribution, sampling num_samples times independently from it.
Parameters:
distr_args: Constructor arguments for the underlying Distribution type.
num_samples: int, overwrites the number of samples for the empirical quantiles.
Returns:
samples: tensor, shape [B, H, num_samples].
quantiles: tensor, empirical quantiles defined by levels.*

*Computes the negative log-likelihood objective function to estimate the predictive distribution P(y_τ | θ), where θ represents the distribution parameters. It additionally summarizes the objective signal using a weighted average with the mask tensor.
Parameters:
y: tensor, Actual values.
distr_args: Constructor arguments for the underlying Distribution type.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
loss: scalar, weighted loss function against which backpropagation will be performed.*
*Huber Loss
The Huber loss, employed in robust regression, is a loss function that exhibits reduced sensitivity to outliers compared to the squared error loss. It is also referred to as SmoothL1. The Huber loss is quadratic for small errors and linear for large errors, with equal values and slopes of the two sections at the points where the absolute error equals delta. Here delta is a threshold parameter that determines where the loss transitions from quadratic to linear, and it can be tuned to control the trade-off between robustness and accuracy in the predictions.
Parameters:
delta: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
huber_loss: tensor (single value).*

*Tukey Loss
The Tukey loss function, also known as Tukey's biweight function, is a robust loss function used in robust statistics. Tukey's loss exhibits quadratic behavior near the origin, like the Huber loss; however, it is even more robust to outliers because the loss for large residuals remains constant instead of scaling linearly. The parameter c in Tukey's loss determines the "saturation" point of the function: higher values of c enhance sensitivity, while lower values increase resistance to outliers. Please note that the Tukey loss assumes the data to be stationary or normalized beforehand. If the error values are excessively large, the optimization may struggle to converge, so it is advisable to employ small learning rates.
Parameters:
c: float=4.685, Specifies the Tukey loss' threshold, beyond which residuals are no longer considered.
normalize: bool=True, Whether normalization is performed within the Tukey loss computation.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
tukey_loss: tensor (single value).*
*Huberized Quantile Loss
The Huberized quantile loss is a modified version of the quantile loss that combines the advantages of the quantile loss and the Huber loss. It is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails. The Huberized quantile loss between y and y_hat measures the Huber loss in a non-symmetric way: the loss pays more attention to under/over-estimation depending on the quantile parameter q, while the parameter delta controls the trade-off between robustness and accuracy in the predictions.
Parameters:
delta: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.
q: float, between 0 and 1. The slope of the quantile loss; in the context of quantile regression, q determines the conditional quantile level.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
huber_qloss: tensor (single value).*
*Huberized Multi-Quantile Loss
The Huberized Multi-Quantile loss (HuberMQL) is a modified version of the multi-quantile loss that combines the advantages of the quantile loss and the Huber loss. HuberMQL is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails. The loss pays more attention to under/over-estimation depending on the quantile list parameter, while the parameter delta controls the trade-off between robustness and prediction accuracy.
Parameters:
level: int list [0,100]. Probability levels for prediction intervals (defaults to median).
quantiles: float list [0., 1.]. Alternative to level, quantiles to estimate from the y distribution.
delta: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
hmqloss: tensor (single value).*
*Implicit Huber Quantile Loss
Computes the huberized quantile loss between y and y_hat, with the quantile q provided as an input to the network. HuberIQLoss measures the deviation of a huberized quantile forecast. By weighting the absolute deviation in a non-symmetric way, the loss pays more attention to under- or over-estimation.
Parameters:
quantile_sampling: str, default='uniform', sampling distribution used to sample the quantiles during training. Choose from ['uniform', 'beta'].
horizon_weight: Tensor of size h, weight for each timestamp of the forecasting window.
delta: float=1.0, Specifies the threshold at which to change between delta-scaled L1 and L2 loss.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies datapoints to consider in loss.
Returns:
huber_qloss: tensor (single value).*
*Accuracy
Computes the accuracy between categorical y and y_hat. This metric is only meant for evaluation, as it is not differentiable.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
accuracy: tensor (single value).*
*Scaled Continuous Ranked Probability Score
Calculates a scaled variation of the CRPS, as proposed by Rangapuram (2021), to measure the accuracy of predicted quantiles y_hat compared to the observations y. This metric averages percentual weighted absolute deviations, as defined by the quantile losses between the estimated quantiles and the target variable realizations.
Parameters:
level: int list [0,100]. Probability levels for prediction intervals (defaults to median).
quantiles: float list [0., 1.]. Alternative to level, quantiles to estimate from the y distribution.*

*Parameters:
y: tensor, Actual values.
y_hat: tensor, Predicted values.
mask: tensor, Specifies date stamps per series to consider in loss.
Returns:
scrps: tensor (single value).*