The most important training signal is the forecast error, the difference between the observed value $y_{\tau}$ and the prediction $\hat{y}_{\tau}$ at time $\tau$:

$$e_{\tau} = y_{\tau} - \hat{y}_{\tau}, \qquad \tau \in \{t+1, \dots, t+H\}$$

The training loss summarizes the forecast errors into different training optimization objectives.

All the losses are `torch.nn.Module`s, which allows PyTorch Lightning to move them automatically across CPU/GPU/TPU devices.
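As a quick illustration, a minimal NumPy sketch with made-up numbers (not the library's `torch` implementation):

```python
import numpy as np

# Toy series: observed values over a horizon H = 4 and a hypothetical forecast.
y = np.array([10.0, 12.0, 11.0, 13.0])      # y_tau, tau in {t+1, ..., t+H}
y_hat = np.array([9.0, 12.5, 11.0, 12.0])   # predicted values

# Forecast error e_tau = y_tau - y_hat_tau
e = y - y_hat
print(e)  # [ 1.  -0.5  0.   1. ]
```

The point losses below are different ways of summarizing this error vector into a single training objective.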
Bases: `BasePointLoss`

Mean Absolute Error.

Calculates Mean Absolute Error between `y` and `y_hat`. MAE measures the relative prediction accuracy of a forecasting method by calculating the deviation of the prediction from the true value at a given time and averaging these deviations over the length of the series.

$$\mathrm{MAE}(y_{\tau}, \hat{y}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} |y_{\tau} - \hat{y}_{\tau}|$$

Parameters:
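The MAE computation can be sketched in a few lines (illustrative NumPy only; the library class is a `torch.nn.Module`):

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error over the forecast horizon H = len(y)."""
    return np.mean(np.abs(y - y_hat))

y = np.array([10.0, 12.0, 11.0, 13.0])
y_hat = np.array([9.0, 12.5, 11.0, 12.0])
print(mae(y, y_hat))  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```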
Bases: `BasePointLoss`

Mean Squared Error.

Calculates Mean Squared Error between `y` and `y_hat`. MSE measures the relative prediction accuracy of a forecasting method by calculating the squared deviation of the prediction from the true value at a given time and averaging these deviations over the length of the series.

$$\mathrm{MSE}(y_{\tau}, \hat{y}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} (y_{\tau} - \hat{y}_{\tau})^2$$

Parameters:
Bases: `BasePointLoss`

Root Mean Squared Error.

Calculates Root Mean Squared Error between `y` and `y_hat`. RMSE measures the relative prediction accuracy of a forecasting method by calculating the squared deviation of the prediction from the observed value at a given time and averaging these deviations over the length of the series.

Finally, the RMSE is in the same scale as the original time series, so its comparison with other series is only possible if they share a common scale. RMSE has a direct connection to the L2 norm.

$$\mathrm{RMSE}(y_{\tau}, \hat{y}_{\tau}) = \sqrt{\frac{1}{H} \sum_{\tau=t+1}^{t+H} (y_{\tau} - \hat{y}_{\tau})^2}$$

Parameters:
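An illustrative NumPy sketch of the formula (not the library's `torch` implementation):

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error: square root of the mean squared deviation."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

y = np.array([3.0, 4.0, 5.0])
y_hat = np.array([1.0, 4.0, 7.0])
# Squared errors: 4, 0, 4 -> mean 8/3 -> sqrt ~ 1.633
print(round(rmse(y, y_hat), 3))  # 1.633
```

Because RMSE keeps the scale of the series, multiplying both `y` and `y_hat` by a constant multiplies the RMSE by that same constant, which is why cross-series comparison requires a shared scale.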
Bases: `BasePointLoss`

Mean Absolute Percentage Error.

Calculates Mean Absolute Percentage Error between `y` and `y_hat`. MAPE measures the relative prediction accuracy of a forecasting method by calculating the percentage deviation of the prediction from the observed value at a given time and averaging these deviations over the length of the series. The closer an observed value is to zero, the higher the penalty MAPE assigns to the corresponding error.

$$\mathrm{MAPE}(y_{\tau}, \hat{y}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \frac{|y_{\tau} - \hat{y}_{\tau}|}{|y_{\tau}|}$$

Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `horizon_weight` | | Tensor of size `h`, weight for each timestamp of the forecasting window. | |
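The penalty near zero can be seen in a small NumPy sketch (toy numbers; the same absolute error of 1.0 in both cases):

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percentage Error: |error| scaled by |y| at each step."""
    return np.mean(np.abs(y - y_hat) / np.abs(y))

# Identical absolute error, but the observation near zero dominates the loss:
print(mape(np.array([100.0]), np.array([101.0])))  # 0.01
print(mape(np.array([0.5]),   np.array([1.5])))    # 2.0
```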
Bases: `BasePointLoss`

Symmetric Mean Absolute Percentage Error.

Calculates Symmetric Mean Absolute Percentage Error between `y` and `y_hat`. SMAPE measures the relative prediction accuracy of a forecasting method by calculating the relative deviation of the prediction from the observed value, scaled by the sum of the absolute values of the prediction and the observed value at a given time, and then averaging these deviations over the length of the series. This bounds SMAPE between 0% and 200%, which is desirable compared to the normal MAPE, which is undefined when the target is zero.

$$\mathrm{sMAPE}_2(y_{\tau}, \hat{y}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \frac{2\,|y_{\tau} - \hat{y}_{\tau}|}{|y_{\tau}| + |\hat{y}_{\tau}|}$$

Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `horizon_weight` | | Tensor of size `h`, weight for each timestamp of the forecasting window. | |
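A NumPy sketch of the formula, using the factor of 2 implied by the 0%–200% bounds (illustrative only):

```python
import numpy as np

def smape(y, y_hat):
    """Symmetric MAPE, bounded between 0 and 2 (i.e. 0%--200%)."""
    return np.mean(2.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

# Worst case: forecast and observation never share a sign -> 200%.
print(smape(np.array([1.0, 2.0]), np.array([-1.0, -2.0])))  # 2.0
print(smape(np.array([100.0]), np.array([110.0])))          # 2*10/210 ~ 0.0952
```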
Bases: `BasePointLoss`

Mean Absolute Scaled Error.

Calculates the Mean Absolute Scaled Error between `y` and `y_hat`. MASE measures the relative prediction accuracy of a forecasting method by comparing the mean absolute errors of the prediction against the mean absolute errors of the seasonal naive model. The MASE partially composes the Overall Weighted Average (OWA) used in the M4 Competition.

$$\mathrm{MASE}(y_{\tau}, \hat{y}_{\tau}, \hat{y}^{season}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \frac{|y_{\tau} - \hat{y}_{\tau}|}{\mathrm{MAE}(y_{\tau}, \hat{y}^{season}_{\tau})}$$

Parameters:
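A NumPy sketch of the classic in-sample seasonal-naive scaling variant of MASE (the `season_length` parameter and the toy data are made up for illustration; the library's class has its own signature):

```python
import numpy as np

def mase(y, y_hat, y_train, season_length=1):
    """MAE of the forecast divided by the in-sample MAE of the
    seasonal naive forecast (shift the series by season_length)."""
    naive_errors = np.abs(y_train[season_length:] - y_train[:-season_length])
    scale = np.mean(naive_errors)
    return np.mean(np.abs(y - y_hat)) / scale

y_train = np.array([10.0, 12.0, 14.0, 16.0])   # naive (lag-1) MAE = 2.0
y = np.array([18.0, 20.0])
y_hat = np.array([17.0, 19.0])                 # forecast MAE = 1.0
print(mase(y, y_hat, y_train))  # 1.0 / 2.0 = 0.5
```

A value below 1 means the forecast beats the seasonal naive baseline on average.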
Bases: `BasePointLoss`

Relative Mean Squared Error.

Computes the Relative Mean Squared Error (relMSE), as proposed by Hyndman & Koehler (2006) as an alternative to percentage errors, to avoid measure instability.

$$\mathrm{relMSE}(y, \hat{y}, \hat{y}^{benchmark}) = \frac{\mathrm{MSE}(y, \hat{y})}{\mathrm{MSE}(y, \hat{y}^{benchmark})}$$

Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y_train` | | Numpy array, deprecated. | `None` |
| `horizon_weight` | | Tensor of size `h`, weight for each timestamp of the forecasting window. | |
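An illustrative NumPy sketch with a hypothetical naive benchmark:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def rel_mse(y, y_hat, y_benchmark):
    """MSE of the model relative to the MSE of a benchmark forecast."""
    return mse(y, y_hat) / mse(y, y_benchmark)

y = np.array([10.0, 12.0, 14.0])
y_hat = np.array([11.0, 12.0, 13.0])     # MSE = (1+0+1)/3 = 2/3
y_naive = np.array([8.0, 10.0, 12.0])    # MSE = (4+4+4)/3 = 4
print(rel_mse(y, y_hat, y_naive))  # (2/3) / 4 = 1/6 ~ 0.167
```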
These methods use statistical approaches to estimate unknown probability distributions from observed data.

Maximum likelihood estimation involves finding the parameter values that maximize the likelihood function, which measures the probability of obtaining the observed data given the parameter values. MLE has good theoretical properties and is efficient when certain assumptions are satisfied.

On the non-parametric side, quantile regression measures deviations non-symmetrically, penalizing under- and over-estimation differently.
Bases: `BasePointLoss`

Quantile Loss.

Computes the quantile loss between `y` and `y_hat`. QL measures the deviation of a quantile forecast. By weighting the absolute deviation in a non-symmetric way, the loss pays more attention to under- or over-estimation. A common value for $q$ is 0.5, the deviation from the median (pinball loss).

$$\mathrm{QL}(y_{\tau}, \hat{y}^{(q)}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \Big( (1-q)\,(\hat{y}^{(q)}_{\tau} - y_{\tau})_{+} + q\,(y_{\tau} - \hat{y}^{(q)}_{\tau})_{+} \Big)$$

Parameters:
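A NumPy sketch of the pinball loss (the `max` form below is algebraically equivalent to the $(1-q)/q$ split above):

```python
import numpy as np

def quantile_loss(y, y_hat, q):
    """Pinball loss: asymmetric penalty on under- vs over-estimation."""
    delta = y - y_hat
    return np.mean(np.maximum(q * delta, (q - 1) * delta))

y = np.array([10.0, 10.0])
y_hat = np.array([7.0, 11.0])  # one under-estimate (3), one over-estimate (1)
print(quantile_loss(y, y_hat, q=0.5))  # 1.0
print(quantile_loss(y, y_hat, q=0.9))  # 1.4 -- punishes under-estimation more
```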
Bases: `BasePointLoss`

Multi-Quantile Loss.

Calculates the Multi-Quantile loss (MQL) between `y` and `y_hat`. MQL calculates the average multi-quantile loss for a given set of quantiles, based on the absolute difference between predicted quantiles and observed values.

$$\mathrm{MQL}(y_{\tau}, [\hat{y}^{(q_1)}_{\tau}, \dots, \hat{y}^{(q_n)}_{\tau}]) = \frac{1}{n} \sum_{q_i} \mathrm{QL}(y_{\tau}, \hat{y}^{(q_i)}_{\tau})$$

The limit behavior of MQL allows measuring the accuracy of a full predictive distribution $\hat{F}_{\tau}$ with the continuous ranked probability score (CRPS). This can be achieved through a numerical integration technique that discretizes the quantiles and treats the CRPS integral with a left Riemann approximation, averaging over uniformly spaced quantiles.

$$\mathrm{CRPS}(y_{\tau}, \hat{F}_{\tau}) = \int_{0}^{1} \mathrm{QL}(y_{\tau}, \hat{y}^{(q)}_{\tau}) \, dq$$

Parameters:
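The left-Riemann discretization of the CRPS integral can be sketched as follows. This is illustrative NumPy, not the library class; `quantile_fn` is a hypothetical forecaster's quantile function, and the integral follows this document's convention of averaging the quantile losses without an extra factor of 2:

```python
import numpy as np

def quantile_loss(y, y_hat_q, q):
    delta = y - y_hat_q
    return np.mean(np.maximum(q * delta, (q - 1) * delta))

def crps_riemann(y, quantile_fn, n=1000):
    """Left-Riemann approximation of integral_0^1 QL(y, y_hat^(q)) dq
    over n uniformly spaced quantiles."""
    qs = np.arange(n) / n  # left endpoints: 0, 1/n, ..., (n-1)/n
    return np.mean([quantile_loss(y, quantile_fn(q), q) for q in qs])

# Hypothetical forecaster: predictive distribution Uniform(0, 1),
# whose quantile function is the identity q -> q.
y = np.array([0.5])
approx = crps_riemann(y, quantile_fn=lambda q: np.array([q]))
print(round(approx, 3))  # ~ 0.042 (analytically 1/24 for this example)
```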
Bases: `QuantileLoss`

Implicit Quantile Loss.

Computes the quantile loss between `y` and `y_hat`, with the quantile $q$ provided as an input to the network. IQL measures the deviation of a quantile forecast. By weighting the absolute deviation in a non-symmetric way, the loss pays more attention to under- or over-estimation.

$$\mathrm{QL}(y_{\tau}, \hat{y}^{(q)}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \Big( (1-q)\,(\hat{y}^{(q)}_{\tau} - y_{\tau})_{+} + q\,(y_{\tau} - \hat{y}^{(q)}_{\tau})_{+} \Big)$$

Parameters:
Bases: `Module`

DistributionLoss.

This PyTorch module wraps the `torch.distributions` classes, allowing them to interact with NeuralForecast models modularly. It uses the negative log-likelihood as the optimization objective and a `sample` method to generate empirically the quantiles defined by the `level` list.

Additionally, it implements a distribution transformation that factorizes the scale-dependent likelihood parameters into a base scale and a multiplier, efficiently learnable within the operating ranges of the network's non-linearities.

Available distributions:
Computes the negative log-likelihood objective function to estimate the following predictive distribution:

$$P(y_{\tau} \mid \theta) \quad \text{and} \quad -\log(P(y_{\tau} \mid \theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal as a weighted average using the `mask` tensor.

Parameters:
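For concreteness, the negative log-likelihood for a Gaussian predictive distribution can be sketched in NumPy (illustrative only; the library uses `torch.distributions`):

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), averaged over
    the horizon; theta = (mu, sigma) are the predicted parameters."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                   + (y - mu) ** 2 / (2 * sigma**2))

y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.0, 2.0, 3.0])  # perfect location estimates
print(round(gaussian_nll(y, mu, sigma=1.0), 3))  # 0.5*log(2*pi) ~ 0.919
```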
Bases: `Module`

Poisson Mixture Mesh.

This Poisson Mixture statistical model assumes independence across groups of data $\mathcal{G} = [g_i]$ and estimates relationships within each group.

$$P(\mathbf{y}_{[b][t+1:t+H]}) = \prod_{[g_i] \in \mathcal{G}} P(\mathbf{y}_{[g_i][\tau]}) = \prod_{\beta \in [g_i]} \sum_{k=1}^{K} w_k \prod_{(\beta,\tau) \in [g_i][t+1:t+H]} \mathrm{Poisson}(y_{\beta,\tau}, \hat{\lambda}_{\beta,\tau,k})$$

Parameters:
Computes the negative log-likelihood objective function to estimate the following predictive distribution:

$$P(y_{\tau} \mid \theta) \quad \text{and} \quad -\log(P(y_{\tau} \mid \theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal as a weighted average using the `mask` tensor.

Parameters:
Bases: `Module`

Gaussian Mixture Mesh.

This Gaussian Mixture statistical model assumes independence across groups of data $\mathcal{G} = [g_i]$ and estimates relationships within each group.

$$P(\mathbf{y}_{[b][t+1:t+H]}) = \prod_{[g_i] \in \mathcal{G}} P(\mathbf{y}_{[g_i][\tau]}) = \prod_{\beta \in [g_i]} \sum_{k=1}^{K} w_k \prod_{(\beta,\tau) \in [g_i][t+1:t+H]} \mathrm{Gaussian}(y_{\beta,\tau}, \hat{\mu}_{\beta,\tau,k}, \hat{\sigma}_{\beta,\tau,k})$$

Parameters:
Computes the negative log-likelihood objective function to estimate the following predictive distribution:

$$P(y_{\tau} \mid \theta) \quad \text{and} \quad -\log(P(y_{\tau} \mid \theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal as a weighted average using the `mask` tensor.

Parameters:
Bases: `Module`

Negative Binomial Mixture Mesh.

This Negative Binomial Mixture statistical model assumes independence across groups of data $\mathcal{G} = [g_i]$ and estimates relationships within each group.

$$P(\mathbf{y}_{[b][t+1:t+H]}) = \prod_{[g_i] \in \mathcal{G}} P(\mathbf{y}_{[g_i][\tau]}) = \prod_{\beta \in [g_i]} \sum_{k=1}^{K} w_k \prod_{(\beta,\tau) \in [g_i][t+1:t+H]} \mathrm{NBinomial}(y_{\beta,\tau}, \hat{r}_{\beta,\tau,k}, \hat{p}_{\beta,\tau,k})$$

Parameters:
Computes the negative log-likelihood objective function to estimate the following predictive distribution:

$$P(y_{\tau} \mid \theta) \quad \text{and} \quad -\log(P(y_{\tau} \mid \theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal as a weighted average using the `mask` tensor.

Parameters:
Bases: `BasePointLoss`

Huber Loss.

The Huber loss, employed in robust regression, is a loss function that is less sensitive to outliers in the data than the squared error loss. This function is also referred to as SmoothL1.

The Huber loss function is quadratic for small errors and linear for large errors, with equal values and slopes of the two sections at the points where $(y_{\tau} - \hat{y}_{\tau})^2 = |y_{\tau} - \hat{y}_{\tau}|$.

$$L_{\delta}(y_{\tau}, \hat{y}_{\tau}) = \begin{cases} \frac{1}{2}(y_{\tau} - \hat{y}_{\tau})^2 & \text{for } |y_{\tau} - \hat{y}_{\tau}| \le \delta \\ \delta \cdot \left( |y_{\tau} - \hat{y}_{\tau}| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}$$

where $\delta$ is a threshold parameter that determines the point at which the loss transitions from quadratic to linear, and can be tuned to control the trade-off between robustness and accuracy in the predictions.

Parameters:
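The two branches can be sketched in NumPy (illustrative only, not the library's `torch` implementation):

```python
import numpy as np

def huber(y, y_hat, delta=1.0):
    """Quadratic for |e| <= delta, linear beyond; values and slopes of the
    two branches match at |e| = delta."""
    e = np.abs(y - y_hat)
    quad = 0.5 * e**2
    lin = delta * (e - 0.5 * delta)
    return np.mean(np.where(e <= delta, quad, lin))

# Small error -> quadratic branch; large error -> linear branch.
print(huber(np.array([0.0]), np.array([0.5]), delta=1.0))  # 0.5*0.25 = 0.125
print(huber(np.array([0.0]), np.array([4.0]), delta=1.0))  # 1*(4-0.5) = 3.5
```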
Bases: `BasePointLoss`

Tukey Loss.

The Tukey loss function, also known as Tukey's biweight function, is a robust loss function used in robust statistics. Tukey's loss exhibits quadratic behavior near the origin, like the Huber loss; however, it is even more robust to outliers because the loss for large residuals remains constant instead of scaling linearly.

The parameter $c$ in Tukey's loss determines the "saturation" point of the function: higher values of $c$ enhance sensitivity, while lower values increase resistance to outliers.

$$L_{c}(y_{\tau}, \hat{y}_{\tau}) = \begin{cases} \frac{c^2}{6} \left[ 1 - \left( 1 - \left( \frac{y_{\tau} - \hat{y}_{\tau}}{c} \right)^2 \right)^3 \right] & \text{for } |y_{\tau} - \hat{y}_{\tau}| \le c \\ \frac{c^2}{6} & \text{otherwise} \end{cases}$$

Please note that the Tukey loss function assumes the data to be stationary or normalized beforehand. If the error values are excessively large, the optimization may struggle to converge. It is advisable to use small learning rates.

Parameters:
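The saturation behavior can be seen in a NumPy sketch (illustrative only; toy values):

```python
import numpy as np

def tukey(y, y_hat, c=4.685):
    """Tukey biweight loss: quadratic near zero, saturating at c^2 / 6."""
    e = y - y_hat
    inside = (c**2 / 6) * (1 - (1 - (e / c) ** 2) ** 3)
    return np.mean(np.where(np.abs(e) <= c, inside, c**2 / 6))

# Residuals beyond c all contribute the same constant c^2 / 6:
print(tukey(np.array([0.0]), np.array([10.0]), c=3.0))   # 9/6 = 1.5
print(tukey(np.array([0.0]), np.array([100.0]), c=3.0))  # also 1.5
```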
Bases: `BasePointLoss`

Huberized Quantile Loss.

The Huberized quantile loss is a modified version of the quantile loss function that combines the advantages of the quantile loss and the Huber loss. It is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails.

The Huberized quantile loss between `y` and `y_hat` measures the Huber loss in a non-symmetric way. The loss pays more attention to under- or over-estimation depending on the quantile parameter $q$, and controls the trade-off between robustness and accuracy in the predictions with the parameter $\delta$.

$$\mathrm{HuberQL}_{\delta}(y_{\tau}, \hat{y}^{(q)}_{\tau}) = (1-q)\, L_{\delta}(y_{\tau}, \hat{y}^{(q)}_{\tau})\, \mathbb{1}\{\hat{y}^{(q)}_{\tau} \ge y_{\tau}\} + q\, L_{\delta}(y_{\tau}, \hat{y}^{(q)}_{\tau})\, \mathbb{1}\{\hat{y}^{(q)}_{\tau} < y_{\tau}\}$$

Parameters:
Bases: `BasePointLoss`

Huberized Multi-Quantile Loss.

The Huberized Multi-Quantile loss (HuberMQL) is a modified version of the multi-quantile loss function that combines the advantages of the quantile loss and the Huber loss. HuberMQL is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails. The loss function pays more attention to under- or over-estimation depending on the quantile list $[q_1, q_2, \dots]$ parameter. It controls the trade-off between robustness and prediction accuracy with the parameter $\delta$.

$$\mathrm{HuberMQL}_{\delta}(y_{\tau}, [\hat{y}^{(q_1)}_{\tau}, \dots, \hat{y}^{(q_n)}_{\tau}]) = \frac{1}{n} \sum_{q_i} \mathrm{HuberQL}_{\delta}(y_{\tau}, \hat{y}^{(q_i)}_{\tau})$$

Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `level` | int list | Probability levels for prediction intervals (defaults to median). Defaults to `[80, 90]`. | `[80, 90]` |
| `quantiles` | float list | Alternative to `level`; quantiles to estimate from the `y` distribution. Defaults to `None`. | `None` |
Bases: `HuberQLoss`

Implicit Huber Quantile Loss.

Computes the Huberized quantile loss between `y` and `y_hat`, with the quantile $q$ provided as an input to the network. HuberIQLoss measures the deviation of a Huberized quantile forecast. By weighting the absolute deviation in a non-symmetric way, the loss pays more attention to under- or over-estimation.

$$\mathrm{HuberIQL}(y_{\tau}, \hat{y}^{(q)}_{\tau}) = (1-q)\, L_{\delta}(y_{\tau}, \hat{y}^{(q)}_{\tau})\, \mathbb{1}\{\hat{y}^{(q)}_{\tau} \ge y_{\tau}\} + q\, L_{\delta}(y_{\tau}, \hat{y}^{(q)}_{\tau})\, \mathbb{1}\{\hat{y}^{(q)}_{\tau} < y_{\tau}\}$$

Parameters:
Bases: `BasePointLoss`

Accuracy.

Computes the accuracy between categorical `y` and `y_hat`. This metric is only meant for evaluation, as it is not differentiable.

$$\mathrm{Accuracy}(y_{\tau}, \hat{y}_{\tau}) = \frac{1}{H} \sum_{\tau=t+1}^{t+H} \mathbb{1}\{y_{\tau} = \hat{y}_{\tau}\}$$
Bases: `BasePointLoss`

Scaled Continuous Ranked Probability Score.

Calculates a scaled variation of the CRPS, as proposed by Rangapuram (2021), to measure the accuracy of predicted quantiles `y_hat` compared to the observation `y`. This metric averages percentage-weighted absolute deviations as defined by the quantile losses.

$$\mathrm{sCRPS}(\hat{y}^{(q)}_{\tau}, y_{\tau}) = \frac{2}{N} \sum_{i} \int_{0}^{1} \frac{\mathrm{QL}(\hat{y}^{(q)}_{i,\tau}, y_{i,\tau})}{\sum_{i} |y_{i,\tau}|} \, dq$$

where $\hat{y}^{(q)}_{i,\tau}$ is the estimated quantile and $y_{i,\tau}$ are the target variable realizations.

Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `level` | int list | Probability levels for prediction intervals (defaults to median). Defaults to `[80, 90]`. | `[80, 90]` |
| `quantiles` | float list | Alternative to `level`; quantiles to estimate from the `y` distribution. Defaults to `None`. | `None` |