The most important training signal is the forecast error, which is the difference between the observed value $y_{\tau}$ and the prediction $\hat{y}_{\tau}$ at time $\tau$:

$$e_{\tau} = y_{\tau}-\hat{y}_{\tau} \qquad \qquad \tau \in \{t+1,\dots,t+H \}$$

The training loss summarizes the forecast errors into a single value, and each loss below corresponds to a different optimization objective. All the losses are torch.nn.Module subclasses, which lets PyTorch Lightning move them automatically across CPU/GPU/TPU devices.

BasePointLoss

BasePointLoss(
    horizon_weight=None, outputsize_multiplier=None, output_names=None
)
Bases: Module

Base class for point loss functions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |
| outputsize_multiplier | Optional[int] | Multiplier for the output size. Defaults to None. | None |
| output_names | Optional[List[str]] | Names of the outputs. Defaults to None. | None |

1. Scale-dependent Errors

These metrics are on the same scale as the data.

Mean Absolute Error (MAE)

MAE

MAE(horizon_weight=None)
Bases: BasePointLoss

Mean Absolute Error.

Calculates Mean Absolute Error between y and y_hat. MAE measures the prediction accuracy of a forecasting method by calculating the absolute deviation between the prediction and the true value at a given time, and averaging these deviations over the length of the series.

$$\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} |y_{\tau} - \hat{y}_{\tau}|$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

MAE.__call__

__call__(y, y_hat, mask=None, y_insample=None)
Calculate Mean Absolute Error between actual and predicted values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies datapoints to consider in loss. Defaults to None. | None |
| y_insample | Union[Tensor, None] | Actual insample values. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | MAE (single value). |
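To make the MAE formula and the role of the mask concrete, here is a minimal dependency-free sketch. It is not the library's implementation: the function name `mae` and the choice to treat the mask as per-step weights in a weighted average are assumptions made for illustration.

```python
from typing import Optional, Sequence


def mae(y: Sequence[float], y_hat: Sequence[float],
        mask: Optional[Sequence[float]] = None) -> float:
    """Mask-weighted mean of |y - y_hat| over the forecast horizon."""
    if mask is None:
        mask = [1.0] * len(y)  # assumption: no mask means every step counts
    total = sum(m * abs(a - b) for a, b, m in zip(y, y_hat, mask))
    return total / sum(mask)


y = [10.0, 12.0, 14.0]
y_hat = [11.0, 12.0, 11.0]

print(mae(y, y_hat))                  # absolute errors are 1, 0 and 3
print(mae(y, y_hat, mask=[1, 1, 0]))  # the last step is excluded
```

With the last step masked out, the large error at that step no longer contributes to the average.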

Mean Squared Error (MSE)

MSE

MSE(horizon_weight=None)
Bases: BasePointLoss

Mean Squared Error.

Calculates Mean Squared Error between y and y_hat. MSE measures the prediction accuracy of a forecasting method by calculating the squared deviation between the prediction and the true value at a given time, and averaging these deviations over the length of the series.

$$\mathrm{MSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

MSE.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Calculate Mean Squared Error between actual and predicted values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| y_insample | Union[Tensor, None] | Actual insample values. Defaults to None. | None |
| mask | Union[Tensor, None] | Specifies datapoints to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | MSE (single value). |

Root Mean Squared Error (RMSE)

RMSE

RMSE(horizon_weight=None)
Bases: BasePointLoss

Root Mean Squared Error.

Calculates Root Mean Squared Error between y and y_hat. RMSE measures the prediction accuracy of a forecasting method by calculating the squared deviation between the prediction and the observed value at a given time, averaging these deviations over the length of the series, and taking the square root. The RMSE is therefore on the same scale as the original time series, so it can be compared across series only if they share a common scale. RMSE has a direct connection to the L2 norm.

$$\mathrm{RMSE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \sqrt{\frac{1}{H} \sum^{t+H}_{\tau=t+1} (y_{\tau} - \hat{y}_{\tau})^{2}}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

RMSE.__call__

__call__(y, y_hat, mask=None, y_insample=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies datapoints to consider in loss. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| rmse | Tensor | Tensor (single value). |
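The scale dependence of MSE and RMSE can be checked numerically. The following is a small sketch in plain Python (the helper names `mse` and `rmse` are illustrative, and masks and horizon weights are left out): rescaling the series by a factor of 100 rescales the RMSE by 100 and the MSE by 100 squared.

```python
import math


def mse(y, y_hat):
    """Mean of squared forecast errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Square root of the MSE, expressed in the units of the series."""
    return math.sqrt(mse(y, y_hat))


y = [10.0, 12.0, 14.0]
y_hat = [11.0, 12.0, 11.0]

# The same series and forecast expressed in units 100 times smaller.
y_scaled = [100.0 * v for v in y]
y_hat_scaled = [100.0 * v for v in y_hat]

print(rmse(y, y_hat), rmse(y_scaled, y_hat_scaled))
print(mse(y, y_hat), mse(y_scaled, y_hat_scaled))
```

This is why the scale-dependent errors are only comparable between series measured in the same units.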

2. Percentage errors

These metrics are unit-free, suitable for comparisons across series.

Mean Absolute Percentage Error (MAPE)

MAPE

MAPE(horizon_weight=None)
Bases: BasePointLoss

Mean Absolute Percentage Error.

Calculates Mean Absolute Percentage Error between y and y_hat. MAPE measures the relative prediction accuracy of a forecasting method by calculating the percentage deviation between the prediction and the observed value at a given time, and averaging these deviations over the length of the series. The closer to zero an observed value is, the higher the penalty MAPE assigns to the corresponding error.

$$\mathrm{MAPE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. | None |

MAPE.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| mape | Tensor | Tensor (single value). |
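The sensitivity of MAPE to observations near zero is easy to see with numbers. Below is a hedged, dependency-free sketch of the formula as written (the name `mape` is illustrative; this version simply raises ZeroDivisionError when an observation is exactly zero, which is not necessarily how the library handles that case).

```python
def mape(y, y_hat):
    """Mean of |y - y_hat| / |y|; undefined when any observation is zero."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


# Both forecasts miss by exactly 1.0 in absolute terms.
print(mape([100.0], [101.0]))  # 0.01
print(mape([0.5], [1.5]))      # 2.0
```

The same absolute error is penalized 200 times more heavily when the observed value is 0.5 instead of 100.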

Symmetric MAPE (sMAPE)

SMAPE

SMAPE(horizon_weight=None)
Bases: BasePointLoss

Symmetric Mean Absolute Percentage Error.

Calculates Symmetric Mean Absolute Percentage Error between y and y_hat. SMAPE measures the relative prediction accuracy of a forecasting method by calculating the deviation between the prediction and the observed value, scaled by the sum of the absolute values of the prediction and the observed value at a given time, and then averaging these deviations over the length of the series. Each ratio in the sum below lies between 0 and 1, so under the common convention of multiplying by 2 the SMAPE is bounded between 0% and 200%. This boundedness is desirable compared to the plain MAPE, which may be undefined when the target is zero.

$$\mathrm{sMAPE}_{2}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{|y_{\tau}|+|\hat{y}_{\tau}|}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. | None |

SMAPE.__call__

__call__(y, y_hat, mask=None, y_insample=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| smape | Tensor | Tensor (single value). |
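As an illustration of the bounded behavior, here is a plain-Python sketch of the sMAPE sum exactly as written in the formula above, without any extra scaling factor. The name `smape` and the convention that a 0/0 term contributes zero error are assumptions of this sketch, not statements about the library.

```python
def smape(y, y_hat):
    """Mean of |y - y_hat| / (|y| + |y_hat|); every term lies in [0, 1]."""
    total = 0.0
    for a, b in zip(y, y_hat):
        scale = abs(a) + abs(b)
        # Assumed convention: when both values are zero the term counts as 0.
        total += abs(a - b) / scale if scale > 0 else 0.0
    return total / len(y)


# A zero target no longer breaks the metric: the first term equals 1, the second 0.
print(smape([0.0, 10.0], [5.0, 10.0]))  # 0.5
# Opposite signs give the maximum value of a single term.
print(smape([1.0], [-1.0]))             # 1.0
```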

3. Scale-independent Errors

These metrics measure the relative improvements versus baselines.

Mean Absolute Scaled Error (MASE)

MASE

MASE(seasonality, horizon_weight=None)
Bases: BasePointLoss

Mean Absolute Scaled Error.

Calculates the Mean Absolute Scaled Error between y and y_hat. MASE measures the relative prediction accuracy of a forecasting method by comparing the mean absolute errors of the prediction against the mean absolute errors of the seasonal naive model. MASE is one of the components of the Overall Weighted Average (OWA) used in the M4 Competition.

$$\mathrm{MASE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \frac{|y_{\tau}-\hat{y}_{\tau}|}{\mathrm{MAE}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{season}_{\tau})}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| seasonality | int | Main frequency of the time series: Hourly 24, Daily 7, Weekly 52, Monthly 12, Quarterly 4, Yearly 1. | required |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. | None |

MASE.__call__

__call__(y, y_hat, y_insample, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Tensor (batch_size, output_size), Actual values. | required |
| y_hat | Tensor | Tensor (batch_size, output_size), Predicted values. | required |
| y_insample | Tensor | Tensor (batch_size, input_size), Actual insample values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| mase | Tensor | Tensor (single value). |
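The scaling by a seasonal naive model can be sketched for a single series as follows. This is an illustrative reading of the formula, assuming the denominator is the in-sample MAE of the forecast that repeats the value observed `seasonality` steps earlier; the function name `mase` is hypothetical and batching, masks and horizon weights are omitted.

```python
def mase(y, y_hat, y_insample, seasonality):
    """Forecast MAE divided by the in-sample MAE of the seasonal naive model."""
    # Seasonal naive error: compare each in-sample value with the one
    # observed `seasonality` steps before it.
    naive_errors = [abs(y_insample[i] - y_insample[i - seasonality])
                    for i in range(seasonality, len(y_insample))]
    scale = sum(naive_errors) / len(naive_errors)
    forecast_mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
    return forecast_mae / scale


y_insample = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]  # period-2 pattern with a trend
y = [4.0, 6.0]
y_hat = [5.0, 6.0]

print(mase(y, y_hat, y_insample, seasonality=2))  # 0.5
```

A value below 1 means the forecast has a smaller average error than the seasonal naive model had on the in-sample data.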

Relative Mean Squared Error (relMSE)

relMSE

relMSE(y_train=None, horizon_weight=None)
Bases: BasePointLoss

Relative Mean Squared Error.

Computes the Relative Mean Squared Error (relMSE), proposed by Hyndman & Koehler (2006) as an alternative to percentage errors that avoids their instability.

$$\mathrm{relMSE}(\mathbf{y}, \mathbf{\hat{y}}, \mathbf{\hat{y}}^{benchmark}) = \frac{\mathrm{MSE}(\mathbf{y}, \mathbf{\hat{y}})}{\mathrm{MSE}(\mathbf{y}, \mathbf{\hat{y}}^{benchmark})}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y_train | | Numpy array, deprecated. | None |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. | None |

relMSE.__call__

__call__(y, y_hat, y_benchmark, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Tensor (batch_size, output_size), Actual values. | required |
| y_hat | Tensor | Tensor (batch_size, output_size), Predicted values. | required |
| y_benchmark | Tensor | Tensor (batch_size, output_size), Benchmark predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| relMSE | Tensor | Tensor (single value). |
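A short numeric sketch of the relMSE ratio follows. The helper names `mse` and `rel_mse` are illustrative and the benchmark forecast is made-up data, not a library default.

```python
def mse(y, y_hat):
    """Mean of squared forecast errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rel_mse(y, y_hat, y_benchmark):
    """MSE of the forecast relative to the MSE of a benchmark forecast."""
    return mse(y, y_hat) / mse(y, y_benchmark)


y = [10.0, 12.0, 14.0]
y_hat = [11.0, 12.0, 13.0]        # squared errors: 1, 0, 1
y_benchmark = [8.0, 12.0, 16.0]   # squared errors: 4, 0, 4

print(rel_mse(y, y_hat, y_benchmark))  # about 0.25
```

Values below 1 indicate that the forecast improves on the benchmark, and values above 1 indicate that it does worse.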

4. Probabilistic Errors

These methods use statistical approaches to estimate unknown probability distributions from observed data. Maximum likelihood estimation (MLE) finds the parameter values that maximize the likelihood function, which measures the probability of obtaining the observed data given the parameter values. MLE has good theoretical properties and is efficient when its assumptions are satisfied. As a non-parametric alternative, quantile regression penalizes deviations asymmetrically, weighting under- and over-estimation differently.

Quantile Loss

QuantileLoss

QuantileLoss(q, horizon_weight=None)
Bases: BasePointLoss

Quantile Loss.

Computes the quantile loss between y and y_hat. QL measures the deviation of a quantile forecast. By weighting the absolute deviation asymmetrically, the loss pays more attention to under- or over-estimation. A common value for q is 0.5, which measures the deviation from the median (Pinball loss).

$$\mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \Big( (1-q)\,( \hat{y}^{(q)}_{\tau} - y_{\tau} )_{+} + q\,( y_{\tau} - \hat{y}^{(q)}_{\tau} )_{+} \Big)$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| q | float | Between 0 and 1. The slope of the quantile loss; in the context of quantile regression, q determines the conditional quantile level. | required |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

QuantileLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Calculate quantile loss between actual and predicted values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| y_insample | Union[Tensor, None] | Actual insample values. Defaults to None. | None |
| mask | Union[Tensor, None] | Specifies datapoints to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Quantile loss (single value). |
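The asymmetry of the quantile loss is clearest with a worked example. The sketch below implements the formula above in plain Python (the name `quantile_loss` is illustrative; masks and horizon weights are omitted): with q = 0.9, an under-forecast costs nine times as much as an over-forecast of the same size.

```python
def quantile_loss(y, y_hat, q):
    """Pinball loss averaged over the forecast horizon."""
    total = 0.0
    for a, b in zip(y, y_hat):
        over = max(b - a, 0.0)    # forecast above the observation
        under = max(a - b, 0.0)   # forecast below the observation
        total += (1 - q) * over + q * under
    return total / len(y)


y = [10.0]
print(quantile_loss(y, [8.0], q=0.9))   # under-forecast by 2, weight q
print(quantile_loss(y, [12.0], q=0.9))  # over-forecast by 2, weight 1 - q
print(quantile_loss(y, [8.0], q=0.5))   # q = 0.5 gives half the absolute error
```

Minimizing this loss therefore pushes the forecast upward until roughly 90% of observations fall below it, which is what makes it an estimator of the 0.9 quantile.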

Multi Quantile Loss (MQLoss)

MQLoss

MQLoss(level=[80, 90], quantiles=None, horizon_weight=None)
Bases: BasePointLoss

Multi-Quantile loss.

Calculates the Multi-Quantile loss (MQL) between y and y_hat. MQL calculates the average quantile loss for a given set of quantiles, based on the absolute difference between predicted quantiles and observed values.

$$\mathrm{MQL}(\mathbf{y}_{\tau},[\mathbf{\hat{y}}^{(q_{1})}_{\tau}, ... ,\hat{y}^{(q_{n})}_{\tau}]) = \frac{1}{n} \sum_{q_{i}} \mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q_{i})}_{\tau})$$

The limit behavior of MQL makes it possible to measure the accuracy of a full predictive distribution $\mathbf{\hat{F}}_{\tau}$ with the continuous ranked probability score (CRPS). This can be achieved through a numerical integration technique that discretizes the quantiles and treats the CRPS integral with a left Riemann approximation, averaging over uniformly spaced quantiles.

$$\mathrm{CRPS}(y_{\tau}, \mathbf{\hat{F}}_{\tau}) = \int^{1}_{0} \mathrm{QL}(y_{\tau}, \hat{y}^{(q)}_{\tau}) dq$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| level | List[int] | Probability levels for prediction intervals. Defaults to [80, 90]. | [80, 90] |
| quantiles | Optional[List[float]] | Alternative to level, quantiles to estimate from y distribution. Defaults to None. | None |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

MQLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Computes the multi-quantile loss.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| y_insample | Union[Tensor, None] | In-sample values. Defaults to None. | None |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Multi-quantile loss (single value). |
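The relation between the level list, the quantiles, and the averaged loss can be sketched as follows. This is a hedged illustration: the helper names `level_to_quantiles` and `mq_loss` are hypothetical, and the mapping assumes the usual convention that a central interval of level l spans the quantiles (100 - l)/200 and (100 + l)/200, with the median always included. The library's own conversion may differ in detail.

```python
def level_to_quantiles(level):
    """Map interval levels (in percent) to a sorted list of quantiles."""
    quantiles = {0.5}  # assumption: the median is always part of the output
    for lv in level:
        quantiles.add((100 - lv) / 200)  # lower bound of the central interval
        quantiles.add((100 + lv) / 200)  # upper bound of the central interval
    return sorted(quantiles)


def quantile_loss(y, y_hat, q):
    """Pinball loss averaged over the forecast horizon."""
    return sum((1 - q) * max(b - a, 0.0) + q * max(a - b, 0.0)
               for a, b in zip(y, y_hat)) / len(y)


def mq_loss(y, forecasts_by_q):
    """Average of the quantile losses over a dict {quantile: forecasts}."""
    losses = [quantile_loss(y, y_hat, q) for q, y_hat in forecasts_by_q.items()]
    return sum(losses) / len(losses)


print(level_to_quantiles([80, 90]))  # [0.05, 0.1, 0.5, 0.9, 0.95]

y = [10.0]
forecasts = {0.1: [8.0], 0.5: [10.0], 0.9: [12.0]}
print(mq_loss(y, forecasts))
```

Averaging over a finer and finer uniform grid of quantiles is the discretization of the CRPS integral described above.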

Implicit Quantile Loss (IQLoss)

QuantileLayer

QuantileLayer(num_output, cos_embedding_dim=128)
Bases: Module

Implicit Quantile Layer from the paper IQN for Distributional Reinforcement Learning.

Code from GluonTS: https://github.com/awslabs/gluonts/blob/61133ef6e2d88177b32ace4afc6843ab9a7bc8cd/src/gluonts/torch/distributions/implicit_quantile_network.py

IQLoss

IQLoss(
    cos_embedding_dim=64,
    concentration0=1.0,
    concentration1=1.0,
    horizon_weight=None,
)
Bases: QuantileLoss

Implicit Quantile Loss.

Computes the quantile loss between y and y_hat, with the quantile q provided as an input to the network. IQL measures the deviation of a quantile forecast. By weighting the absolute deviation asymmetrically, the loss pays more attention to under- or over-estimation.

$$\mathrm{QL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \Big( (1-q)\,( \hat{y}^{(q)}_{\tau} - y_{\tau} )_{+} + q\,( y_{\tau} - \hat{y}^{(q)}_{\tau} )_{+} \Big)$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cos_embedding_dim | int | Cosine embedding dimension. Defaults to 64. | 64 |
| concentration0 | float | Beta distribution concentration parameter. Defaults to 1.0. | 1.0 |
| concentration1 | float | Beta distribution concentration parameter. Defaults to 1.0. | 1.0 |
| horizon_weight | Optional[Tensor] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

IQLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Calculate quantile loss between actual and predicted values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| y_insample | Union[Tensor, None] | Actual insample values. Defaults to None. | None |
| mask | Union[Tensor, None] | Specifies datapoints to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Quantile loss (single value). |

DistributionLoss

DistributionLoss

DistributionLoss(
    distribution,
    level=[80, 90],
    quantiles=None,
    num_samples=1000,
    return_params=False,
    horizon_weight=None,
    **distribution_kwargs
)
Bases: Module

DistributionLoss.

This PyTorch module wraps the torch.distributions classes, allowing them to interact with NeuralForecast models modularly. It uses the negative log-likelihood as the optimization objective and provides a sample method to generate empirically the quantiles defined by the level list. Additionally, it implements a distribution transformation that factorizes the scale-dependent likelihood parameters into a base scale and a multiplier that is efficiently learnable within the operating ranges of the network's non-linearities.

Available distributions:
  • Poisson
  • Normal
  • StudentT
  • NegativeBinomial
  • Tweedie
  • Bernoulli (Temporal Classifiers)
  • ISQF (Incremental Spline Quantile Function)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| distribution | str | Identifier of a torch.distributions.Distribution class. | required |
| level | float list | Confidence levels for prediction intervals. | [80, 90] |
| quantiles | float list | Alternative to level list, target quantiles. | None |
| num_samples | int | Number of samples for the empirical quantiles. | 1000 |
| return_params | bool | Whether or not to return the Distribution parameters. | False |
| horizon_weight | Tensor | Tensor of size h, weight for each timestamp of the forecasting window. | None |

Returns:

| Type | Description |
| --- | --- |
| tuple | Tuple with tensors of ISQF distribution arguments. |

DistributionLoss.__call__

__call__(y, distr_args, mask=None)
Computes the negative log-likelihood objective function used to estimate the following predictive distribution:

$$\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta) \quad \mathrm{and} \quad -\log(\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal with a weighted average that uses the mask tensor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| distr_args | Tensor | Constructor arguments for the underlying Distribution type. | required |
| loc | Optional[Tensor] | Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution. Defaults to None. | required |
| scale | Optional[Tensor] | Optional tensor, of the same shape as the batch_shape + event_shape of the resulting distribution. Defaults to None. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Weighted loss function against which backpropagation will be performed. |
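For intuition about the negative log-likelihood objective, the following dependency-free sketch evaluates it for a Normal predictive distribution. It is not the library's code: the function name `normal_nll`, the per-step `loc` and `scale` lists, and the mask-weighted average are assumptions chosen to mirror the description above.

```python
import math


def normal_nll(y, loc, scale, mask=None):
    """Mask-weighted average of -log N(y | loc, scale^2) over the horizon."""
    if mask is None:
        mask = [1.0] * len(y)  # assumption: no mask means every step counts
    total = 0.0
    for obs, mu, sigma, m in zip(y, loc, scale, mask):
        log_prob = (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                    - (obs - mu) ** 2 / (2.0 * sigma ** 2))
        total += m * (-log_prob)
    return total / sum(mask)


y = [1.0, 2.0]
good = normal_nll(y, loc=[1.0, 2.0], scale=[1.0, 1.0])  # centered on the data
bad = normal_nll(y, loc=[4.0, 5.0], scale=[1.0, 1.0])   # shifted by 3
print(good, bad)
```

Parameters that place more probability mass on the observed values give a lower negative log-likelihood, which is what gradient descent exploits during training.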

Poisson Mixture Mesh (PMM)

PMM

PMM(
    n_components=10,
    level=[80, 90],
    quantiles=None,
    num_samples=1000,
    return_params=False,
    batch_correlation=False,
    horizon_correlation=False,
    weighted=False,
)
Bases: Module

Poisson Mixture Mesh.

This Poisson Mixture statistical model assumes independence across groups of data $\mathcal{G}=\{[g_{i}]\}$, and estimates relationships within the group.

$$\mathrm{P}\left(\mathbf{y}_{[b][t+1:t+H]}\right) = \prod_{ [g_{i}] \in \mathcal{G}} \mathrm{P} \left(\mathbf{y}_{[g_{i}][\tau]} \right) = \prod_{\beta\in[g_{i}]} \left(\sum_{k=1}^{K} w_k \prod_{(\beta,\tau) \in [g_i][t+1:t+H]} \mathrm{Poisson}(y_{\beta,\tau}, \hat{\lambda}_{\beta,\tau,k}) \right)$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_components | int | The number of mixture components. Defaults to 10. | 10 |
| level | float list | Confidence levels for prediction intervals. Defaults to [80, 90]. | [80, 90] |
| quantiles | float list | Alternative to level list, target quantiles. Defaults to None. | None |
| return_params | bool | Whether or not to return the Distribution parameters. Defaults to False. | False |
| batch_correlation | bool | Whether or not to model batch correlations. Defaults to False. | False |
| horizon_correlation | bool | Whether or not to model horizon correlations. Defaults to False. | False |

PMM.__call__

__call__(y, distr_args, mask=None)
Computes the negative log-likelihood objective function used to estimate the following predictive distribution:

$$\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta) \quad \mathrm{and} \quad -\log(\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal with a weighted average that uses the mask tensor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| distr_args | Tensor | Constructor arguments for the underlying Distribution type. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Weighted loss function against which backpropagation will be performed. |

Gaussian Mixture Mesh (GMM)

GMM

GMM(
    n_components=1,
    level=[80, 90],
    quantiles=None,
    num_samples=1000,
    return_params=False,
    batch_correlation=False,
    horizon_correlation=False,
    weighted=False,
)
Bases: Module

Gaussian Mixture Mesh.

This Gaussian Mixture statistical model assumes independence across groups of data $\mathcal{G}=\{[g_{i}]\}$, and estimates relationships within the group.

$$\mathrm{P}\left(\mathbf{y}_{[b][t+1:t+H]}\right) = \prod_{ [g_{i}] \in \mathcal{G}} \mathrm{P}\left(\mathbf{y}_{[g_{i}][\tau]}\right)= \prod_{\beta\in[g_{i}]} \left(\sum_{k=1}^{K} w_k \prod_{(\beta,\tau) \in [g_i][t+1:t+H]} \mathrm{Gaussian}(y_{\beta,\tau}, \hat{\mu}_{\beta,\tau,k}, \sigma_{\beta,\tau,k})\right)$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_components | int | The number of mixture components. Defaults to 1. | 1 |
| level | float list | Confidence levels for prediction intervals. Defaults to [80, 90]. | [80, 90] |
| quantiles | float list | Alternative to level list, target quantiles. Defaults to None. | None |
| return_params | bool | Whether or not to return the Distribution parameters. Defaults to False. | False |
| batch_correlation | bool | Whether or not to model batch correlations. Defaults to False. | False |
| horizon_correlation | bool | Whether or not to model horizon correlations. Defaults to False. | False |
| weighted | bool | Whether or not to model weighted components. Defaults to False. | False |
| num_samples | int | Number of samples for the empirical quantiles. Defaults to 1000. | 1000 |

GMM.__call__

__call__(y, distr_args, mask=None)
Computes the negative log-likelihood objective function used to estimate the following predictive distribution:

$$\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta) \quad \mathrm{and} \quad -\log(\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal with a weighted average that uses the mask tensor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| distr_args | Tensor | Constructor arguments for the underlying Distribution type. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Weighted loss function against which backpropagation will be performed. |

Negative Binomial Mixture Mesh (NBMM)

NBMM

NBMM(
    n_components=1,
    level=[80, 90],
    quantiles=None,
    num_samples=1000,
    return_params=False,
    weighted=False,
)
Bases: Module

Negative Binomial Mixture Mesh.

This Negative Binomial Mixture statistical model assumes independence across groups of data $\mathcal{G}=\{[g_{i}]\}$, and estimates relationships within the group.

$$\mathrm{P}\left(\mathbf{y}_{[b][t+1:t+H]}\right) = \prod_{ [g_{i}] \in \mathcal{G}} \mathrm{P}\left(\mathbf{y}_{[g_{i}][\tau]}\right)= \prod_{\beta\in[g_{i}]} \left(\sum_{k=1}^{K} w_k \prod_{(\beta,\tau) \in [g_i][t+1:t+H]} \mathrm{NBinomial}(y_{\beta,\tau}, \hat{r}_{\beta,\tau,k}, \hat{p}_{\beta,\tau,k})\right)$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_components | int | The number of mixture components. Defaults to 1. | 1 |
| level | float list | Confidence levels for prediction intervals. Defaults to [80, 90]. | [80, 90] |
| quantiles | float list | Alternative to level list, target quantiles. Defaults to None. | None |
| return_params | bool | Whether or not to return the Distribution parameters. Defaults to False. | False |
| weighted | bool | Whether or not to model weighted components. Defaults to False. | False |
| num_samples | int | Number of samples for the empirical quantiles. Defaults to 1000. | 1000 |

NBMM.__call__

__call__(y, distr_args, mask=None)
Computes the negative log-likelihood objective function used to estimate the following predictive distribution:

$$\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta) \quad \mathrm{and} \quad -\log(\mathrm{P}(\mathbf{y}_{\tau}\,|\,\theta))$$

where $\theta$ represents the distribution's parameters. It additionally summarizes the objective signal with a weighted average that uses the mask tensor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| distr_args | Tensor | Constructor arguments for the underlying Distribution type. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Weighted loss function against which backpropagation will be performed. |

5. Robustified Errors

Huber Loss

HuberLoss

HuberLoss(delta=1.0, horizon_weight=None)
Bases: BasePointLoss

Huber Loss.

The Huber loss, employed in robust regression, is a loss function that is less sensitive to outliers in the data than the squared error loss. This function is also referred to as SmoothL1. The Huber loss function is quadratic for small errors and linear for large errors, with equal values and slopes of the two sections at the two points where $|y_{\tau}-\hat{y}_{\tau}| = \delta$.

$$L_{\delta}(y_{\tau},\; \hat{y}_{\tau}) =\begin{cases}{\frac{1}{2}}(y_{\tau}-\hat{y}_{\tau})^{2} & {\text{for }}|y_{\tau}-\hat{y}_{\tau}|\leq \delta \\ \delta \cdot \left(|y_{\tau}-\hat{y}_{\tau}|-{\frac {1}{2}}\delta \right), & {\text{otherwise.}}\end{cases}$$

where $\delta$ is a threshold parameter that determines the point at which the loss transitions from quadratic to linear, and can be tuned to control the trade-off between robustness and accuracy in the predictions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| delta | float | Specifies the threshold at which to change between delta-scaled L1 and L2 loss. Defaults to 1.0. | 1.0 |
| horizon_weight | Union[Tensor, None] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

HuberLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Huber loss. |
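The two regimes of the Huber loss can be checked with a small plain-Python sketch of the piecewise formula. The function name `huber` and the simple average over the horizon are assumptions of this sketch; masks and horizon weights are omitted.

```python
def huber(y, y_hat, delta=1.0):
    """Huber loss averaged over the forecast horizon."""
    total = 0.0
    for a, b in zip(y, y_hat):
        err = abs(a - b)
        if err <= delta:
            total += 0.5 * err ** 2              # quadratic for small errors
        else:
            total += delta * (err - 0.5 * delta)  # linear for large errors
    return total / len(y)


print(huber([0.0], [0.5]))   # 0.125, quadratic regime
print(huber([0.0], [3.0]))   # 2.5, linear regime (squared error would give 4.5)
print(huber([0.0], [1.0]))   # 0.5, both branches agree at |error| = delta
```

Because the penalty grows only linearly beyond delta, a single outlier contributes far less to the total than it would under the squared error.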

Tukey Loss

TukeyLoss

TukeyLoss(c=4.685, normalize=True)
Bases: BasePointLoss

Tukey Loss.

The Tukey loss function, also known as Tukey's biweight function, is a robust loss function used in robust statistics. Tukey's loss exhibits quadratic behavior near the origin, like the Huber loss; however, it is even more robust to outliers because the loss for large residuals remains constant instead of scaling linearly. The parameter $c$ in Tukey's loss determines the "saturation" point of the function: higher values of $c$ enhance sensitivity, while lower values increase resistance to outliers.

$$L_{c}(y_{\tau},\; \hat{y}_{\tau}) =\begin{cases} \frac{c^{2}}{6} \left(1 - \left[1-\left(\frac{y_{\tau}-\hat{y}_{\tau}}{c}\right)^{2} \right]^{3}\right) & \text{for } |y_{\tau}-\hat{y}_{\tau}|\leq c \\ \frac{c^{2}}{6} & \text{otherwise.} \end{cases}$$

Please note that the Tukey loss function assumes the data to be stationary or normalized beforehand. If the error values are excessively large, the optimization may have difficulty converging. It is advisable to employ small learning rates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| c | float | Specifies the Tukey loss threshold beyond which residuals are no longer considered. Defaults to 4.685. | 4.685 |
| normalize | bool | Whether normalization is performed within the Tukey loss computation. Defaults to True. | True |

TukeyLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Tukey loss. |
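The saturation behavior can be illustrated with a per-residual sketch. It uses the standard Tukey biweight form, c²/6 · (1 − [1 − (e/c)²]³) for |e| ≤ c and c²/6 beyond it, which starts at zero, is approximately e²/2 near the origin, and is constant for large residuals. The function name `tukey` is illustrative, and the normalize option of the class is not modeled here.

```python
def tukey(err, c=4.685):
    """Tukey biweight loss of a single residual."""
    saturation = c ** 2 / 6.0
    if abs(err) <= c:
        return saturation * (1.0 - (1.0 - (err / c) ** 2) ** 3)
    return saturation  # residuals beyond c all receive the same penalty


c = 4.685
print(tukey(0.0))                    # no error, no loss
print(tukey(0.1), 0.5 * 0.1 ** 2)    # close to quadratic near the origin
print(tukey(c), tukey(100.0))        # saturated at c**2 / 6
```

Since the loss is flat beyond c, extreme residuals produce no gradient at all, which explains both the robustness and the convergence caveat mentioned above.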

Huberized Quantile Loss

HuberQLoss

HuberQLoss(q, delta=1.0, horizon_weight=None)
Bases: BasePointLoss

Huberized Quantile Loss.

The Huberized quantile loss is a modified version of the quantile loss function that combines the advantages of the quantile loss and the Huber loss. It is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails. The Huberized quantile loss between y and y_hat measures the Huber loss asymmetrically. The loss pays more attention to under- or over-estimation depending on the quantile parameter $q$, and controls the trade-off between robustness and accuracy in the predictions with the parameter $\delta$.

$$\mathrm{HuberQL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = (1-q)\, L_{\delta}(y_{\tau},\; \hat{y}^{(q)}_{\tau}) \mathbb{1}\{ \hat{y}^{(q)}_{\tau} \geq y_{\tau} \} + q\, L_{\delta}(y_{\tau},\; \hat{y}^{(q)}_{\tau}) \mathbb{1}\{ \hat{y}^{(q)}_{\tau} < y_{\tau} \}$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| delta | float | Specifies the threshold at which to change between delta-scaled L1 and L2 loss. Defaults to 1.0. | 1.0 |
| q | float | The slope of the quantile loss; in the context of quantile regression, q determines the conditional quantile level. | required |
| horizon_weight | Union[Tensor, None] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

HuberQLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | HuberQLoss. |
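The combination of the two ideas can be sketched by weighting a per-point Huber term with q or 1 - q depending on the sign of the error, as in the HuberQL formula. This is an illustrative plain-Python version: the helper names `huber_point` and `huber_quantile_loss` are hypothetical, and the average over the horizon is an assumption of the sketch.

```python
def huber_point(abs_err, delta):
    """Huber loss of a single absolute error."""
    if abs_err <= delta:
        return 0.5 * abs_err ** 2
    return delta * (abs_err - 0.5 * delta)


def huber_quantile_loss(y, y_hat, q, delta=1.0):
    """Huber terms weighted by (1 - q) for over-forecasts and q for under-forecasts."""
    total = 0.0
    for a, b in zip(y, y_hat):
        weight = (1.0 - q) if b >= a else q
        total += weight * huber_point(abs(a - b), delta)
    return total / len(y)


y = [10.0]
print(huber_quantile_loss(y, [7.0], q=0.9))   # under-forecast, weight q
print(huber_quantile_loss(y, [13.0], q=0.9))  # over-forecast, weight 1 - q
```

Both forecasts miss by 3 and share the same Huber term of 2.5, yet the under-forecast is penalized about nine times more heavily when q = 0.9.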

Huberized MQLoss

HuberMQLoss

HuberMQLoss(level=[80, 90], quantiles=None, delta=1.0, horizon_weight=None)
Bases: BasePointLoss

Huberized Multi-Quantile loss.

The Huberized Multi-Quantile loss (HuberMQL) is a modified version of the multi-quantile loss function that combines the advantages of the quantile loss and the Huber loss. HuberMQL is commonly used in regression tasks, especially when dealing with data that contains outliers or heavy tails. The loss function pays more attention to under- or over-estimation depending on the quantile list $[q_{1},q_{2},\dots]$ parameter. It controls the trade-off between robustness and prediction accuracy with the parameter $\delta$.

$$\mathrm{HuberMQL}_{\delta}(\mathbf{y}_{\tau},[\mathbf{\hat{y}}^{(q_{1})}_{\tau}, ... ,\hat{y}^{(q_{n})}_{\tau}]) = \frac{1}{n} \sum_{q_{i}} \mathrm{HuberQL}_{\delta}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q_{i})}_{\tau})$$

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| level | int list | Probability levels for prediction intervals. Defaults to [80, 90]. | [80, 90] |
| quantiles | float list | Alternative to level, quantiles to estimate from y distribution. Defaults to None. | None |
| delta | float | Specifies the threshold at which to change between delta-scaled L1 and L2 loss. Defaults to 1.0. | 1.0 |
| horizon_weight | Union[Tensor, None] | Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None. | None |

HuberMQLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | HuberMQLoss. |

Huberized IQLoss

HuberIQLoss

HuberIQLoss(
    cos_embedding_dim=64,
    concentration0=1.0,
    concentration1=1.0,
    delta=1.0,
    horizon_weight=None,
)
Bases: HuberQLoss

Implicit Huber Quantile Loss.

Computes the Huberized quantile loss between y and y_hat, with the quantile q provided as an input to the network. HuberIQLoss measures the deviation of a Huberized quantile forecast. By weighting the absolute deviation asymmetrically, the loss pays more attention to under- or over-estimation.

$$\mathrm{HuberIQL}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}^{(q)}_{\tau}) = (1-q)\, L_{\delta}(y_{\tau},\; \hat{y}^{(q)}_{\tau}) \mathbb{1}\{ \hat{y}^{(q)}_{\tau} \geq y_{\tau} \} + q\, L_{\delta}(y_{\tau},\; \hat{y}^{(q)}_{\tau}) \mathbb{1}\{ \hat{y}^{(q)}_{\tau} < y_{\tau} \}$$

Parameters:
NameTypeDescriptionDefault
quantile_samplingstrSampling distribution used to sample the quantiles during training. Choose from [‘uniform’, ‘beta’]. Defaults to ‘uniform’.required
horizon_weightUnion[Tensor, None]Tensor of size h, weight for each timestamp of the forecasting window. Defaults to None.None
deltafloatSpecifies the threshold at which to change between delta-scaled L1 and L2 loss. Defaults to 1.0.1.0

HuberIQLoss.__call__

__call__(y, y_hat, y_insample=None, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | HuberQLoss. |

6. Others

Accuracy

Accuracy

Accuracy()
Bases: BasePointLoss

Accuracy.

Computes the accuracy between categorical y and y_hat. This metric is only meant for evaluation, as it is not differentiable.

$$\mathrm{Accuracy}(\mathbf{y}_{\tau}, \mathbf{\hat{y}}_{\tau}) = \frac{1}{H} \sum^{t+H}_{\tau=t+1} \mathrm{1}\{\mathbf{y}_{\tau}==\mathbf{\hat{y}}_{\tau}\}$$

Accuracy.__call__

__call__(y, y_hat, y_insample, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Accuracy. |

Scaled Continuous Ranked Probability Score (sCRPS)

sCRPS

sCRPS(level=[80, 90], quantiles=None)
Bases: BasePointLoss

Scaled Continuous Ranked Probability Score.

Calculates a scaled variation of the CRPS, as proposed by Rangapuram (2021), to measure the accuracy of predicted quantiles y_hat compared to the observation y. This metric averages percentage-weighted absolute deviations as defined by the quantile losses.

$$\mathrm{sCRPS}(\mathbf{\hat{y}}^{(q)}_{\tau}, \mathbf{y}_{\tau}) = \frac{2}{N} \sum_{i} \int^{1}_{0} \frac{\mathrm{QL}(\mathbf{\hat{y}}^{(q)}_{\tau}, y_{i,\tau})_{q}}{\sum_{i} | y_{i,\tau} |} dq$$

where $\mathbf{\hat{y}}^{(q)}_{\tau}$ is the estimated quantile, and $y_{i,\tau}$ are the target variable realizations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| level | int list | Probability levels for prediction intervals. Defaults to [80, 90]. | [80, 90] |
| quantiles | float list | Alternative to level, quantiles to estimate from y distribution. Defaults to None. | None |

sCRPS.__call__

__call__(y, y_hat, y_insample, mask=None)
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | Tensor | Actual values. | required |
| y_hat | Tensor | Predicted values. | required |
| mask | Union[Tensor, None] | Specifies date stamps per series to consider in loss. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| Tensor | sCRPS. |