evaluate

```python
evaluate(
    df: AnyDFType,
    metrics: List[Callable],
    models: Optional[List[str]] = None,
    train_df: Optional[AnyDFType] = None,
    level: Optional[List[int]] = None,
    id_col: str = 'unique_id',
    time_col: str = 'ds',
    target_col: str = 'y',
    agg_fn: Optional[str] = None,
)
```

Evaluate forecasts using different metrics.

| | Type | Default | Details |
|---|---|---|---|
| df | AnyDFType | | Forecasts to evaluate. Must have id_col, time_col, target_col and the models' predictions. |
| metrics | List[Callable] | | Functions with arguments df, models, id_col, target_col and optionally train_df. |
| models | Optional[List[str]] | None | Names of the models to evaluate. If None, every column in the dataframe after removing id, time and target is used. |
| train_df | Optional[AnyDFType] | None | Training set. Used to evaluate metrics such as mase. |
| level | Optional[List[int]] | None | Prediction interval levels. Used to compute losses that rely on quantiles. |
| id_col | str | unique_id | Column that identifies each series. |
| time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
| target_col | str | y | Column that contains the target. |
| agg_fn | Optional[str] | None | Statistic to compute on the scores by id to reduce them to a single number. |
| **Returns** | AnyDFType | | Metrics with one row per (id, metric) combination and one column per model. If agg_fn is not None, there is only one row per metric. |
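Any callable with the argument structure described above can be passed in `metrics`. As a sketch, the hypothetical `mean_error` below (not part of `utilsforecast.losses`) shows the expected shape: it takes `df`, `models`, `id_col` and `target_col`, and returns one row per id with one column per model, mirroring the built-in losses.

```python
import pandas as pd

def mean_error(df, models, id_col='unique_id', target_col='y'):
    """Illustrative custom metric: mean signed error per series, one column per model."""
    res = df[[id_col]].copy()
    for model in models:
        # Signed error of each model's prediction against the target.
        res[model] = df[model] - df[target_col]
    # Reduce to one row per series.
    return res.groupby(id_col, observed=True).mean().reset_index()
```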
```python
from functools import partial

import numpy as np
import pandas as pd

from utilsforecast.data import generate_series
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import *
```
```python
# Synthetic data: 10 series with two model columns and 80/95% prediction intervals.
series = generate_series(10, n_models=2, level=[80, 95])
series['unique_id'] = series['unique_id'].astype('int')
models = ['model0', 'model1']
```
```python
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=7),  # mase needs a seasonality and a train_df
    quantile_loss,
    mqloss,
    coverage,
    calibration,
    scaled_crps,
]
```
```python
evaluation = evaluate(
    series,
    metrics=metrics,
    models=models,
    train_df=series,
    level=[80, 95],
)
evaluation
```
| | unique_id | metric | model0 | model1 |
|---|---|---|---|---|
| 0 | 0 | mae | 0.158108 | 0.163246 |
| 1 | 1 | mae | 0.160109 | 0.143805 |
| 2 | 2 | mae | 0.159815 | 0.170510 |
| 3 | 3 | mae | 0.168537 | 0.161595 |
| 4 | 4 | mae | 0.170182 | 0.163329 |
| ... | ... | ... | ... | ... |
| 175 | 5 | scaled_crps | 0.034202 | 0.035472 |
| 176 | 6 | scaled_crps | 0.034880 | 0.033610 |
| 177 | 7 | scaled_crps | 0.034337 | 0.034745 |
| 178 | 8 | scaled_crps | 0.033336 | 0.032459 |
| 179 | 9 | scaled_crps | 0.034766 | 0.035243 |
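Because the result is in long format (one row per series and metric), ordinary pandas operations can slice it further. For example, plain pandas on the objects above (not part of the `evaluate` API), keeping only the MASE rows and flagging the better model per series:

```python
# Filter to a single metric and find the model with the lowest score per series.
mase_eval = evaluation[evaluation['metric'] == 'mase'].drop(columns='metric')
mase_eval['best_model'] = mase_eval[models].idxmin(axis=1)
mase_eval.head()
```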
To summarize the scores across series, average them by metric:

```python
summary = evaluation.drop(columns='unique_id').groupby('metric').mean().reset_index()
summary
```
| | metric | model0 | model1 |
|---|---|---|---|
| 0 | calibration_q0.025 | 0.000000 | 0.000000 |
| 1 | calibration_q0.1 | 0.000000 | 0.000000 |
| 2 | calibration_q0.9 | 0.833993 | 0.815833 |
| 3 | calibration_q0.975 | 0.853991 | 0.836949 |
| 4 | coverage_level80 | 0.833993 | 0.815833 |
| 5 | coverage_level95 | 0.853991 | 0.836949 |
| 6 | mae | 0.161286 | 0.162281 |
| 7 | mape | 0.048894 | 0.049624 |
| 8 | mase | 0.966846 | 0.975354 |
| 9 | mqloss | 0.056904 | 0.056216 |
| 10 | mse | 0.048653 | 0.049198 |
| 11 | quantile_loss_q0.025 | 0.019990 | 0.019474 |
| 12 | quantile_loss_q0.1 | 0.067315 | 0.065781 |
| 13 | quantile_loss_q0.9 | 0.095510 | 0.093841 |
| 14 | quantile_loss_q0.975 | 0.044803 | 0.045767 |
| 15 | rmse | 0.220357 | 0.221543 |
| 16 | scaled_crps | 0.035003 | 0.034576 |
| 17 | smape | 0.024475 | 0.024902 |
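The same per-metric summary can also be produced inside `evaluate` through the `agg_fn` argument, which reduces the per-id scores to a single row per metric. A sketch, assuming `'mean'` is the statistic name; the result should match the manual groupby above:

```python
# One row per metric, averaged over the series ids.
summary_direct = evaluate(
    series,
    metrics=metrics,
    models=models,
    train_df=series,
    level=[80, 95],
    agg_fn='mean',
)
summary_direct
```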