evaluate

 evaluate (df:~DFType, metrics:List[Callable],
           models:Optional[List[str]]=None,
           train_df:Optional[~DFType]=None,
           level:Optional[List[int]]=None, id_col:str='unique_id',
           time_col:str='ds', target_col:str='y',
           agg_fn:Optional[str]=None)

Evaluate forecasts using different metrics.

             Type      Default    Details
df           DFType               Forecasts to evaluate. Must have id_col, time_col, target_col and the models' predictions.
metrics      List                 Functions with arguments df, models, id_col, target_col and optionally train_df.
models       Optional  None       Names of the models to evaluate. If None, every column in the dataframe other than id, time and target is used.
train_df     Optional  None       Training set. Used to evaluate metrics such as mase.
level        Optional  None       Prediction interval levels. Used to compute losses that rely on quantiles.
id_col       str       unique_id  Column that identifies each series.
time_col     str       ds         Column that identifies each timestep; its values can be timestamps or integers.
target_col   str       y          Column that contains the target.
agg_fn       Optional  None       Statistic to compute on the scores by id to reduce them to a single number.
Returns      DFType               Metrics with one row per (id, metric) combination and one column per model. If agg_fn is not None, there is only one row per metric.

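
The metrics argument is described above as a list of functions taking df, models, id_col, target_col and optionally train_df. As a rough sketch of that interface, a user-defined metric might look like the following. The name bias and the toy frame are hypothetical, and the return shape (one row per id, one column per model) is an assumption modeled on the per-id output of the built-in losses; it has not been checked against a specific utilsforecast version.

```python
from typing import List

import pandas as pd


def bias(
    df: pd.DataFrame,
    models: List[str],
    id_col: str = "unique_id",
    target_col: str = "y",
) -> pd.DataFrame:
    """Hypothetical metric: mean signed error (forecast - actual) per series."""
    # Signed error of each model's forecast against the target.
    errors = df[models].sub(df[target_col], axis=0)
    # Average within each series: one row per id, one column per model (assumed shape).
    return errors.groupby(df[id_col], observed=True).mean().reset_index()


# Toy data with two series and two models, for illustration only.
toy = pd.DataFrame(
    {
        "unique_id": [0, 0, 1, 1],
        "ds": [1, 2, 1, 2],
        "y": [1.0, 2.0, 3.0, 4.0],
        "model0": [1.5, 2.5, 3.0, 4.0],
        "model1": [0.0, 2.0, 4.0, 6.0],
    }
)
result = bias(toy, models=["model0", "model1"])
print(result)
```

If the assumption about the return shape holds, such a function could be passed next to the built-in losses, for example metrics=[mae, bias].
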
from functools import partial

import numpy as np
import pandas as pd

from utilsforecast.data import generate_series
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import *

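# Generate 10 synthetic series with forecasts from two models and 80%/95% prediction intervals.
series = generate_series(10, n_models=2, level=[80, 95])
# Cast the series ids to integers.
series['unique_id'] = series['unique_id'].astype('int')
models = ['model0', 'model1']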
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=7),  # mase takes a seasonality argument, fixed here with partial
    quantile_loss,
    mqloss,
    coverage,
    calibration,
    scaled_crps,
]
evaluation = evaluate(
    series,
    metrics=metrics,
    models=models,
    train_df=series,
    level=[80, 95],
)
evaluation
     unique_id       metric    model0    model1
0            0          mae  0.158108  0.163246
1            1          mae  0.160109  0.143805
2            2          mae  0.159815  0.170510
3            3          mae  0.168537  0.161595
4            4          mae  0.170182  0.163329
..         ...          ...       ...       ...
175          5  scaled_crps  0.034202  0.035472
176          6  scaled_crps  0.034880  0.033610
177          7  scaled_crps  0.034337  0.034745
178          8  scaled_crps  0.033336  0.032459
179          9  scaled_crps  0.034766  0.035243

summary = evaluation.drop(columns='unique_id').groupby('metric').mean().reset_index()
summary
                  metric    model0    model1
0     calibration_q0.025  0.000000  0.000000
1       calibration_q0.1  0.000000  0.000000
2       calibration_q0.9  0.833993  0.815833
3     calibration_q0.975  0.853991  0.836949
4       coverage_level80  0.833993  0.815833
5       coverage_level95  0.853991  0.836949
6                    mae  0.161286  0.162281
7                   mape  0.048894  0.049624
8                   mase  0.966846  0.975354
9                 mqloss  0.056904  0.056216
10                   mse  0.048653  0.049198
11  quantile_loss_q0.025  0.019990  0.019474
12    quantile_loss_q0.1  0.067315  0.065781
13    quantile_loss_q0.9  0.095510  0.093841
14  quantile_loss_q0.975  0.044803  0.045767
15                  rmse  0.220357  0.221543
16           scaled_crps  0.035003  0.034576
17                 smape  0.024475  0.024902
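
The q0.025, q0.1, q0.9 and q0.975 suffixes in the summary line up with level=[80, 95]: a central 80% interval is bounded by the 0.1 and 0.9 quantiles, and a central 95% interval by the 0.025 and 0.975 quantiles. A minimal sketch of that mapping follows. The helper quantiles_from_levels is hypothetical and written only for illustration; it is not presented as part of the documented API.

```python
from typing import List


def quantiles_from_levels(levels: List[int]) -> List[float]:
    """Hypothetical helper: central prediction-interval levels -> sorted quantiles."""
    quantiles = []
    for level in levels:
        # A central `level`% interval leaves (100 - level) / 2 percent in each tail.
        quantiles.append((100 - level) / 200)  # quantile of the lower bound
        quantiles.append((100 + level) / 200)  # quantile of the upper bound
    return sorted(quantiles)


print(quantiles_from_levels([80, 95]))  # [0.025, 0.1, 0.9, 0.975]
```

Read this way, the four quantile_loss and calibration rows in the summary correspond to the two bounds of each of the two requested intervals.
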