API Reference
Evaluation
Model performance evaluation
evaluate
evaluate (df:~DFType, metrics:List[Callable], models:Optional[List[str]]=None, train_df:Optional[~DFType]=None, level:Optional[List[int]]=None, id_col:str='unique_id', time_col:str='ds', target_col:str='y', agg_fn:Optional[str]=None)
Evaluate forecasts using different metrics.
Parameter | Type | Default | Details |
---|---|---|---|
df | DFType | | Forecasts to evaluate. Must have id_col, time_col, target_col and the models' predictions. |
metrics | List | | Functions with arguments df, models, id_col, target_col and optionally train_df. |
models | Optional | None | Names of the models to evaluate. If None, every column in the dataframe other than the id, time and target columns is used. |
train_df | Optional | None | Training set. Used to evaluate metrics such as mase. |
level | Optional | None | Prediction interval levels. Used to compute losses that rely on quantiles. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
target_col | str | y | Column that contains the target. |
agg_fn | Optional | None | Statistic to compute on the scores by id to reduce them to a single number. |
Returns | DFType | | Metrics with one row per (id, metric) combination and one column per model. If agg_fn is not None, there is only one row per metric. |
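The `metrics` entry above only describes the arguments a metric function receives. As a rough illustration, a custom metric could look like the sketch below. The name `bias` and the toy dataframe are hypothetical, and the return shape (one row per id, one column per model) is an assumption chosen to mirror the per-series layout of the `evaluate` output; the function is exercised directly here rather than through `evaluate`.

```python
import pandas as pd


def bias(df, models, id_col="unique_id", target_col="y"):
    """Mean signed error (forecast minus actual) per series and model.

    Hypothetical metric, written only to illustrate the argument names
    listed for `metrics`; it is not part of the library.
    """
    # Subtract the target from every model column, row by row.
    errors = df[models].sub(df[target_col], axis=0)
    # Re-attach the series identifier so the errors can be grouped by it.
    errors = errors.assign(**{id_col: df[id_col]})
    # Average within each series: one row per id, one column per model.
    return errors.groupby(id_col)[models].mean().reset_index()


# Toy data: two series, two timesteps each, two model columns.
toy = pd.DataFrame(
    {
        "unique_id": [0, 0, 1, 1],
        "ds": [1, 2, 1, 2],
        "y": [1.0, 2.0, 3.0, 4.0],
        "model0": [1.5, 2.5, 3.0, 4.0],
        "model1": [1.0, 1.0, 4.0, 5.0],
    }
)
bias_scores = bias(toy, models=["model0", "model1"])
print(bias_scores)
```

A positive value means the model over-forecasts that series on average; a negative value means it under-forecasts.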
from functools import partial
import numpy as np
import pandas as pd
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import *
from utilsforecast.data import generate_series
series = generate_series(10, n_models=2, level=[80, 95])
series['unique_id'] = series['unique_id'].astype('int')
models = ['model0', 'model1']
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=7),
    quantile_loss,
    mqloss,
    coverage,
    calibration,
    scaled_crps,
]
evaluation = evaluate(
    series,
    metrics=metrics,
    models=models,
    train_df=series,
    level=[80, 95],
)
evaluation
| | unique_id | metric | model0 | model1 |
---|---|---|---|---|
0 | 0 | mae | 0.158108 | 0.163246 |
1 | 1 | mae | 0.160109 | 0.143805 |
2 | 2 | mae | 0.159815 | 0.170510 |
3 | 3 | mae | 0.168537 | 0.161595 |
4 | 4 | mae | 0.170182 | 0.163329 |
… | … | … | … | … |
175 | 5 | scaled_crps | 0.034202 | 0.035472 |
176 | 6 | scaled_crps | 0.034880 | 0.033610 |
177 | 7 | scaled_crps | 0.034337 | 0.034745 |
178 | 8 | scaled_crps | 0.033336 | 0.032459 |
179 | 9 | scaled_crps | 0.034766 | 0.035243 |
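The call above passes `level=[80, 95]`, while the quantile-based metrics are reported under names such as `quantile_loss_q0.025` and `calibration_q0.9`. Those suffixes are consistent with the usual convention that a central prediction interval of level L spans the quantiles (100 - L)/200 and (100 + L)/200. The sketch below shows that mapping; `level_to_quantiles` is a hypothetical helper, and the assumption that the library follows exactly this convention is inferred from the metric names, not from its source.

```python
def level_to_quantiles(levels):
    """Map central prediction-interval levels (in percent) to sorted quantiles.

    Hypothetical helper illustrating the standard convention: a level-L
    interval is bounded by the (100 - L)/200 and (100 + L)/200 quantiles.
    """
    quantiles = set()
    for lv in levels:
        quantiles.add((100 - lv) / 200)  # lower bound of the interval
        quantiles.add((100 + lv) / 200)  # upper bound of the interval
    return sorted(quantiles)


quantiles = level_to_quantiles([80, 95])
print(quantiles)  # [0.025, 0.1, 0.9, 0.975]
```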
summary = evaluation.drop(columns='unique_id').groupby('metric').mean().reset_index()
summary
| | metric | model0 | model1 |
---|---|---|---|
0 | calibration_q0.025 | 0.000000 | 0.000000 |
1 | calibration_q0.1 | 0.000000 | 0.000000 |
2 | calibration_q0.9 | 0.833993 | 0.815833 |
3 | calibration_q0.975 | 0.853991 | 0.836949 |
4 | coverage_level80 | 0.833993 | 0.815833 |
5 | coverage_level95 | 0.853991 | 0.836949 |
6 | mae | 0.161286 | 0.162281 |
7 | mape | 0.048894 | 0.049624 |
8 | mase | 0.966846 | 0.975354 |
9 | mqloss | 0.056904 | 0.056216 |
10 | mse | 0.048653 | 0.049198 |
11 | quantile_loss_q0.025 | 0.019990 | 0.019474 |
12 | quantile_loss_q0.1 | 0.067315 | 0.065781 |
13 | quantile_loss_q0.9 | 0.095510 | 0.093841 |
14 | quantile_loss_q0.975 | 0.044803 | 0.045767 |
15 | rmse | 0.220357 | 0.221543 |
16 | scaled_crps | 0.035003 | 0.034576 |
17 | smape | 0.024475 | 0.024902 |
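The summary above is obtained by dropping `unique_id` and averaging per metric. The parameter table describes `agg_fn` as performing this kind of reduction inside `evaluate`, so passing `agg_fn='mean'` would presumably give a comparable table, although that equivalence is not verified here. The sketch below reproduces the reduction with plain pandas on made-up scores so that other statistics, such as the median, can be tried; `reduce_scores` is a hypothetical helper, not a library function.

```python
import pandas as pd

# Toy per-series scores in the same layout as `evaluation` (values are made up).
scores = pd.DataFrame(
    {
        "unique_id": [0, 1, 0, 1],
        "metric": ["mae", "mae", "rmse", "rmse"],
        "model0": [0.1, 0.3, 0.2, 0.4],
        "model1": [0.2, 0.2, 0.5, 0.1],
    }
)


def reduce_scores(scores, agg_fn, id_col="unique_id"):
    """Collapse per-series scores to one row per metric (hypothetical helper)."""
    return (
        scores.drop(columns=id_col)          # the id is no longer meaningful
        .groupby("metric", as_index=False)   # one group per metric name
        .agg(agg_fn)                         # e.g. "mean", "median", "max"
    )


mean_scores = reduce_scores(scores, "mean")
median_scores = reduce_scores(scores, "median")
print(mean_scores)
```

The median is less sensitive than the mean to a few series with very large errors, which can matter when series differ widely in scale.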