Assign relative importance weights to individual timesteps when training a model.

Motivation

When working with time series data, we may want to assign a higher or lower importance to certain values or periods in the series. For example, historical sales data may cover the abnormal COVID period, so the model should not learn too much from that stretch of the series. Alternatively, you might want the model to be especially good at modeling periods when a promotion is running. Thus, we need a way to tell the model to assign more or less importance to specific timesteps.

Understanding sample_weight

The sample_weight is a reserved column name, similar to how we expect the data to have the columns ["unique_id", "ds", "y"]. In that column, we assign a non-negative number to indicate how important a timestep is.
  • Assigning a value of 0 means the particular timestep does not contribute to the loss.
  • Higher values increase the contribution to the loss, so the model learns “more” about these timesteps.
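As an illustration, a training frame with the reserved column might look like the following (the series values and dates are hypothetical, not from the dataset used later):

```python
import pandas as pd

# Hypothetical frame: ignore February entirely, make March twice as important
df = pd.DataFrame({
    "unique_id": [1, 1, 1],
    "ds": pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
    "y": [100.0, 40.0, 120.0],
    "sample_weight": [1.0, 0.0, 2.0],
})
```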

Key considerations

Deep learning models are trained on windows of data. Internally, we take the mean of the sample_weight over a window's forecast horizon to get the window's relative importance. Therefore, a training window is completely ignored only when every timestep in its forecast horizon has a sample_weight of 0. In most cases, windows containing timesteps with a sample_weight of 0 simply have a lower “mean importance” and so contribute less to the model's loss. Take the following example:
| ds  | y   | sample_weight |
| --- | --- | ------------- |
| t1  | …   | 1             |
| t2  | …   | 1             |
| t3  | …   | 1             |
| t4  | …   | 1             |
| t5  | …   | 0             |
| t6  | …   | 0             |
| t7  | …   | 0             |
| t8  | …   | 0             |
| t9  | …   | 1             |
| t10 | …   | 1             |
| t11 | …   | 1             |
| t12 | …   | 1             |
With input_size=4 and h=4, NeuralForecast creates sliding windows of 8 timesteps. The sample_weight for each window is the mean over its forecast horizon (the future portion):
| Window | Input (t) | Future (t) | Mean sample_weight         |
| ------ | --------- | ---------- | -------------------------- |
| 1      | t1 – t4   | t5 – t8    | 0.00 (ignored)             |
| 2      | t2 – t5   | t6 – t9    | 0.25 (low importance)      |
| 3      | t3 – t6   | t7 – t10   | 0.50 (moderate importance) |
| 4      | t4 – t7   | t8 – t11   | 0.75 (high importance)     |
| 5      | t5 – t8   | t9 – t12   | 1.00 (full importance)     |
Window 1 is completely excluded from training: its entire forecast horizon falls within the zeroed period. Windows 2–4 contribute progressively more as the horizon moves out of it. Window 5 trains normally. The model still “sees” timesteps in the input context of windows 2–5. It learns what happened during that period, without being penalized for predicting its future.
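The per-window means in the table above can be reproduced with a short sketch. This mirrors the averaging described here, not NeuralForecast's internal implementation:

```python
import numpy as np

sample_weight = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
input_size, h = 4, 4

# One sliding window per valid start position; a window's weight is the
# mean of sample_weight over its forecast horizon (the h steps after the input)
window_weights = [
    float(sample_weight[start + input_size : start + input_size + h].mean())
    for start in range(len(sample_weight) - input_size - h + 1)
]
print(window_weights)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```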

Important notes

  • sample_weight must be greater than or equal to 0.
  • There is no upper bound for sample_weight; it acts as a relative importance. A value of 2 vs 1 means “twice as important”, and 100 vs 50 is interpreted the same way.
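Because only the relative magnitude matters, rescaling all weights by a constant leaves a normalized weighted-mean loss unchanged. A small illustrative sketch (the error values are made up):

```python
import numpy as np

errors = np.array([1.0, 4.0, 2.0])  # hypothetical per-timestep losses
w = np.array([1.0, 2.0, 1.0])

# np.average divides by the sum of the weights, so w and 50 * w
# produce the same weighted mean loss
a = np.average(errors, weights=w)
b = np.average(errors, weights=50 * w)
assert np.isclose(a, b)
```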

Usage

Let’s see an example of how sample_weight can be used in practice. We use the Air Passengers dataset and cover different scenarios.

Setup

import logging
import warnings

import numpy as np

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae
from utilsforecast.plotting import plot_series

from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

warnings.filterwarnings("ignore")
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)

Load data

from neuralforecast.utils import AirPassengersDF

Y_df = AirPassengersDF.copy()
Y_train_df = Y_df[Y_df.ds <= "1959-12-31"]  # 132 months train
Y_test_df = Y_df[Y_df.ds > "1959-12-31"]    # 12 months test

Y_train_df.tail()
|     | unique_id | ds         | y     |
| --- | --------- | ---------- | ----- |
| 127 | 1.0       | 1959-08-31 | 559.0 |
| 128 | 1.0       | 1959-09-30 | 463.0 |
| 129 | 1.0       | 1959-10-31 | 407.0 |
| 130 | 1.0       | 1959-11-30 | 362.0 |
| 131 | 1.0       | 1959-12-31 | 405.0 |
plot_series(Y_train_df)

Prolonged anomaly

Here, we inject a prolonged anomaly where values are 50% lower than they actually are. For that anomalous period, we set sample_weight to 0, and to 1 otherwise. We then compare the model's performance with sample_weight set against the default behavior.
s1 = Y_train_df.copy()
anomaly_mask = s1["ds"].between("1953-01-31", "1953-12-31")
s1.loc[anomaly_mask, "y"] *= 0.5
s1["sample_weight"] = 1.0
s1.loc[anomaly_mask, "sample_weight"] = 0.0

s1.head()
|     | unique_id | ds         | y     | sample_weight |
| --- | --------- | ---------- | ----- | ------------- |
| 0   | 1.0       | 1949-01-31 | 112.0 | 1.0           |
| 1   | 1.0       | 1949-02-28 | 118.0 | 1.0           |
| 2   | 1.0       | 1949-03-31 | 132.0 | 1.0           |
| 3   | 1.0       | 1949-04-30 | 129.0 | 1.0           |
| 4   | 1.0       | 1949-05-31 | 121.0 | 1.0           |
plot_series(s1)

Training and evaluating

H = 12
MAX_STEPS = 100

models = [
    NHITS(
        h=H, 
        input_size=3*H, 
        max_steps=MAX_STEPS, 
        scaler_type="robust", 
        enable_progress_bar=False, 
        enable_model_summary=False
    )
]

nf = NeuralForecast(models=models, freq="ME")

# With `sample_weight`
nf.fit(df=s1)
preds_sw = nf.predict()
preds_sw = preds_sw.rename(columns={"NHITS": "NHITS_SW"})

# Without `sample_weight`
nf.fit(df=s1.drop(columns=["sample_weight"]))
preds = nf.predict()

eval_df = Y_test_df.merge(preds_sw, "left", ["unique_id", "ds"])
eval_df = eval_df.merge(preds, "left", ["unique_id", "ds"])

evaluation = evaluate(eval_df, metrics=[mae])
evaluation
|     | unique_id | metric | NHITS_SW  | NHITS     |
| --- | --------- | ------ | --------- | --------- |
| 0   | 1.0       | mae    | 18.040064 | 45.309769 |
plot_series(s1, eval_df, max_insample_length=5*12)
From the figure above and from the calculated MAE, we can see that using sample_weight improved the performance of the model as we assigned less importance to the anomalous period.

Isolated anomalies

Now, let’s consider a scenario where isolated anomalies occur in the data. As before, we assign a sample_weight of 0 to those anomalies and 1 otherwise, and compare the performance.
rng = np.random.default_rng(42)
s2 = Y_train_df.copy()
outlier_idx = rng.choice(s2.index, size=4, replace=False)
s2.loc[outlier_idx, "y"] *= rng.uniform(2.0, 3.0, size=4)  # random spikes
s2["sample_weight"] = 1.0
s2.loc[outlier_idx, "sample_weight"] = 0.0
s2.head()
|     | unique_id | ds         | y     | sample_weight |
| --- | --------- | ---------- | ----- | ------------- |
| 0   | 1.0       | 1949-01-31 | 112.0 | 1.0           |
| 1   | 1.0       | 1949-02-28 | 118.0 | 1.0           |
| 2   | 1.0       | 1949-03-31 | 132.0 | 1.0           |
| 3   | 1.0       | 1949-04-30 | 129.0 | 1.0           |
| 4   | 1.0       | 1949-05-31 | 121.0 | 1.0           |
plot_series(s2)

Training and evaluating

# With `sample_weight`
nf.fit(df=s2)
preds_sw = nf.predict()
preds_sw = preds_sw.rename(columns={"NHITS": "NHITS_SW"})

# Without `sample_weight`
nf.fit(df=s2.drop(columns=["sample_weight"]))
preds = nf.predict()

eval_df = Y_test_df.merge(preds_sw, "left", ["unique_id", "ds"])
eval_df = eval_df.merge(preds, "left", ["unique_id", "ds"])

evaluation = evaluate(eval_df, metrics=[mae])
evaluation
|     | unique_id | metric | NHITS_SW  | NHITS     |
| --- | --------- | ------ | --------- | --------- |
| 0   | 1.0       | mae    | 62.247646 | 61.333698 |
plot_series(s2, eval_df, max_insample_length=5*12)
In this case, using sample_weight is not sufficient; in fact, the model performs slightly worse than without it. Here, it might be beneficial to use other methods that are robust to outliers, such as selecting the HuberLoss as the optimization objective.
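To sketch why a Huber objective is more robust, note that its penalty grows only linearly for large errors, so isolated spikes dominate training far less than under a squared loss. (NeuralForecast provides a HuberLoss, in neuralforecast.losses.pytorch, that can be passed to a model via its loss argument; the function below is a plain NumPy illustration of the idea, not that implementation.)

```python
import numpy as np

def huber(error, delta=1.0):
    """Huber penalty: quadratic near zero, linear in the tails."""
    abs_err = np.abs(error)
    return np.where(
        abs_err <= delta,
        0.5 * error**2,
        delta * (abs_err - 0.5 * delta),
    )

errors = np.array([0.5, 1.0, 10.0])  # the last value mimics an outlier
squared = 0.5 * errors**2  # the outlier contributes 50.0 to this loss
robust = huber(errors)     # under Huber it contributes only 9.5
```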

Emphasize certain periods

Now, let’s consider the scenario where we want to give more importance to the summer months. Those are the months with the highest traffic, so we might want our model to be especially good in those periods.
s3 = Y_train_df.copy()
s3["sample_weight"] = 1.0
summer_mask = s3["ds"].dt.month.isin([6, 7, 8])
s3.loc[summer_mask, "sample_weight"] = 3.0
s3.tail()
|     | unique_id | ds         | y     | sample_weight |
| --- | --------- | ---------- | ----- | ------------- |
| 127 | 1.0       | 1959-08-31 | 559.0 | 3.0           |
| 128 | 1.0       | 1959-09-30 | 463.0 | 1.0           |
| 129 | 1.0       | 1959-10-31 | 407.0 | 1.0           |
| 130 | 1.0       | 1959-11-30 | 362.0 | 1.0           |
| 131 | 1.0       | 1959-12-31 | 405.0 | 1.0           |
plot_series(s3)

Training and evaluating

# With `sample_weight`
nf.fit(df=s3)
preds_sw = nf.predict()
preds_sw = preds_sw.rename(columns={"NHITS": "NHITS_SW"})

# Without `sample_weight`
nf.fit(df=s3.drop(columns=["sample_weight"]))
preds = nf.predict()

eval_df = Y_test_df.merge(preds_sw, "left", ["unique_id", "ds"])
eval_df = eval_df.merge(preds, "left", ["unique_id", "ds"])

evaluation = evaluate(eval_df, metrics=[mae])
evaluation
|     | unique_id | metric | NHITS_SW  | NHITS     |
| --- | --------- | ------ | --------- | --------- |
| 0   | 1.0       | mae    | 11.672673 | 13.421109 |
plot_series(s3, eval_df, max_insample_length=5*12)
Here, we see that using sample_weight improved the performance again. Although it’s hard to see in the plot, the model trained with sample_weight better forecasts the peaks of summer, resulting in a performance gain.

Summary

NeuralForecast now supports the sample_weight column, a reserved column name that indicates the relative importance of each timestep. During training, the sample_weight of each window is the mean over the forecast horizon. This lets the model either ignore anomalous sequences or data points, or focus more on important periods.