Feature engineering
Create exogenous regressors for your models
fourier
fourier (df:~DFType, freq:Union[str,int], season_length:int, k:int, h:int=0, id_col:str='unique_id', time_col:str='ds')
Compute Fourier seasonal terms for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Dataframe with ids, times and values for the exogenous regressors. |
freq | Union[str, int] | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
season_length | int | | Number of observations per unit of time. Example: 24 for hourly data. |
k | int | | Maximum order of the Fourier terms. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Tuple (transformed_df, future_df): the original DataFrame with the computed features, and a DataFrame with the features for the next h timesteps. |
import pandas as pd
from utilsforecast.data import generate_series
from utilsforecast.feature_engineering import fourier
series = generate_series(5, equal_ends=True)
transformed_df, future_df = fourier(series, freq='D', season_length=7, k=2, h=1)
transformed_df
| | unique_id | ds | y | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
---|---|---|---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | -0.974927 | 0.433894 | -0.222526 | -0.900964 |
1 | 0 | 2000-10-06 | 1.423626 | -0.781835 | -0.974926 | 0.623486 | -0.222531 |
2 | 0 | 2000-10-07 | 2.311782 | -0.000005 | -0.000009 | 1.000000 | 1.000000 |
3 | 0 | 2000-10-08 | 3.192191 | 0.781829 | 0.974930 | 0.623493 | -0.222512 |
4 | 0 | 2000-10-09 | 4.148767 | 0.974929 | -0.433877 | -0.222517 | -0.900972 |
… | … | … | … | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | -0.974927 | 0.433888 | -0.222523 | -0.900967 |
1097 | 4 | 2001-05-11 | 5.178157 | -0.781823 | -0.974934 | 0.623500 | -0.222495 |
1098 | 4 | 2001-05-12 | 6.133142 | -0.000002 | -0.000003 | 1.000000 | 1.000000 |
1099 | 4 | 2001-05-13 | 0.403709 | 0.781840 | 0.974922 | 0.623479 | -0.222548 |
1100 | 4 | 2001-05-14 | 1.081779 | 0.974928 | -0.433882 | -0.222520 | -0.900970 |
future_df
| | unique_id | ds | sin1_7 | sin2_7 | cos1_7 | cos2_7 |
---|---|---|---|---|---|---|
0 | 0 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
1 | 1 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
2 | 2 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
3 | 3 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
4 | 4 | 2001-05-15 | 0.433871 | -0.781813 | -0.900975 | 0.623513 |
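The column names above follow a sin{order}_{season_length} / cos{order}_{season_length} pattern. As a rough illustration of what such terms are, the sketch below evaluates the standard Fourier pairs sin(2πjt/m) and cos(2πjt/m) for orders j = 1..k at an integer time index t, with m the season length. The helper name fourier_terms and the time origin (t = 0 at the first row) are assumptions made for this example. The library may use a different origin and floating-point precision, so the values need not match the tables above.

```python
import math


def fourier_terms(t, season_length, k):
    """Return the sin/cos pairs of orders 1..k at integer time index t.

    Hypothetical helper for illustration only, not part of utilsforecast.
    """
    out = {}
    for order in range(1, k + 1):
        angle = 2 * math.pi * order * t / season_length
        out[f"sin{order}_{season_length}"] = math.sin(angle)
        out[f"cos{order}_{season_length}"] = math.cos(angle)
    return out


# Two full weekly cycles of daily data, orders 1 and 2.
rows = [fourier_terms(t, season_length=7, k=2) for t in range(14)]
```

Each term repeats every season_length steps, which is why the future rows can be computed without knowing the target values.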
trend
trend (df:~DFType, freq:Union[str,int], h:int=0, id_col:str='unique_id', time_col:str='ds')
Add a trend column with consecutive integers for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Dataframe with ids, times and values for the exogenous regressors. |
freq | Union[str, int] | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Tuple (transformed_df, future_df): the original DataFrame with the computed features, and a DataFrame with the features for the next h timesteps. |
from utilsforecast.feature_engineering import trend
series = generate_series(5, equal_ends=True)
transformed_df, future_df = trend(series, freq='D', h=1)
transformed_df
| | unique_id | ds | y | trend |
---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | 152.0 |
1 | 0 | 2000-10-06 | 1.423626 | 153.0 |
2 | 0 | 2000-10-07 | 2.311782 | 154.0 |
3 | 0 | 2000-10-08 | 3.192191 | 155.0 |
4 | 0 | 2000-10-09 | 4.148767 | 156.0 |
… | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | 369.0 |
1097 | 4 | 2001-05-11 | 5.178157 | 370.0 |
1098 | 4 | 2001-05-12 | 6.133142 | 371.0 |
1099 | 4 | 2001-05-13 | 0.403709 | 372.0 |
1100 | 4 | 2001-05-14 | 1.081779 | 373.0 |
future_df
| | unique_id | ds | trend |
---|---|---|---|
0 | 0 | 2001-05-15 | 374.0 |
1 | 1 | 2001-05-15 | 374.0 |
2 | 2 | 2001-05-15 | 374.0 |
3 | 3 | 2001-05-15 | 374.0 |
4 | 4 | 2001-05-15 | 374.0 |
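To make the numbering concrete, here is a minimal sketch of the idea: the training rows receive consecutive integers and the forecast rows continue the count. The helper trend_values and its start argument are hypothetical and not part of the library. The start value 152 and the 222 daily rows are read off the tables above for series 0, assuming that series has no gaps between 2000-10-05 and 2001-05-14.

```python
def trend_values(start, n_train, h):
    """Consecutive values for n_train training steps, continued for h future steps.

    Hypothetical helper for illustration only, not part of utilsforecast.
    """
    train = [float(start + i) for i in range(n_train)]
    future = [float(start + n_train + i) for i in range(h)]
    return train, future


# Series 0 in the tables above: first trend value 152, 222 daily rows, h=1.
train, future = trend_values(start=152, n_train=222, h=1)
print(train[0], train[-1], future)  # 152.0 373.0 [374.0]
```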
time_features
time_features (df:~DFType, freq:Union[str,int], features:List[Union[str,Callable]], h:int=0, id_col:str='unique_id', time_col:str='ds')
Compute timestamp-based features for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Dataframe with ids, times and values for the exogenous regressors. |
freq | Union[str, int] | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
features | List[Union[str, Callable]] | | Features to compute. Can be string aliases of timestamp attributes or functions to apply to the times. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Tuple (transformed_df, future_df): the original DataFrame with the computed features, and a DataFrame with the features for the next h timesteps. |
from utilsforecast.feature_engineering import time_features
transformed_df, future_df = time_features(series, freq='D', features=['month', 'day'], h=1)
transformed_df
| | unique_id | ds | y | month | day |
---|---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | 10 | 5 |
1 | 0 | 2000-10-06 | 1.423626 | 10 | 6 |
2 | 0 | 2000-10-07 | 2.311782 | 10 | 7 |
3 | 0 | 2000-10-08 | 3.192191 | 10 | 8 |
4 | 0 | 2000-10-09 | 4.148767 | 10 | 9 |
… | … | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | 5 | 10 |
1097 | 4 | 2001-05-11 | 5.178157 | 5 | 11 |
1098 | 4 | 2001-05-12 | 6.133142 | 5 | 12 |
1099 | 4 | 2001-05-13 | 0.403709 | 5 | 13 |
1100 | 4 | 2001-05-14 | 1.081779 | 5 | 14 |
future_df
| | unique_id | ds | month | day |
---|---|---|---|---|
0 | 0 | 2001-05-15 | 5 | 15 |
1 | 1 | 2001-05-15 | 5 | 15 |
2 | 2 | 2001-05-15 | 5 | 15 |
3 | 3 | 2001-05-15 | 5 | 15 |
4 | 4 | 2001-05-15 | 5 | 15 |
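The features argument mixes two kinds of entries: string aliases, which are looked up as attributes of the timestamps, and callables, which are applied to the timestamps. The sketch below illustrates that dispatch using only the standard library. The function compute_time_features is hypothetical, and it works on datetime.date objects rather than pandas or polars timestamps. Naming the output column after the function is an assumption, chosen to match the is_weekend column shown in the pipeline example.

```python
from datetime import date


def is_weekend(times):
    """Custom feature: receives all timestamps, returns one value per timestamp."""
    return [t.weekday() >= 5 for t in times]  # Monday=0 ... Sunday=6 in datetime


def compute_time_features(times, features):
    """Map string aliases to attributes and apply callables to the timestamps.

    Hypothetical helper for illustration only, not part of utilsforecast.
    """
    out = {}
    for feat in features:
        if callable(feat):
            out[feat.__name__] = feat(times)  # column named after the function
        else:
            out[feat] = [getattr(t, feat) for t in times]  # e.g. t.month, t.day
    return out


times = [date(2001, 5, 12), date(2001, 5, 14), date(2001, 5, 15)]
feats = compute_time_features(times, ["month", "day", is_weekend])
```

Because every feature depends only on the timestamp, the same computation applies unchanged to the h future timestamps.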
future_exog_to_historic
future_exog_to_historic (df:~DFType, freq:Union[str,int], features:List[str], h:int=0, id_col:str='unique_id', time_col:str='ds')
Turn future exogenous features into historic ones by shifting them h steps.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Dataframe with ids, times and values for the exogenous regressors. |
freq | Union[str, int] | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
features | List[str] | | Features to be converted into historic. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Tuple (transformed_df, future_df): the original DataFrame with the computed features, and a DataFrame with the features for the next h timesteps. |
import numpy as np
series_with_prices = series.assign(price=np.random.rand(len(series))).sample(frac=1.0)
series_with_prices
| | unique_id | ds | y | price |
---|---|---|---|---|
436 | 2 | 2001-03-26 | 2.369113 | 0.774476 |
312 | 1 | 2001-05-08 | 4.405212 | 0.557957 |
536 | 3 | 2000-11-04 | 4.362074 | 0.745237 |
34 | 0 | 2000-11-08 | 6.111161 | 0.809978 |
652 | 3 | 2001-02-28 | 1.448291 | 0.685294 |
… | … | … | … | … |
609 | 3 | 2001-01-16 | 0.215892 | 0.699703 |
873 | 4 | 2000-09-29 | 5.398198 | 0.677651 |
268 | 1 | 2001-03-25 | 2.393771 | 0.735438 |
171 | 0 | 2001-03-25 | 3.085493 | 0.463871 |
931 | 4 | 2000-11-26 | 0.292296 | 0.691377 |
from utilsforecast.feature_engineering import future_exog_to_historic
transformed_df, future_df = future_exog_to_historic(
    df=series_with_prices,
    freq='D',
    features=['price'],
    h=2,
)
transformed_df
| | unique_id | ds | y | price |
---|---|---|---|---|
0 | 2 | 2001-03-26 | 2.369113 | 0.870133 |
1 | 1 | 2001-05-08 | 4.405212 | 0.869751 |
2 | 3 | 2000-11-04 | 4.362074 | 0.877901 |
3 | 0 | 2000-11-08 | 6.111161 | 0.629413 |
4 | 3 | 2001-02-28 | 1.448291 | 0.088073 |
… | … | … | … | … |
1096 | 3 | 2001-01-16 | 0.215892 | 0.472261 |
1097 | 4 | 2000-09-29 | 5.398198 | 0.887531 |
1098 | 1 | 2001-03-25 | 2.393771 | 0.481712 |
1099 | 0 | 2001-03-25 | 3.085493 | 0.433153 |
1100 | 4 | 2000-11-26 | 0.292296 | 0.620219 |
future_df
| | unique_id | ds | price |
---|---|---|---|
0 | 0 | 2001-05-15 | 0.874328 |
1 | 0 | 2001-05-16 | 0.481385 |
2 | 1 | 2001-05-15 | 0.009058 |
3 | 1 | 2001-05-16 | 0.083749 |
4 | 2 | 2001-05-15 | 0.726212 |
5 | 2 | 2001-05-16 | 0.052221 |
6 | 3 | 2001-05-15 | 0.942335 |
7 | 3 | 2001-05-16 | 0.274816 |
8 | 4 | 2001-05-15 | 0.267545 |
9 | 4 | 2001-05-16 | 0.112129 |
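One way to read "shifting them h steps" is that each value moves h timesteps forward within its series: the row at step i ends up holding the value observed at step i - h, and the last h observed values fall on the h timesteps after the end of the series. The sketch below shows that reading for a single, time-sorted series. It is an assumption about the behaviour, not the library's implementation, and shift_feature is a hypothetical helper.

```python
def shift_feature(values, h):
    """Shift one series' feature h steps forward in time.

    values must be sorted by time. Returns (historic, future): historic[i] is
    the value observed h steps before step i (None where no earlier value
    exists), and future holds the last h observed values, which land on the
    h steps after the end of the series.

    Hypothetical helper for illustration only, not part of utilsforecast.
    """
    if not 0 < h <= len(values):
        raise ValueError("h must be between 1 and len(values)")
    historic = [None] * h + list(values[:-h])
    future = list(values[-h:])
    return historic, future


prices = [10, 20, 30, 40, 50]
historic, future = shift_feature(prices, h=2)
```

Under this reading every row, historic or future, only uses information that was already available h steps earlier, which is what lets a future-known feature be treated as a historic one.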
pipeline
pipeline (df:~DFType, features:List[Callable], freq:Union[str,int], h:int=0, id_col:str='unique_id', time_col:str='ds')
Compute several features for training and forecasting.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Dataframe with ids, times and values for the exogenous regressors. |
features | List[Callable] | | List of feature functions to compute. Each must accept only df, freq, h, id_col and time_col; any other arguments must be fixed beforehand, for example with functools.partial. |
freq | Union[str, int] | | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
h | int | 0 | Forecast horizon. |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestep; its values can be timestamps or integers. |
Returns | Tuple | | Tuple (transformed_df, future_df): the original DataFrame with the computed features, and a DataFrame with the features for the next h timesteps. |
from functools import partial
from utilsforecast.feature_engineering import fourier, pipeline, time_features, trend

# custom feature returning a single boolean column
def is_weekend(times):
    if isinstance(times, pd.Index):
        dow = times.weekday + 1  # monday=0 in pandas and 1 in polars
    else:
        dow = times.dt.weekday()
    return dow >= 6

# custom feature returning several columns at once
def even_days_and_months(times):
    if isinstance(times, pd.Index):
        out = pd.DataFrame(
            {
                'even_day': (times.weekday + 1) % 2 == 0,
                'even_month': times.month % 2 == 0,
            }
        )
    else:
        # for polars you can return a list of expressions
        out = [
            (times.dt.weekday() % 2 == 0).alias('even_day'),
            (times.dt.month() % 2 == 0).alias('even_month'),
        ]
    return out

# partial fixes the extra arguments so each entry only needs df, freq, h, id_col and time_col
features = [
    trend,
    partial(fourier, season_length=7, k=1),
    partial(fourier, season_length=28, k=1),
    partial(time_features, features=['day', is_weekend, even_days_and_months]),
]
transformed_df, future_df = pipeline(
    series,
    features=features,
    freq='D',
    h=1,
)
transformed_df
| | unique_id | ds | y | trend | sin1_7 | cos1_7 | sin1_28 | cos1_28 | day | is_weekend | even_day | even_month |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2000-10-05 | 0.428973 | 152.0 | -0.974927 | -0.222526 | 0.433885 | -9.009683e-01 | 5 | False | True | True |
1 | 0 | 2000-10-06 | 1.423626 | 153.0 | -0.781835 | 0.623486 | 0.222522 | -9.749276e-01 | 6 | False | False | True |
2 | 0 | 2000-10-07 | 2.311782 | 154.0 | -0.000005 | 1.000000 | 0.000001 | -1.000000e+00 | 7 | True | True | True |
3 | 0 | 2000-10-08 | 3.192191 | 155.0 | 0.781829 | 0.623493 | -0.222520 | -9.749281e-01 | 8 | True | False | True |
4 | 0 | 2000-10-09 | 4.148767 | 156.0 | 0.974929 | -0.222517 | -0.433883 | -9.009693e-01 | 9 | False | False | True |
… | … | … | … | … | … | … | … | … | … | … | … | … |
1096 | 4 | 2001-05-10 | 4.058910 | 369.0 | -0.974927 | -0.222523 | 0.900969 | 4.338843e-01 | 10 | False | True | False |
1097 | 4 | 2001-05-11 | 5.178157 | 370.0 | -0.781823 | 0.623500 | 0.974929 | 2.225177e-01 | 11 | False | False | False |
1098 | 4 | 2001-05-12 | 6.133142 | 371.0 | -0.000002 | 1.000000 | 1.000000 | 4.251100e-07 | 12 | True | True | False |
1099 | 4 | 2001-05-13 | 0.403709 | 372.0 | 0.781840 | 0.623479 | 0.974927 | -2.225243e-01 | 13 | True | False | False |
1100 | 4 | 2001-05-14 | 1.081779 | 373.0 | 0.974928 | -0.222520 | 0.900969 | -4.338835e-01 | 14 | False | False | False |
future_df
| | unique_id | ds | trend | sin1_7 | cos1_7 | sin1_28 | cos1_28 | day | is_weekend | even_day | even_month |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False |
1 | 1 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False |
2 | 2 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False |
3 | 3 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False |
4 | 4 | 2001-05-15 | 374.0 | 0.433871 | -0.900975 | 0.781829 | -0.623493 | 15 | False | True | False |
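The requirement that every entry in features accepts only df, freq, h, id_col and time_col is what allows one loop to call them all the same way. The sketch below illustrates that calling convention with plain dictionaries of columns standing in for DataFrames. Both constant_feature and run_pipeline are hypothetical names invented for this example, the future table here omits the id and time columns for brevity, and none of this is meant to reflect how the library combines results internally.

```python
from functools import partial


def constant_feature(df, freq, h, id_col, time_col, name, value):
    """Hypothetical feature: adds a constant column to the history and the future."""
    transformed = {**df, name: [value] * len(df[time_col])}
    future = {name: [value] * h}
    return transformed, future


def run_pipeline(df, features, freq, h=0, id_col="unique_id", time_col="ds"):
    """Call every feature with the same arguments and collect the new columns."""
    transformed, future = dict(df), {}
    for feature in features:
        feat_df, feat_future = feature(
            df, freq=freq, h=h, id_col=id_col, time_col=time_col
        )
        # keep only the columns this feature added to the input
        transformed.update({c: v for c, v in feat_df.items() if c not in df})
        future.update(feat_future)
    return transformed, future


df = {"unique_id": [0, 0, 0], "ds": [1, 2, 3], "y": [1.0, 2.0, 3.0]}
features = [
    # name and value are fixed up front, leaving only the shared arguments
    partial(constant_feature, name="bias", value=1),
    partial(constant_feature, name="promo", value=0),
]
transformed, future = run_pipeline(df, features, freq=1, h=2)
```

Fixing the extra arguments with partial keeps the loop free of per-feature special cases, which is the same reason the example above wraps fourier and time_features before passing them to pipeline.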