API Reference
Preprocessing
Utilities for processing data before training/analysis
fill_gaps
fill_gaps (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame], freq:Union[str,int], start:Union[str,int,datetime.date,datetime.datetime]='per_serie', end:Union[str,int,datetime.date,datetime.datetime]='global', id_col:str='unique_id', time_col:str='ds')
Enforce start and end datetimes for dataframe.
| | Type | Default | Details |
---|---|---|---|
df | Union | | Input data. |
freq | Union | | Series' frequency. |
start | Union | per_serie | Initial timestamp for the series. 'per_serie' uses each series' first timestamp; 'global' uses the first timestamp seen in the data. Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1). |
end | Union | global | Final timestamp for the series. 'per_serie' uses each series' last timestamp; 'global' uses the last timestamp seen in the data. Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1). |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestamp. |
Returns | DataFrame | | Dataframe with gaps filled. |
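To make the meaning of `start` and `end` concrete, here is a minimal sketch of the same idea written with plain pandas. It is not the library's implementation: `fill_gaps_sketch` is a hypothetical helper that only handles pandas dataframes with datetime timestamps, and it assumes the frequency is a pandas offset alias such as 'YS'.

```python
import numpy as np
import pandas as pd


def fill_gaps_sketch(df, freq, start="per_serie", end="global",
                     id_col="unique_id", time_col="ds"):
    """Hypothetical helper mimicking the documented behavior (pandas datetimes only)."""
    # First and last timestamp of every series.
    bounds = df.groupby(id_col)[time_col].agg(lo="min", hi="max")
    if start == "global":
        bounds["lo"] = df[time_col].min()
    elif start != "per_serie":
        bounds["lo"] = pd.Timestamp(start)
    if end == "global":
        bounds["hi"] = df[time_col].max()
    elif end != "per_serie":
        bounds["hi"] = pd.Timestamp(end)
    # One complete date range per series, then a left join to bring back known values.
    grid = pd.concat(
        [
            pd.DataFrame({id_col: uid, time_col: pd.date_range(lo, hi, freq=freq)})
            for uid, lo, hi in bounds.itertuples()
        ],
        ignore_index=True,
    )
    return grid.merge(df, on=[id_col, time_col], how="left")


df = pd.DataFrame(
    {
        "unique_id": [0, 0, 0, 1, 1],
        "ds": pd.to_datetime(["2020", "2021", "2023", "2021", "2022"]),
        "y": np.arange(5),
    }
)
# Defaults: each series keeps its own start and is extended to the global end.
filled_default = fill_gaps_sketch(df, freq="YS")
print(filled_default)
```

Timestamps that were absent from the input come back from the left join with missing values in the remaining columns, which is why `y` becomes a float column in the pandas outputs.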
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': pd.to_datetime(['2020', '2021', '2023', '2021', '2022']),
        'y': np.arange(5),
    }
)
df
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0 |
1 | 0 | 2021-01-01 | 1 |
2 | 0 | 2023-01-01 | 2 |
3 | 1 | 2021-01-01 | 3 |
4 | 1 | 2022-01-01 | 4 |
By default, each series keeps its current start, and only the end date is extended so that it is the same for all series.
fill_gaps(
    df,
    freq='YS',
)
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 1 | 2021-01-01 | 3.0 |
5 | 1 | 2022-01-01 | 4.0 |
6 | 1 | 2023-01-01 | NaN |
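A missing `y` does not by itself prove that a row was added, since the input may already contain missing values. As a hedged sketch, the rows introduced by the fill can be recovered by comparing keys against the original frame. The `filled` frame below is typed in by hand to mirror the output table above, so the snippet runs with pandas and numpy alone.

```python
import numpy as np
import pandas as pd

# Original data from the example above.
df = pd.DataFrame(
    {
        "unique_id": [0, 0, 0, 1, 1],
        "ds": pd.to_datetime(["2020", "2021", "2023", "2021", "2022"]),
        "y": np.arange(5),
    }
)
# Filled frame, copied by hand from the output table above.
filled = pd.DataFrame(
    {
        "unique_id": [0, 0, 0, 0, 1, 1, 1],
        "ds": pd.to_datetime(
            ["2020", "2021", "2022", "2023", "2021", "2022", "2023"]
        ),
        "y": [0.0, 1.0, np.nan, 2.0, 3.0, 4.0, np.nan],
    }
)
# Left-join the original keys; rows found only in `filled` were added by the fill.
keys = ["unique_id", "ds"]
flagged = filled.merge(df[keys], on=keys, how="left", indicator=True)
added = flagged.loc[flagged["_merge"] == "left_only", keys]
print(added)
```

Here `added` holds the two new rows: 2022-01-01 for series 0 and 2023-01-01 for series 1.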
We can also specify end='per_serie' to fill only the gaps within each series.
fill_gaps(
    df,
    freq='YS',
    end='per_serie',
)
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 1 | 2021-01-01 | 3.0 |
5 | 1 | 2022-01-01 | 4.0 |
We can also specify an end date in the future.
fill_gaps(
    df,
    freq='YS',
    end='2024',
)
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 0 | 2024-01-01 | NaN |
5 | 1 | 2021-01-01 | 3.0 |
6 | 1 | 2022-01-01 | 4.0 |
7 | 1 | 2023-01-01 | NaN |
8 | 1 | 2024-01-01 | NaN |
We can set all series to start at the same time.
fill_gaps(
    df,
    freq='YS',
    start='global',
)
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 1 | 2020-01-01 | NaN |
5 | 1 | 2021-01-01 | 3.0 |
6 | 1 | 2022-01-01 | 4.0 |
7 | 1 | 2023-01-01 | NaN |
We can also set a common start date for all series (which can be earlier than their current starts).
fill_gaps(
    df,
    freq='YS',
    start='2019',
)
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2019-01-01 | NaN |
1 | 0 | 2020-01-01 | 0.0 |
2 | 0 | 2021-01-01 | 1.0 |
3 | 0 | 2022-01-01 | NaN |
4 | 0 | 2023-01-01 | 2.0 |
5 | 1 | 2019-01-01 | NaN |
6 | 1 | 2020-01-01 | NaN |
7 | 1 | 2021-01-01 | 3.0 |
8 | 1 | 2022-01-01 | 4.0 |
9 | 1 | 2023-01-01 | NaN |
When the times are integers, the frequency, start and end must also be integers.
df = pd.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [2020, 2021, 2023, 2021, 2022],
        'y': np.arange(5),
    }
)
df
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020 | 0 |
1 | 0 | 2021 | 1 |
2 | 0 | 2023 | 2 |
3 | 1 | 2021 | 3 |
4 | 1 | 2022 | 4 |
fill_gaps(
    df,
    freq=1,
    start=2019,
    end=2024,
)
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2019 | NaN |
1 | 0 | 2020 | 0.0 |
2 | 0 | 2021 | 1.0 |
3 | 0 | 2022 | NaN |
4 | 0 | 2023 | 2.0 |
5 | 0 | 2024 | NaN |
6 | 1 | 2019 | NaN |
7 | 1 | 2020 | NaN |
8 | 1 | 2021 | 3.0 |
9 | 1 | 2022 | 4.0 |
10 | 1 | 2023 | NaN |
11 | 1 | 2024 | NaN |
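With integer times, it is easy to check that a frame is regular, meaning that consecutive timestamps within each series differ by exactly `freq`. The helper below, `is_regular`, is a hypothetical name introduced for illustration and is not part of the documented API. The `filled` frame reproduces only the id and time columns of the output above.

```python
import pandas as pd


def is_regular(df, freq, id_col="unique_id", time_col="ds"):
    """Return True when every within-series step between timestamps equals `freq`."""
    ordered = df.sort_values([id_col, time_col])
    # diff() yields NaN for the first row of each series, so drop those.
    steps = ordered.groupby(id_col)[time_col].diff().dropna()
    return bool((steps == freq).all())


# Input with a gap: series 0 jumps from 2021 to 2023.
raw = pd.DataFrame(
    {
        "unique_id": [0, 0, 0, 1, 1],
        "ds": [2020, 2021, 2023, 2021, 2022],
    }
)
# Keys of the filled output above: both series span 2019 through 2024.
filled = pd.DataFrame(
    {
        "unique_id": [0] * 6 + [1] * 6,
        "ds": list(range(2019, 2025)) * 2,
    }
)
print(is_regular(raw, 1))     # False
print(is_regular(filled, 1))  # True
```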
The function also accepts polars dataframes. In that case the output is also a polars dataframe, and missing values appear as null.
from datetime import date, datetime

import polars as pl

df = pl.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [
            datetime(2020, 1, 1), datetime(2022, 1, 1), datetime(2023, 1, 1),
            datetime(2021, 1, 1), datetime(2022, 1, 1),
        ],
        'y': np.arange(5),
    }
)
df
unique_id | ds | y |
---|---|---|
i64 | datetime[μs] | i64 |
0 | 2020-01-01 00:00:00 | 0 |
0 | 2022-01-01 00:00:00 | 1 |
0 | 2023-01-01 00:00:00 | 2 |
1 | 2021-01-01 00:00:00 | 3 |
1 | 2022-01-01 00:00:00 | 4 |
fill_gaps(
    df,
    freq='1y',
    start=datetime(2019, 1, 1),
    end=datetime(2024, 1, 1),
)
unique_id | ds | y |
---|---|---|
i64 | datetime[μs] | i64 |
0 | 2019-01-01 00:00:00 | null |
0 | 2020-01-01 00:00:00 | 0 |
0 | 2021-01-01 00:00:00 | null |
0 | 2022-01-01 00:00:00 | 1 |
0 | 2023-01-01 00:00:00 | 2 |
0 | 2024-01-01 00:00:00 | null |
1 | 2019-01-01 00:00:00 | null |
1 | 2020-01-01 00:00:00 | null |
1 | 2021-01-01 00:00:00 | 3 |
1 | 2022-01-01 00:00:00 | 4 |
1 | 2023-01-01 00:00:00 | null |
1 | 2024-01-01 00:00:00 | null |
df = pl.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [
            date(2020, 1, 1), date(2022, 1, 1), date(2023, 1, 1),
            date(2021, 1, 1), date(2022, 1, 1),
        ],
        'y': np.arange(5),
    }
)
df
unique_id | ds | y |
---|---|---|
i64 | date | i64 |
0 | 2020-01-01 | 0 |
0 | 2022-01-01 | 1 |
0 | 2023-01-01 | 2 |
1 | 2021-01-01 | 3 |
1 | 2022-01-01 | 4 |
fill_gaps(
    df,
    freq='1y',
    start=date(2020, 1, 1),
    end=date(2024, 1, 1),
)
unique_id | ds | y |
---|---|---|
i64 | date | i64 |
0 | 2020-01-01 | 0 |
0 | 2021-01-01 | null |
0 | 2022-01-01 | 1 |
0 | 2023-01-01 | 2 |
0 | 2024-01-01 | null |
1 | 2020-01-01 | null |
1 | 2021-01-01 | 3 |
1 | 2022-01-01 | 4 |
1 | 2023-01-01 | null |
1 | 2024-01-01 | null |
df = pl.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [2020, 2021, 2023, 2021, 2022],
        'y': np.arange(5),
    }
)
df
unique_id | ds | y |
---|---|---|
i64 | i64 | i64 |
0 | 2020 | 0 |
0 | 2021 | 1 |
0 | 2023 | 2 |
1 | 2021 | 3 |
1 | 2022 | 4 |
fill_gaps(
    df,
    freq=1,
    start=2019,
    end=2024,
)
unique_id | ds | y |
---|---|---|
i64 | i64 | i64 |
0 | 2019 | null |
0 | 2020 | 0 |
0 | 2021 | 1 |
0 | 2022 | null |
0 | 2023 | 2 |
0 | 2024 | null |
1 | 2019 | null |
1 | 2020 | null |
1 | 2021 | 3 |
1 | 2022 | 4 |
1 | 2023 | null |
1 | 2024 | null |