API Reference
Preprocessing
Utilities for processing data before training/analysis
id_time_grid
Generate all expected combinations of ids and times.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Input data. |
freq | Union | | Series’ frequency. |
start | Union | per_serie | Initial timestamp for the series. ‘per_serie’ uses each series’ first timestamp; ‘global’ uses the first timestamp seen in the data; a specific timestamp or integer is also accepted, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1). |
end | Union | global | Final timestamp for the series. ‘per_serie’ uses each series’ last timestamp; ‘global’ uses the last timestamp seen in the data; a specific timestamp or integer is also accepted, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1). |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestamp. |
Returns | DFType | | Dataframe with expected ids and times. |
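To make the start and end rules concrete, here is a minimal pure-Python sketch of how the expected (id, time) pairs could be enumerated for integer times. It is not the library's implementation; the helper name `expected_grid` and the dict-of-lists input are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple, Union

Bound = Union[str, int]


def expected_grid(
    times_by_id: Dict[int, List[int]],
    freq: int,
    start: Bound = "per_serie",
    end: Bound = "global",
) -> List[Tuple[int, int]]:
    """Hypothetical helper: list every (id, time) pair between the resolved bounds."""
    global_start = min(min(times) for times in times_by_id.values())
    global_end = max(max(times) for times in times_by_id.values())
    grid = []
    for uid, times in sorted(times_by_id.items()):
        # Resolve the first expected time for this series.
        if start == "per_serie":
            first = min(times)
        elif start == "global":
            first = global_start
        else:
            first = start
        # Resolve the last expected time for this series.
        if end == "per_serie":
            last = max(times)
        elif end == "global":
            last = global_end
        else:
            last = end
        grid.extend((uid, t) for t in range(first, last + 1, freq))
    return grid


observed = {0: [2020, 2021, 2023], 1: [2021, 2022]}
grid = expected_grid(observed, freq=1)
print(grid)
```

With the defaults, each id keeps its own first time and every id runs to the latest time seen in the data, so id 1 gains 2023 but not 2020.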
fill_gaps
Enforce start and end datetimes for dataframe.
| | Type | Default | Details |
---|---|---|---|
df | DFType | | Input data. |
freq | Union | | Series’ frequency. |
start | Union | per_serie | Initial timestamp for the series. ‘per_serie’ uses each series’ first timestamp; ‘global’ uses the first timestamp seen in the data; a specific timestamp or integer is also accepted, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1). |
end | Union | global | Final timestamp for the series. ‘per_serie’ uses each series’ last timestamp; ‘global’ uses the last timestamp seen in the data; a specific timestamp or integer is also accepted, e.g. ‘2000-01-01’, 2000 or datetime(2000, 1, 1). |
id_col | str | unique_id | Column that identifies each series. |
time_col | str | ds | Column that identifies each timestamp. |
Returns | DFType | | Dataframe with gaps filled. |
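The examples in this section start from a small frame with two yearly series. A sketch of how such a frame could be built with pandas is shown here; the variable name `df` is an assumption, while the column names follow the `id_col` and `time_col` defaults listed above.

```python
import pandas as pd

# Two yearly series: id 0 is missing 2022, and id 1 ends a year before id 0.
df = pd.DataFrame(
    {
        "unique_id": [0, 0, 0, 1, 1],
        "ds": pd.to_datetime(
            ["2020-01-01", "2021-01-01", "2023-01-01", "2021-01-01", "2022-01-01"]
        ),
        "y": [0, 1, 2, 3, 4],
    }
)
print(df)
```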
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0 |
1 | 0 | 2021-01-01 | 1 |
2 | 0 | 2023-01-01 | 2 |
3 | 1 | 2021-01-01 | 3 |
4 | 1 | 2022-01-01 | 4 |
By default, each series keeps its current start and every series is extended to the same end date, the last timestamp seen in the data.
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 1 | 2021-01-01 | 3.0 |
5 | 1 | 2022-01-01 | 4.0 |
6 | 1 | 2023-01-01 | NaN |
We can also specify end='per_serie' to fill only the gaps within each series.
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 1 | 2021-01-01 | 3.0 |
5 | 1 | 2022-01-01 | 4.0 |
We can also specify an end date in the future.
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 0 | 2024-01-01 | NaN |
5 | 1 | 2021-01-01 | 3.0 |
6 | 1 | 2022-01-01 | 4.0 |
7 | 1 | 2023-01-01 | NaN |
8 | 1 | 2024-01-01 | NaN |
We can set all series to start at the same time.
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020-01-01 | 0.0 |
1 | 0 | 2021-01-01 | 1.0 |
2 | 0 | 2022-01-01 | NaN |
3 | 0 | 2023-01-01 | 2.0 |
4 | 1 | 2020-01-01 | NaN |
5 | 1 | 2021-01-01 | 3.0 |
6 | 1 | 2022-01-01 | 4.0 |
7 | 1 | 2023-01-01 | NaN |
We can also set a common start date for all series (which can be earlier than their current starts).
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2019-01-01 | NaN |
1 | 0 | 2020-01-01 | 0.0 |
2 | 0 | 2021-01-01 | 1.0 |
3 | 0 | 2022-01-01 | NaN |
4 | 0 | 2023-01-01 | 2.0 |
5 | 1 | 2019-01-01 | NaN |
6 | 1 | 2020-01-01 | NaN |
7 | 1 | 2021-01-01 | 3.0 |
8 | 1 | 2022-01-01 | 4.0 |
9 | 1 | 2023-01-01 | NaN |
When the times are integers, the frequency, start and end must also be integers.
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2020 | 0 |
1 | 0 | 2021 | 1 |
2 | 0 | 2023 | 2 |
3 | 1 | 2021 | 3 |
4 | 1 | 2022 | 4 |
| | unique_id | ds | y |
---|---|---|---|
0 | 0 | 2019 | NaN |
1 | 0 | 2020 | 0.0 |
2 | 0 | 2021 | 1.0 |
3 | 0 | 2022 | NaN |
4 | 0 | 2023 | 2.0 |
5 | 0 | 2024 | NaN |
6 | 1 | 2019 | NaN |
7 | 1 | 2020 | NaN |
8 | 1 | 2021 | 3.0 |
9 | 1 | 2022 | 4.0 |
10 | 1 | 2023 | NaN |
11 | 1 | 2024 | NaN |
The function also accepts polars dataframes.
unique_id | ds | y |
---|---|---|
i64 | datetime[μs] | i64 |
0 | 2020-01-01 00:00:00 | 0 |
0 | 2022-01-01 00:00:00 | 1 |
0 | 2023-01-01 00:00:00 | 2 |
1 | 2021-01-01 00:00:00 | 3 |
1 | 2022-01-01 00:00:00 | 4 |
unique_id | ds | y |
---|---|---|
i64 | datetime[ms] | i64 |
0 | 2019-01-01 00:00:00 | null |
0 | 2020-01-01 00:00:00 | 0 |
0 | 2021-01-01 00:00:00 | null |
0 | 2022-01-01 00:00:00 | 1 |
0 | 2023-01-01 00:00:00 | 2 |
… | … | … |
1 | 2020-01-01 00:00:00 | null |
1 | 2021-01-01 00:00:00 | 3 |
1 | 2022-01-01 00:00:00 | 4 |
1 | 2023-01-01 00:00:00 | null |
1 | 2024-01-01 00:00:00 | null |
unique_id | ds | y |
---|---|---|
i64 | date | i64 |
0 | 2020-01-01 | 0 |
0 | 2022-01-01 | 1 |
0 | 2023-01-01 | 2 |
1 | 2021-01-01 | 3 |
1 | 2022-01-01 | 4 |
unique_id | ds | y |
---|---|---|
i64 | date | i64 |
0 | 2020-01-01 | 0 |
0 | 2021-01-01 | null |
0 | 2022-01-01 | 1 |
0 | 2023-01-01 | 2 |
0 | 2024-01-01 | null |
1 | 2020-01-01 | null |
1 | 2021-01-01 | 3 |
1 | 2022-01-01 | 4 |
1 | 2023-01-01 | null |
1 | 2024-01-01 | null |
unique_id | ds | y |
---|---|---|
i64 | i64 | i64 |
0 | 2020 | 0 |
0 | 2021 | 1 |
0 | 2023 | 2 |
1 | 2021 | 3 |
1 | 2022 | 4 |
unique_id | ds | y |
---|---|---|
i64 | i64 | i64 |
0 | 2019 | null |
0 | 2020 | 0 |
0 | 2021 | 1 |
0 | 2022 | null |
0 | 2023 | 2 |
… | … | … |
1 | 2020 | null |
1 | 2021 | 3 |
1 | 2022 | 4 |
1 | 2023 | null |
1 | 2024 | null |