
fill_gaps

fill_gaps(df: Union[pandas.core.frame.DataFrame, polars.dataframe.frame.DataFrame],
          freq: Union[str, int],
          start: Union[str, int, datetime.date, datetime.datetime] = 'per_serie',
          end: Union[str, int, datetime.date, datetime.datetime] = 'global',
          id_col: str = 'unique_id',
          time_col: str = 'ds')

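Enforce start and end datetimes for a dataframe. Each series is expanded to a regular grid of timestamps at frequency freq, and rows for missing timestamps are added with null values in the remaining columns.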

Parameter  Type       Default      Details
df         Union                   Input data (pandas or polars DataFrame).
freq       Union                   Series' frequency, e.g. 'YS' for pandas, '1y' for polars,
                                   or an integer when the time column holds integers.
start      Union      'per_serie'  Initial timestamp for the series.
                                   * 'per_serie' uses each series' first timestamp
                                   * 'global' uses the first timestamp seen in the data
                                   * Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1)
end        Union      'global'     Final timestamp for the series.
                                   * 'per_serie' uses each series' last timestamp
                                   * 'global' uses the last timestamp seen in the data
                                   * Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1)
id_col     str        'unique_id'  Column that identifies each series.
time_col   str        'ds'         Column that identifies each timestamp.
Returns    DataFrame               Dataframe with gaps filled; added rows hold nulls in the remaining columns.
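To make the start/end options above concrete, here is a minimal pandas-only sketch of the same idea. It assumes datetime timestamps and a pandas offset alias for freq, and the name fill_gaps_sketch is hypothetical; it is not the library's implementation. It resolves each series' bounds, builds a regular grid with pd.date_range, and left-joins the original rows onto that grid so that missing timestamps receive NaN.

```python
import pandas as pd


def fill_gaps_sketch(
    df: pd.DataFrame,
    freq: str,
    start="per_serie",
    end="global",
    id_col: str = "unique_id",
    time_col: str = "ds",
) -> pd.DataFrame:
    """Hypothetical sketch: pandas only, datetime timestamps only."""
    # First and last timestamp of every series (the 'per_serie' bounds).
    bounds = df.groupby(id_col)[time_col].agg(["min", "max"])
    # Override the per-series start with the global or an explicit start.
    if start == "global":
        bounds["min"] = df[time_col].min()
    elif start != "per_serie":
        bounds["min"] = pd.Timestamp(start)
    # Same logic for the end bound.
    if end == "global":
        bounds["max"] = df[time_col].max()
    elif end != "per_serie":
        bounds["max"] = pd.Timestamp(end)
    # Build the complete regular grid of timestamps for each series.
    grids = [
        pd.DataFrame({id_col: uid, time_col: pd.date_range(lo, hi, freq=freq)})
        for uid, lo, hi in bounds.itertuples(name=None)
    ]
    full = pd.concat(grids, ignore_index=True)
    # Match the original time dtype so the join keys line up.
    full[time_col] = full[time_col].astype(df[time_col].dtype)
    # Left-join the original rows; timestamps absent from df get NaN values.
    return full.merge(df, on=[id_col, time_col], how="left")
```

A left join is used so existing rows keep their values and only the newly created timestamps are empty. On the data of the first example below, fill_gaps_sketch(df, freq='YS', start='2019') yields the same ten (unique_id, ds) rows as the start='2019' example.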
import numpy as np
import pandas as pd
from utilsforecast.preprocessing import fill_gaps

df = pd.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': pd.to_datetime(['2020', '2021', '2023', '2021', '2022']),
        'y': np.arange(5),
    }
)
df
   unique_id         ds  y
0          0 2020-01-01  0
1          0 2021-01-01  1
2          0 2023-01-01  2
3          1 2021-01-01  3
4          1 2022-01-01  4

By default, each series keeps its current start, and only its end is extended so that all series end at the last timestamp seen in the data.

fill_gaps(
    df,
    freq='YS',
)
   unique_id         ds    y
0          0 2020-01-01  0.0
1          0 2021-01-01  1.0
2          0 2022-01-01  NaN
3          0 2023-01-01  2.0
4          1 2021-01-01  3.0
5          1 2022-01-01  4.0
6          1 2023-01-01  NaN
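The row counts in the default output above follow from the two bounds it uses: each series' own first timestamp (start='per_serie') and the single last timestamp of the data (end='global'). A small pandas check on the same data, assuming yearly 'YS' steps:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "unique_id": [0, 0, 0, 1, 1],
        "ds": pd.to_datetime(
            ["2020-01-01", "2021-01-01", "2023-01-01", "2021-01-01", "2022-01-01"]
        ),
        "y": range(5),
    }
)
# start='per_serie': every series keeps its own first timestamp.
starts = df.groupby("unique_id")["ds"].min()
# end='global': every series is extended to the last timestamp in the data.
global_end = df["ds"].max()
# Number of 'YS' (year-start) timestamps each series spans after filling.
expected_rows = {
    int(uid): len(pd.date_range(first, global_end, freq="YS"))
    for uid, first in starts.items()
}
print(expected_rows)  # {0: 4, 1: 3}
```

4 + 3 = 7 rows, which matches the table above.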

Setting end='per_serie' keeps each series' own last timestamp, so only the gaps within each series are filled.

fill_gaps(
    df,
    freq='YS',
    end='per_serie',
)
   unique_id         ds    y
0          0 2020-01-01  0.0
1          0 2021-01-01  1.0
2          0 2022-01-01  NaN
3          0 2023-01-01  2.0
4          1 2021-01-01  3.0
5          1 2022-01-01  4.0

We can also specify an end date in the future.

fill_gaps(
    df,
    freq='YS',
    end='2024',
)
   unique_id         ds    y
0          0 2020-01-01  0.0
1          0 2021-01-01  1.0
2          0 2022-01-01  NaN
3          0 2023-01-01  2.0
4          0 2024-01-01  NaN
5          1 2021-01-01  3.0
6          1 2022-01-01  4.0
7          1 2023-01-01  NaN
8          1 2024-01-01  NaN

Setting start='global' makes all series start at the first timestamp seen in the data.

fill_gaps(
    df,
    freq='YS',
    start='global'
)
   unique_id         ds    y
0          0 2020-01-01  0.0
1          0 2021-01-01  1.0
2          0 2022-01-01  NaN
3          0 2023-01-01  2.0
4          1 2020-01-01  NaN
5          1 2021-01-01  3.0
6          1 2022-01-01  4.0
7          1 2023-01-01  NaN

We can also set a common start date for all series (which can be earlier than their current starts).

fill_gaps(
    df,
    freq='YS',
    start='2019',
)
   unique_id         ds    y
0          0 2019-01-01  NaN
1          0 2020-01-01  0.0
2          0 2021-01-01  1.0
3          0 2022-01-01  NaN
4          0 2023-01-01  2.0
5          1 2019-01-01  NaN
6          1 2020-01-01  NaN
7          1 2021-01-01  3.0
8          1 2022-01-01  4.0
9          1 2023-01-01  NaN
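In the output above, the string '2019' ends up as 2019-01-01. A pandas sketch of that step, assuming the string is parsed with pd.Timestamp and the grid is built with pd.date_range (this illustrates the result, not necessarily what fill_gaps does internally):

```python
import pandas as pd

# '2019' parses to midnight on 1 January 2019.
start = pd.Timestamp("2019")
print(start)  # 2019-01-01 00:00:00
# Year-start grid from the explicit start to the last timestamp in the data.
grid = pd.date_range(start, "2023-01-01", freq="YS")
print(grid.year.tolist())  # [2019, 2020, 2021, 2022, 2023]
```

Both series span these five timestamps, which is why each has five rows in the result.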

When the times are integers, freq, start and end must also be integers.

df = pd.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [2020, 2021, 2023, 2021, 2022],
        'y': np.arange(5),
    }
)
df
   unique_id    ds  y
0          0  2020  0
1          0  2021  1
2          0  2023  2
3          1  2021  3
4          1  2022  4
fill_gaps(
    df,
    freq=1,
    start=2019,
    end=2024,
)
    unique_id    ds    y
0           0  2019  NaN
1           0  2020  0.0
2           0  2021  1.0
3           0  2022  NaN
4           0  2023  2.0
5           0  2024  NaN
6           1  2019  NaN
7           1  2020  NaN
8           1  2021  3.0
9           1  2022  4.0
10          1  2023  NaN
11          1  2024  NaN
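With integer times, the grid for each series is a plain arithmetic sequence. A minimal pure-Python sketch, using a hypothetical helper integer_grid and assuming both bounds are inclusive as in the output above:

```python
def integer_grid(start: int, end: int, freq: int) -> list:
    """Hypothetical helper: integer timestamps from start to end, inclusive."""
    # range() excludes its stop value, so step one past the end bound.
    return list(range(start, end + 1, freq))


grid = integer_grid(2019, 2024, 1)
print(grid)  # [2019, 2020, 2021, 2022, 2023, 2024]
# Two series, each reindexed onto the same six timestamps.
print(2 * len(grid))  # 12
```

In this sketch range() only accepts integers, so freq, start and end have to be integers as well.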

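The function also accepts polars DataFrames. In that case freq is given as a polars interval string such as '1y', or as an integer for integer times.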

import polars as pl
from datetime import date, datetime

df = pl.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [
            datetime(2020, 1, 1), datetime(2022, 1, 1), datetime(2023, 1, 1),
            datetime(2021, 1, 1), datetime(2022, 1, 1)],
        'y': np.arange(5),
    }
)
df
unique_id  ds                   y
i64        datetime[μs]         i64
0          2020-01-01 00:00:00  0
0          2022-01-01 00:00:00  1
0          2023-01-01 00:00:00  2
1          2021-01-01 00:00:00  3
1          2022-01-01 00:00:00  4
fill_gaps(
    df,
    freq='1y',
    start=datetime(2019, 1, 1),
    end=datetime(2024, 1, 1),
)
unique_id  ds                   y
i64        datetime[μs]         i64
0          2019-01-01 00:00:00  null
0          2020-01-01 00:00:00  0
0          2021-01-01 00:00:00  null
0          2022-01-01 00:00:00  1
0          2023-01-01 00:00:00  2
0          2024-01-01 00:00:00  null
1          2019-01-01 00:00:00  null
1          2020-01-01 00:00:00  null
1          2021-01-01 00:00:00  3
1          2022-01-01 00:00:00  4
1          2023-01-01 00:00:00  null
1          2024-01-01 00:00:00  null
df = pl.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [
            date(2020, 1, 1), date(2022, 1, 1), date(2023, 1, 1),
            date(2021, 1, 1), date(2022, 1, 1)],
        'y': np.arange(5),
    }
)
df
unique_id  ds          y
i64        date        i64
0          2020-01-01  0
0          2022-01-01  1
0          2023-01-01  2
1          2021-01-01  3
1          2022-01-01  4
fill_gaps(
    df,
    freq='1y',
    start=date(2020, 1, 1),
    end=date(2024, 1, 1),
)
unique_id  ds          y
i64        date        i64
0          2020-01-01  0
0          2021-01-01  null
0          2022-01-01  1
0          2023-01-01  2
0          2024-01-01  null
1          2020-01-01  null
1          2021-01-01  3
1          2022-01-01  4
1          2023-01-01  null
1          2024-01-01  null
df = pl.DataFrame(
    {
        'unique_id': [0, 0, 0, 1, 1],
        'ds': [2020, 2021, 2023, 2021, 2022],
        'y': np.arange(5),
    }
)
df
unique_id  ds    y
i64        i64   i64
0          2020  0
0          2021  1
0          2023  2
1          2021  3
1          2022  4
fill_gaps(
    df,
    freq=1,
    start=2019,
    end=2024,
)
unique_id  ds    y
i64        i64   i64
0          2019  null
0          2020  0
0          2021  1
0          2022  null
0          2023  2
0          2024  null
1          2019  null
1          2020  null
1          2021  3
1          2022  4
1          2023  null
1          2024  null