module utilsforecast.preprocessing

Utilities for processing data before training/analysis

Global Variables

  • pl

function id_time_grid

id_time_grid(
    df: DataFrame,
    freq: Union[str, int],
    start: Union[str, int, date, datetime] = 'per_serie',
    end: Union[str, int, date, datetime] = 'global',
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) → DataFrame
Generate all expected combinations of ids and times.
Args:
  • df (pandas or polars DataFrame): Input data.
  • freq (str or int): Series' frequency.
  • start (str, int, date or datetime, optional): Initial timestamp for the series. Defaults to 'per_serie'.
      * 'per_serie' uses each series' first timestamp.
      * 'global' uses the first timestamp seen in the data.
      * Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1).
  • end (str, int, date or datetime, optional): Final timestamp for the series. Defaults to 'global'.
      * 'per_serie' uses each series' last timestamp.
      * 'global' uses the last timestamp seen in the data.
      * Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1).
  • id_col (str, optional): Column that identifies each series. Defaults to 'unique_id'.
  • time_col (str, optional): Column that identifies each timestamp. Defaults to 'ds'.
Returns:
  • pandas or polars DataFrame: Dataframe with expected ids and times.
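
The 'per_serie' and 'global' options are easiest to see on a small example. The following is a minimal pure-Python sketch of the idea, not the library's implementation: the helper name `id_time_grid_sketch` is hypothetical, and it assumes integer timestamps, an integer `freq`, and an inclusive end.

```python
from typing import Dict, List, Tuple, Union

Row = Tuple[str, int]  # (id, integer timestamp)


def id_time_grid_sketch(
    rows: List[Row],
    freq: int,
    start: Union[str, int] = "per_serie",
    end: Union[str, int] = "global",
) -> List[Row]:
    """Return every expected (id, time) pair (hypothetical helper, integers only)."""
    # Group the observed timestamps by series id.
    times_by_id: Dict[str, List[int]] = {}
    for uid, ts in rows:
        times_by_id.setdefault(uid, []).append(ts)
    all_times = [ts for _, ts in rows]

    grid: List[Row] = []
    for uid in sorted(times_by_id):
        times = times_by_id[uid]
        # Resolve the first timestamp according to the `start` mode.
        if start == "per_serie":
            first = min(times)
        elif start == "global":
            first = min(all_times)
        else:
            first = start
        # Resolve the last timestamp according to the `end` mode.
        if end == "per_serie":
            last = max(times)
        elif end == "global":
            last = max(all_times)
        else:
            last = end
        # Assumption: the end timestamp is included in the grid.
        grid.extend((uid, ts) for ts in range(first, last + 1, freq))
    return grid


rows = [("a", 1), ("a", 3), ("b", 2), ("b", 4)]
default_grid = id_time_grid_sketch(rows, freq=1)
per_serie_grid = id_time_grid_sketch(rows, freq=1, end="per_serie")
```

With the defaults, series "a" starts at its own first timestamp (1) but runs to the global last timestamp (4); with `end="per_serie"` it stops at its own last timestamp (3).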

function fill_gaps

fill_gaps(
    df: DataFrame,
    freq: Union[str, int],
    start: Union[str, int, date, datetime] = 'per_serie',
    end: Union[str, int, date, datetime] = 'global',
    id_col: str = 'unique_id',
    time_col: str = 'ds'
) → DataFrame
Enforce start and end datetimes for a dataframe, filling any gaps in between.
Args:
  • df (pandas or polars DataFrame): Input data.
  • freq (str or int): Series' frequency.
  • start (str, int, date or datetime, optional): Initial timestamp for the series. Defaults to 'per_serie'.
      * 'per_serie' uses each series' first timestamp.
      * 'global' uses the first timestamp seen in the data.
      * Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1).
  • end (str, int, date or datetime, optional): Final timestamp for the series. Defaults to 'global'.
      * 'per_serie' uses each series' last timestamp.
      * 'global' uses the last timestamp seen in the data.
      * Can also be a specific timestamp or integer, e.g. '2000-01-01', 2000 or datetime(2000, 1, 1).
  • id_col (str, optional): Column that identifies each series. Defaults to 'unique_id'.
  • time_col (str, optional): Column that identifies each timestamp. Defaults to 'ds'.
Returns:
  • pandas or polars DataFrame: Dataframe with gaps filled.
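
To illustrate what "gaps filled" means under the default `start='per_serie'` and `end='global'`, here is a minimal pure-Python sketch. It is not the library's code: `fill_gaps_sketch` is a hypothetical helper that assumes integer timestamps, rows stored as dicts, and `None` as the placeholder for values that were not observed.

```python
from typing import Any, Dict, List


def fill_gaps_sketch(
    rows: List[Dict[str, Any]],
    freq: int,
    id_col: str = "unique_id",
    time_col: str = "ds",
) -> List[Dict[str, Any]]:
    """Add a row for every missing (id, time) pair (hypothetical helper)."""
    # Index the observed rows by (id, time) for quick lookup.
    observed = {(r[id_col], r[time_col]): r for r in rows}
    # Columns other than id/time get a None placeholder in new rows.
    value_cols = [c for c in rows[0] if c not in (id_col, time_col)]

    # start='per_serie': each series begins at its own first timestamp.
    first_seen: Dict[Any, int] = {}
    for r in rows:
        uid, ts = r[id_col], r[time_col]
        first_seen[uid] = min(first_seen.get(uid, ts), ts)
    # end='global': every series runs to the last timestamp in the data.
    global_end = max(r[time_col] for r in rows)

    filled: List[Dict[str, Any]] = []
    for uid in sorted(first_seen):
        for ts in range(first_seen[uid], global_end + 1, freq):
            row = observed.get((uid, ts))
            if row is None:
                # This timestamp was missing, so emit a placeholder row.
                row = {id_col: uid, time_col: ts, **{c: None for c in value_cols}}
            filled.append(row)
    return filled


rows = [
    {"unique_id": "a", "ds": 1, "y": 10.0},
    {"unique_id": "a", "ds": 3, "y": 30.0},
    {"unique_id": "b", "ds": 2, "y": 5.0},
]
filled = fill_gaps_sketch(rows, freq=1)
```

Here series "a" gains a row at `ds=2` (an interior gap) and series "b" gains a row at `ds=3` (extended to the global end), both with `y` set to `None`.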

This file was automatically generated via lazydocs.