1. Synthetic Panel Data



generate_series

 generate_series (n_series:int, freq:str='D', min_length:int=50,
                  max_length:int=500, n_temporal_features:int=0,
                  n_static_features:int=0, equal_ends:bool=False,
                  seed:int=0)

Generate Synthetic Panel Series.

Generates n_series series of frequency freq, with lengths in the interval [min_length, max_length]. If n_temporal_features > 0, each series gets temporal features with random values. If n_static_features > 0, a static dataframe is returned alongside the temporal dataframe. If equal_ends == True, all series end at the same date.

Parameters:
n_series: int, number of series in the synthetic panel.
freq: str, default='D', frequency of the data; any of pandas' available frequencies.
min_length: int, default=50, minimum length of the synthetic panel's series.
max_length: int, default=500, maximum length of the synthetic panel's series.
n_temporal_features: int, default=0, number of temporal exogenous variables for the synthetic panel's series.
n_static_features: int, default=0, number of static exogenous variables for the synthetic panel's series.
equal_ends: bool, default=False, if True, all series end on the same date stamp ds.
seed: int, default=0, seed for the random number generator.

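Returns:
temporal_df: pandas.DataFrame, synthetic panel with columns [unique_id, ds, y] plus any temporal exogenous columns.
static_df: pandas.DataFrame, static exogenous variables; returned together with temporal_df as a tuple only when n_static_features > 0.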

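from neuralforecast.utils import generate_series

synthetic_panel = generate_series(n_series=2)
synthetic_panel.groupby('unique_id').head(4)
# With static features requested, a (temporal_df, static_df) tuple is returned
temporal_df, static_df = generate_series(n_series=1000, n_static_features=2,
                                         n_temporal_features=4, equal_ends=False)
static_df.head(2)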
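To make the length and equal_ends behavior concrete, here is a minimal sketch of the same idea in plain numpy/pandas. It is not the library's implementation: the function name sketch_generate_series, the anchor dates (2000-01-01 / 2000-12-31) and the uniform random targets are all assumptions made for illustration, and exogenous features are omitted.

```python
import numpy as np
import pandas as pd


def sketch_generate_series(n_series: int, freq: str = "D", min_length: int = 50,
                           max_length: int = 500, equal_ends: bool = False,
                           seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch of a synthetic panel generator (no exogenous features)."""
    rng = np.random.default_rng(seed)
    # One length per series, drawn uniformly from [min_length, max_length] (inclusive)
    lengths = rng.integers(min_length, max_length + 1, size=n_series)
    frames = []
    for uid, length in enumerate(lengths):
        if equal_ends:
            # Assumed anchor: every series ends on the same final timestamp
            ds = pd.date_range(end="2000-12-31", periods=length, freq=freq)
        else:
            # Assumed anchor: every series starts on the same date, so ends differ
            ds = pd.date_range(start="2000-01-01", periods=length, freq=freq)
        frames.append(pd.DataFrame({
            "unique_id": uid,
            "ds": ds,
            "y": rng.random(length),  # placeholder target values
        }))
    return pd.concat(frames, ignore_index=True)


panel = sketch_generate_series(n_series=3, min_length=5, max_length=8, equal_ends=True)
print(panel.groupby("unique_id")["ds"].agg(["min", "max", "size"]))
```

Anchoring on the end date rather than the start date is what makes all series finish together while still having different lengths.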

2. AirPassengers Data

The classic Box & Jenkins airline data. Monthly totals of international airline passengers, 1949 to 1960.

It has been used as a reference in several forecasting libraries. Because the series shows a clear trend and seasonality, it offers a convenient way to quickly showcase a model's predictive performance.

import matplotlib.pyplot as plt
from neuralforecast.utils import AirPassengersDF

AirPassengersDF.head(12)
# Plot the monthly passenger series
fig, ax = plt.subplots(1, 1, figsize=(20, 7))
plot_df = AirPassengersDF.set_index('ds')

plot_df[['y']].plot(ax=ax, linewidth=2)
ax.set_title('AirPassengers Data', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
import numpy as np
import pandas as pd

# Random static features: one row per series, keyed by unique_id
n_static_features = 3
n_series = 5

static_features = np.random.uniform(low=0.0, high=1.0,
                                    size=(n_series, n_static_features))
static_df = pd.DataFrame.from_records(static_features,
                                      columns=[f'static_{i}' for i in range(n_static_features)])
static_df['unique_id'] = np.arange(n_series)
static_df
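Static features hold a single row per series, so they relate to a temporal panel only through the unique_id key. The sketch below illustrates that relationship by broadcasting each series' static values onto all of its time steps with a pandas merge. The temporal_df here is a hypothetical four-step daily panel built for the example; whether a given model wants the frames merged or passed separately is not covered here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_series, n_static_features = 5, 3

# Static features: one row per series, as in the example above
static_df = pd.DataFrame(rng.uniform(0.0, 1.0, size=(n_series, n_static_features)),
                         columns=[f"static_{i}" for i in range(n_static_features)])
static_df["unique_id"] = np.arange(n_series)

# Hypothetical temporal panel: 4 daily observations per series
temporal_df = pd.DataFrame({
    "unique_id": np.repeat(np.arange(n_series), 4),
    "ds": np.tile(pd.date_range("2000-01-01", periods=4, freq="D"), n_series),
    "y": rng.random(n_series * 4),
})

# Each time step receives its series' static values; validate guards against duplicate keys
merged = temporal_df.merge(static_df, on="unique_id", how="left", validate="many_to_one")
print(merged.head())
```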

3. Panel AirPassengers Data

An extension of the classic Box & Jenkins airline data: monthly totals of international airline passengers, 1949 to 1960.

It includes two series with static, temporal and future exogenous variables, which can help explore the performance of models such as NBEATSx and TFT.

import matplotlib.pyplot as plt
from neuralforecast.utils import AirPassengersPanel

fig, ax = plt.subplots(1, 1, figsize=(20, 7))
plot_df = AirPassengersPanel.set_index('ds')

# One line per series, drawn on the same axes
plot_df.groupby('unique_id')['y'].plot(ax=ax, legend=True)
ax.set_title('AirPassengers Panel Data', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(title='unique_id', prop={'size': 15})
ax.grid()
fig, ax = plt.subplots(1, 1, figsize = (20, 7))
plot_df = AirPassengersPanel[AirPassengersPanel.unique_id=='Airline1'].set_index('ds')

plot_df[['y', 'trend', 'y_[lag12]']].plot(ax=ax, linewidth=2)
ax.set_title('Box-Cox AirPassengers Data', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()
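The column name suggests that y_[lag12] holds the target lagged by 12 months. Assuming that reading, a lag of this kind can be built per series with a grouped shift, so values never leak across unique_id boundaries. The panel below uses synthetic values, not the AirPassengers data, and in this sketch the first 12 months of each series have no lag and are left as NaN; how the shipped dataset handles those months is not stated here.

```python
import numpy as np
import pandas as pd

# Hypothetical two-series monthly panel with synthetic values
ds = pd.date_range("1949-01-01", periods=24, freq="MS")
panel = pd.DataFrame({
    "unique_id": np.repeat(["Airline1", "Airline2"], 24),
    "ds": np.tile(ds, 2),
    "y": np.arange(48, dtype=float),
})

# Sort by time within each series, then shift 12 steps inside each group
panel = panel.sort_values(["unique_id", "ds"]).reset_index(drop=True)
panel["y_[lag12]"] = panel.groupby("unique_id")["y"].shift(12)
print(panel.iloc[10:14])
```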

4. Time Features

We have developed a utility that generates normalized calendar features for use as absolute positional embeddings in Transformer-based models. These embeddings capture seasonal patterns in time series data and can be easily incorporated into the model architecture. Additionally, the features can be used as exogenous variables in other models to inform them of calendar patterns in the data.

References
- Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang. "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting". AAAI 2021.



augment_calendar_df

 augment_calendar_df (df, freq='H')

Adds normalized calendar features to df according to freq:
- Q: [month]
- M: [month]
- W: [day of month, week of year]
- D: [day of week, day of month, day of year]
- B: [day of week, day of month, day of year]
- H: [hour of day, day of week, day of month, day of year]
- T: [minute of hour*, hour of day, day of week, day of month, day of year]
- S: [second of minute, minute of hour, hour of day, day of week, day of month, day of year]

*minute returns a number from 0-3 corresponding to the 15-minute period it falls into.

Returns the augmented dataframe together with the list of added calendar column names.



time_features_from_frequency_str

 time_features_from_frequency_str (freq_str:str)

Returns a list of time features appropriate for the given frequency string.

Parameters:
freq_str: str, frequency string of the form [multiple][granularity], such as "12H", "5min", "1D", etc.
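The [multiple][granularity] form means the multiple can be stripped off and only the granularity used to choose features. A minimal sketch of that lookup follows, with the feature lists copied from the augment_calendar_df frequency table. The helper name features_for_freq, the snake_case feature names and the small alias table ("min" for "T", "h" for "H", "MS"/"ME" for "M", "QS"/"QE" for "Q") are assumptions for illustration, not the library's API.

```python
import re

# Feature lists per granularity, following the augment_calendar_df table
FEATURES_BY_GRANULARITY = {
    "Q": ["month"],
    "M": ["month"],
    "W": ["day_of_month", "week_of_year"],
    "D": ["day_of_week", "day_of_month", "day_of_year"],
    "B": ["day_of_week", "day_of_month", "day_of_year"],
    "H": ["hour_of_day", "day_of_week", "day_of_month", "day_of_year"],
    "T": ["minute_of_hour", "hour_of_day", "day_of_week", "day_of_month", "day_of_year"],
    "S": ["second_of_minute", "minute_of_hour", "hour_of_day",
          "day_of_week", "day_of_month", "day_of_year"],
}
# Hypothetical aliases mapping other pandas-style suffixes onto the table's keys
ALIASES = {"min": "T", "h": "H", "MS": "M", "ME": "M", "QS": "Q", "QE": "Q"}


def features_for_freq(freq_str: str) -> list:
    """Split '[multiple][granularity]' and look up the feature names (sketch)."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", freq_str.strip())
    if match is None:
        raise ValueError(f"Unrecognized frequency string: {freq_str!r}")
    granularity = ALIASES.get(match.group(2), match.group(2))
    if granularity not in FEATURES_BY_GRANULARITY:
        raise ValueError(f"Unsupported granularity: {granularity!r}")
    return FEATURES_BY_GRANULARITY[granularity]


print(features_for_freq("12H"))
```

Matching the suffix case-sensitively keeps pandas-style codes such as "MS" (month start) distinct from lowercase "ms" (milliseconds), which this sketch simply rejects.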



WeekOfYear

 WeekOfYear ()

Week of year encoded as value between [-0.5, 0.5]



MonthOfYear

 MonthOfYear ()

Month of year encoded as value between [-0.5, 0.5]



DayOfYear

 DayOfYear ()

Day of year encoded as value between [-0.5, 0.5]



DayOfMonth

 DayOfMonth ()

Day of month encoded as value between [-0.5, 0.5]



DayOfWeek

 DayOfWeek ()

Day of week encoded as value between [-0.5, 0.5]



HourOfDay

 HourOfDay ()

Hour of day encoded as value between [-0.5, 0.5]



MinuteOfHour

 MinuteOfHour ()

Minute of hour encoded as value between [-0.5, 0.5]



SecondOfMinute

 SecondOfMinute ()

Second of minute encoded as value between [-0.5, 0.5]



TimeFeature

 TimeFeature ()

Base class for the time features above.
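The feature classes above each map a calendar field into [-0.5, 0.5], but their descriptions do not give the formula. A common convention, used by GluonTS-style time features, is to make the field zero-based and compute value / (number of possible values - 1) - 0.5. The sketch below assumes that convention; the function name encode_calendar and its column names are hypothetical, and the library's exact scaling may differ.

```python
import numpy as np
import pandas as pd


def encode_calendar(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Map calendar fields to [-0.5, 0.5] (assumed GluonTS-style scaling)."""
    second = np.asarray(index.second, dtype=float)            # 0..59
    minute = np.asarray(index.minute, dtype=float)            # 0..59
    hour = np.asarray(index.hour, dtype=float)                # 0..23
    day_of_week = np.asarray(index.dayofweek, dtype=float)    # Monday=0 .. Sunday=6
    day_of_month = np.asarray(index.day, dtype=float)         # 1..31
    day_of_year = np.asarray(index.dayofyear, dtype=float)    # 1..366
    month = np.asarray(index.month, dtype=float)              # 1..12
    week = index.isocalendar().week.to_numpy(dtype=float)     # ISO week 1..53
    return pd.DataFrame({
        "second_of_minute": second / 59.0 - 0.5,
        "minute_of_hour": minute / 59.0 - 0.5,
        "hour_of_day": hour / 23.0 - 0.5,
        "day_of_week": day_of_week / 6.0 - 0.5,
        "day_of_month": (day_of_month - 1) / 30.0 - 0.5,
        "day_of_year": (day_of_year - 1) / 365.0 - 0.5,
        "month_of_year": (month - 1) / 11.0 - 0.5,
        "week_of_year": (week - 1) / 52.0 - 0.5,
    }, index=index)


print(encode_calendar(pd.date_range("2021-01-01", periods=3, freq="MS")))
```

Under this scaling January maps to -0.5 and December to 0.5, which is the kind of curve the normalized month plot in the example below traces for monthly data.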

import matplotlib.pyplot as plt
from neuralforecast.utils import AirPassengersPanel, augment_calendar_df

# Add normalized calendar columns for monthly data
AirPassengerPanelCalendar, calendar_cols = augment_calendar_df(df=AirPassengersPanel, freq='M')
AirPassengerPanelCalendar.head()
plot_df = AirPassengerPanelCalendar[AirPassengerPanelCalendar.unique_id=='Airline1'].set_index('ds')
plt.plot(plot_df['month'])
plt.grid()
plt.xlabel('Datestamp')
plt.ylabel('Normalized Month')
plt.show()


get_indexer_raise_missing

 get_indexer_raise_missing (idx:pandas.core.indexes.base.Index,
                            vals:List[str])
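No description accompanies this helper. Judging only by its name and signature, it presumably returns the positions of vals in idx and raises when any value is absent. A minimal sketch of that assumed behavior uses pandas.Index.get_indexer, which marks values not found with -1. The name get_indexer_raise_missing_sketch, the exception type and the message are assumptions.

```python
from typing import List

import pandas as pd


def get_indexer_raise_missing_sketch(idx: pd.Index, vals: List[str]) -> List[int]:
    """Positions of vals in idx; raise if any value is absent (assumed behavior)."""
    positions = idx.get_indexer(vals)  # -1 marks values that are not in idx
    missing = [v for v, pos in zip(vals, positions) if pos == -1]
    if missing:
        raise ValueError(f"The following values are missing from the index: {missing}")
    return positions.tolist()


columns = pd.Index(["unique_id", "ds", "y", "static_0"])
print(get_indexer_raise_missing_sketch(columns, ["y", "ds"]))
```

Collecting every missing value before raising reports all absent columns at once instead of failing on the first one.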