Exogenous variables
Exogenous variables or external factors are crucial in time series forecasting as they provide additional information that might influence the prediction. These variables could include holiday markers, marketing spending, weather data, or any other external data that correlate with the time series data you are forecasting.
For example, if you’re forecasting ice cream sales, temperature data could serve as a useful exogenous variable. On hotter days, ice cream sales may increase.
To incorporate exogenous variables in TimeGPT, you’ll need to pair each point in your time series data with the corresponding external data.
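As a rough illustration of this pairing, the sketch below joins a toy sales series to temperature readings on the timestamp column. The shop name, dates, and values are made up for the example; only the `unique_id`, `ds`, and `y` column names follow the layout used later in this tutorial.

```python
import pandas as pd

# Hypothetical daily ice cream sales in long format (unique_id, ds, y).
sales = pd.DataFrame({
    "unique_id": ["shop_1"] * 3,
    "ds": pd.to_datetime(["2024-07-01", "2024-07-02", "2024-07-03"]),
    "y": [120, 135, 180],
})

# Hypothetical temperature readings for the same timestamps.
temperature = pd.DataFrame({
    "ds": pd.to_datetime(["2024-07-01", "2024-07-02", "2024-07-03"]),
    "temperature": [24.0, 26.5, 31.0],
})

# Pair every observation with its external value.
# validate="many_to_one" raises if the temperature table has duplicate timestamps.
paired = sales.merge(temperature, on="ds", how="left", validate="many_to_one")

# Every target row should now carry a temperature value.
assert paired["temperature"].notna().all()
print(paired)
```

A left join keeps every row of the target series, so a missing external value shows up as `NaN` instead of silently dropping the observation.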
1. Import packages
First, we import the required packages and initialize the Nixtla client.
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
👍 Use an Azure AI endpoint
To use an Azure AI endpoint, remember to also set the base_url argument:
nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")
2. Load data
Let’s look at an example of predicting day-ahead electricity prices. The following dataset contains the hourly electricity price (y column) for five markets in Europe and the US, identified by the unique_id column. The columns from Exogenous1 to day_6 are exogenous variables that TimeGPT will use to predict the prices.
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()
| | unique_id | ds | y | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | BE | 2016-10-22 00:00:00 | 70.00 | 49593.0 | 57253.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
1 | BE | 2016-10-22 01:00:00 | 37.10 | 46073.0 | 51887.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
2 | BE | 2016-10-22 02:00:00 | 37.10 | 44927.0 | 51896.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
3 | BE | 2016-10-22 03:00:00 | 44.75 | 44483.0 | 48428.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
4 | BE | 2016-10-22 04:00:00 | 37.10 | 44338.0 | 46721.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
3a. Forecasting electricity prices using future exogenous variables
To produce forecasts with future exogenous variables we have to add the
future values of the exogenous variables. Let’s read this dataset. In
this case, we want to predict 24 steps ahead, therefore each unique_id
will have 24 observations.
future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
future_ex_vars_df.head()
| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | BE | 2016-12-31 00:00:00 | 64108.0 | 70318.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
1 | BE | 2016-12-31 01:00:00 | 62492.0 | 67898.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
2 | BE | 2016-12-31 02:00:00 | 61571.0 | 68379.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
3 | BE | 2016-12-31 03:00:00 | 60381.0 | 64972.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
4 | BE | 2016-12-31 04:00:00 | 60298.0 | 62900.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
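Before passing a future exogenous dataframe to the forecast call, it can help to confirm that it has exactly `h` rows per series and starts after the history ends. The following is a minimal sketch on toy data; `check_future_exog` is a hypothetical helper written for this example, not part of the nixtla package, and the DE values are invented.

```python
import pandas as pd

h = 2  # toy horizon for this example (the tutorial uses h=24)

# Toy stand-ins for the historical and future exogenous dataframes.
hist = pd.DataFrame({
    "unique_id": ["BE"] * 3 + ["DE"] * 3,
    "ds": pd.to_datetime(
        ["2016-12-30 21:00", "2016-12-30 22:00", "2016-12-30 23:00"] * 2
    ),
    "y": [40.0, 38.5, 37.1, 30.2, 29.9, 28.4],
    "Exogenous1": [60000.0, 61000.0, 62000.0, 50000.0, 51000.0, 52000.0],
})
future = pd.DataFrame({
    "unique_id": ["BE"] * h + ["DE"] * h,
    "ds": pd.to_datetime(["2016-12-31 00:00", "2016-12-31 01:00"] * 2),
    "Exogenous1": [64108.0, 62492.0, 53000.0, 52500.0],
})


def check_future_exog(hist, future, h):
    """Return a list of problems; an empty list means the frame looks consistent."""
    problems = []

    # Each series needs exactly h future rows.
    counts = future.groupby("unique_id").size()
    bad = counts[counts != h]
    if not bad.empty:
        problems.append(f"series without exactly {h} rows: {sorted(bad.index)}")

    # Every historical series needs future values.
    missing = set(hist["unique_id"]) - set(future["unique_id"])
    if missing:
        problems.append(f"series missing from future frame: {sorted(missing)}")

    # Future timestamps must come strictly after the last historical one.
    last_hist = hist.groupby("unique_id")["ds"].max()
    first_future = future.groupby("unique_id")["ds"].min()
    overlap = first_future[first_future <= last_hist.reindex(first_future.index)]
    if not overlap.empty:
        problems.append(f"future timestamps overlap history: {sorted(overlap.index)}")

    return problems


problems = check_future_exog(hist, future, h)
print(problems or "future exogenous frame looks consistent")
```

With the tutorial's data, the same helper could be called as `check_future_exog(df, future_ex_vars_df, 24)` once the `ds` columns have been parsed with `pd.to_datetime`.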
Let’s call the forecast method, adding this information:
timegpt_fcst_ex_vars_df = nixtla_client.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using future exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
| | unique_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
---|---|---|---|---|---|---|---|
0 | BE | 2016-12-31 00:00:00 | 74.540771 | 84.506861 | 89.003936 | 64.574681 | 60.077606 |
1 | BE | 2016-12-31 01:00:00 | 43.344290 | 52.200879 | 57.771782 | 34.487701 | 28.916798 |
2 | BE | 2016-12-31 02:00:00 | 44.429219 | 51.034622 | 57.623160 | 37.823817 | 31.235279 |
3 | BE | 2016-12-31 03:00:00 | 38.094396 | 48.108948 | 51.528001 | 28.079844 | 24.660791 |
4 | BE | 2016-12-31 04:00:00 | 37.389140 | 46.747685 | 52.186070 | 28.030595 | 22.592211 |
📘 Available models in Azure AI
If you are using an Azure AI endpoint, please be sure to set model="azureai":
nixtla_client.forecast(..., model="azureai")
For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon. By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_ex_vars_df,
    max_insample_length=365,
    level=[80, 90],
)
We can also show the importance of the features.
nixtla_client.weights_x.plot.barh(x='features', y='weights')
This plot shows that Exogenous1 and Exogenous2 are the most important features for this forecasting task, as they have the largest weights.
3b. Forecasting electricity prices using historic exogenous variables
In the example above, we loaded the future values of the exogenous variables. Often, these are not available because the future values are unknown. We can also make forecasts using only historic exogenous variables, simply by omitting the X_df parameter. In that case, the exogenous variables that are present in df will be treated as historic exogenous variables.
Important
If you include historic exogenous variables in your model, you are implicitly making assumptions about the future of these exogenous variables in your forecast. It is recommended to make these assumptions explicit by making use of future exogenous variables.
Let’s call the forecast method, removing X_df:
timegpt_fcst_hist_ex_vars_df = nixtla_client.forecast(df=df, h=24, level=[80, 90])
timegpt_fcst_hist_ex_vars_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using historical exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
| | unique_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
---|---|---|---|---|---|---|---|
0 | BE | 2016-12-31 00:00:00 | 45.769382 | 55.735472 | 60.232546 | 35.803292 | 31.306217 |
1 | BE | 2016-12-31 01:00:00 | 47.991004 | 56.847593 | 62.418496 | 39.134415 | 33.563512 |
2 | BE | 2016-12-31 02:00:00 | 49.496135 | 56.101537 | 62.690075 | 42.890732 | 36.302195 |
3 | BE | 2016-12-31 03:00:00 | 49.510808 | 59.525360 | 62.944413 | 39.496257 | 36.077203 |
4 | BE | 2016-12-31 04:00:00 | 48.510558 | 57.869103 | 63.307488 | 39.152014 | 33.713629 |
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_hist_ex_vars_df,
    max_insample_length=365,
    level=[80, 90],
)
3c. Forecasting electricity prices using future and historic exogenous variables
A third option is to use both historic and future exogenous variables. For example, the future values of Exogenous1 and Exogenous2 might not be available. In this example, we drop these variables from our future exogenous dataframe (because we assume we do not know their future values).
future_ex_vars_df_limited = future_ex_vars_df.drop(columns = ["Exogenous1", "Exogenous2"])
timegpt_fcst_ex_vars_df_limited = nixtla_client.forecast(df=df, X_df=future_ex_vars_df_limited, h=24, level=[80, 90])
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using historical exogenous features: ['Exogenous2', 'Exogenous1']
INFO:nixtla.nixtla_client:Using future exogenous features: ['day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_ex_vars_df_limited,
    max_insample_length=365,
    level=[80, 90],
)
Note that TimeGPT informs you which variables are used as historic exogenous and which are used as future exogenous.
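The split reported in the log can be anticipated by comparing column names. The following is a minimal sketch, assuming the rule is simply that any non-key, non-target column of df that is absent from X_df is treated as historic. `split_exogenous` is a hypothetical helper written for this example; it is not part of the nixtla package and does not claim to mirror its internals.

```python
import pandas as pd


def split_exogenous(df, X_df, id_col="unique_id", time_col="ds", target_col="y"):
    """Guess which exogenous columns are future (present in X_df) and which are historic only."""
    future = [c for c in X_df.columns if c not in (id_col, time_col)]
    historic = [
        c for c in df.columns
        if c not in (id_col, time_col, target_col) and c not in future
    ]
    return historic, future


# Empty frames are enough here, since only the column names matter.
day_cols = [f"day_{i}" for i in range(7)]
toy_df = pd.DataFrame(columns=["unique_id", "ds", "y", "Exogenous1", "Exogenous2"] + day_cols)
toy_X_df = pd.DataFrame(columns=["unique_id", "ds"] + day_cols)

historic, future = split_exogenous(toy_df, toy_X_df)
print("historic:", historic)
print("future:", future)
```

Running a check like this before calling the API makes it easier to spot a column that was dropped from the future dataframe by accident.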
3d. Forecasting future exogenous variables
A fourth option, in case the future exogenous variables are not available, is to forecast them. Below, we’ll show how to forecast Exogenous1 and Exogenous2 separately, so that you can generate the future exogenous variables yourself.
# We read the data and create separate dataframes for the historic exogenous that we want to forecast separately.
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df_exog1 = df[['unique_id', 'ds', 'Exogenous1']]
df_exog2 = df[['unique_id', 'ds', 'Exogenous2']]
Next, we can use TimeGPT to forecast Exogenous1 and Exogenous2. In this case, we assume these quantities can be forecast separately.
timegpt_fcst_ex1 = nixtla_client.forecast(df=df_exog1, h=24, target_col='Exogenous1')
timegpt_fcst_ex2 = nixtla_client.forecast(df=df_exog2, h=24, target_col='Exogenous2')
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
We can now start creating X_df, which contains the future exogenous variables.
timegpt_fcst_ex1 = timegpt_fcst_ex1.rename(columns={'TimeGPT':'Exogenous1'})
timegpt_fcst_ex2 = timegpt_fcst_ex2.rename(columns={'TimeGPT':'Exogenous2'})
X_df = timegpt_fcst_ex1.merge(timegpt_fcst_ex2)
Next, we also need to add the day_0 to day_6 future exogenous variables. These are easy: each one is just a weekday indicator, which we can extract from the ds column.
# We have 7 days, for each day a separate column denoting 1/0
for i in range(7):
    X_df[f'day_{i}'] = 1 * (pd.to_datetime(X_df['ds']).dt.weekday == i)
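To see what this encoding produces, the sketch below applies the same weekday logic to a standalone range of hourly timestamps (a toy frame named `X`, not the tutorial's X_df). pandas numbers weekdays from Monday = 0, so Saturday 2016-12-31 should set day_5 and Sunday 2017-01-01 should set day_6.

```python
import pandas as pd

# 48 hourly timestamps starting at the first forecast hour of the tutorial.
ds = pd.Series(
    pd.date_range("2016-12-31 00:00", periods=48, freq=pd.Timedelta(hours=1))
)
X = pd.DataFrame({"ds": ds})

# One indicator column per weekday, following pandas' Monday=0 convention.
for i in range(7):
    X[f"day_{i}"] = (X["ds"].dt.weekday == i).astype(int)

day_cols = [f"day_{i}" for i in range(7)]

# Exactly one indicator is active in every row.
assert (X[day_cols].sum(axis=1) == 1).all()

print(X.iloc[[0, 24]])  # first hour of Saturday and first hour of Sunday
```

The row-sum assertion is a cheap guard against off-by-one mistakes in the weekday numbering.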
We have now created X_df. Let’s inspect it:
X_df.head(10)
| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | BE | 2016-12-31 00:00:00 | 66282.507812 | 70861.390625 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | BE | 2016-12-31 01:00:00 | 64465.335938 | 67851.718750 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2 | BE | 2016-12-31 02:00:00 | 63257.125000 | 67246.546875 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
3 | BE | 2016-12-31 03:00:00 | 62059.343750 | 64027.210938 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
4 | BE | 2016-12-31 04:00:00 | 61247.132812 | 61523.867188 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
5 | BE | 2016-12-31 05:00:00 | 62052.453125 | 63053.929688 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
6 | BE | 2016-12-31 06:00:00 | 63457.507812 | 65199.175781 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
7 | BE | 2016-12-31 07:00:00 | 65388.433594 | 68285.367188 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
8 | BE | 2016-12-31 08:00:00 | 67406.664062 | 72037.671875 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
9 | BE | 2016-12-31 09:00:00 | 68057.156250 | 72820.468750 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Let’s compare it to our pre-loaded version:
future_ex_vars_df.head(10)
| | unique_id | ds | Exogenous1 | Exogenous2 | day_0 | day_1 | day_2 | day_3 | day_4 | day_5 | day_6 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | BE | 2016-12-31 00:00:00 | 64108.0 | 70318.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
1 | BE | 2016-12-31 01:00:00 | 62492.0 | 67898.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
2 | BE | 2016-12-31 02:00:00 | 61571.0 | 68379.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
3 | BE | 2016-12-31 03:00:00 | 60381.0 | 64972.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
4 | BE | 2016-12-31 04:00:00 | 60298.0 | 62900.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
5 | BE | 2016-12-31 05:00:00 | 60339.0 | 62364.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
6 | BE | 2016-12-31 06:00:00 | 62576.0 | 64242.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
7 | BE | 2016-12-31 07:00:00 | 63732.0 | 65884.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
8 | BE | 2016-12-31 08:00:00 | 66235.0 | 68217.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
9 | BE | 2016-12-31 09:00:00 | 66801.0 | 69921.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
As you can see, the values for Exogenous1 and Exogenous2 are slightly different, which makes sense because we’ve made a forecast of these values with TimeGPT.
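To put a number on "slightly different", one option is the percentage difference between the forecasted and the provided values. The sketch below does this for the first five Exogenous1 rows of the BE series, with the values copied by hand from the two tables above; the column names `provided`, `forecasted`, and `pct_diff` are chosen for this example only.

```python
import pandas as pd

# First five Exogenous1 values for BE, copied from the two tables above.
comparison = pd.DataFrame({
    "ds": pd.to_datetime([
        "2016-12-31 00:00", "2016-12-31 01:00", "2016-12-31 02:00",
        "2016-12-31 03:00", "2016-12-31 04:00",
    ]),
    "provided": [64108.0, 62492.0, 61571.0, 60381.0, 60298.0],
    "forecasted": [66282.507812, 64465.335938, 63257.125000, 62059.343750, 61247.132812],
})

# Signed percentage difference relative to the provided value.
comparison["pct_diff"] = (
    100 * (comparison["forecasted"] - comparison["provided"]) / comparison["provided"]
)

# A single summary number: mean absolute percentage difference.
mean_abs_pct_diff = comparison["pct_diff"].abs().mean()

print(comparison.round(2))
print(f"mean absolute difference: {mean_abs_pct_diff:.2f}%")
```

For these five rows the forecasted values sit roughly 1.5% to 3.5% above the provided ones. The same calculation could be applied to the full dataframes by merging X_df and future_ex_vars_df on unique_id and ds.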
Let’s create a new forecast of our electricity prices with TimeGPT using our new X_df:
timegpt_fcst_ex_vars_df_new = nixtla_client.forecast(df=df, X_df=X_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df_new.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Using the following exogenous variables: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
| | unique_id | ds | TimeGPT | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 |
---|---|---|---|---|---|---|---|
0 | BE | 2016-12-31 00:00:00 | 46.578371 | 40.398307 | 41.808656 | 51.348086 | 52.758435 |
1 | BE | 2016-12-31 01:00:00 | 37.258364 | 28.092805 | 30.929055 | 43.587673 | 46.423923 |
2 | BE | 2016-12-31 02:00:00 | 41.779458 | 29.432284 | 35.379695 | 48.179221 | 54.126632 |
3 | BE | 2016-12-31 03:00:00 | 37.822341 | 25.122863 | 31.484450 | 44.160232 | 50.521820 |
4 | BE | 2016-12-31 04:00:00 | 37.389141 | 23.840454 | 28.535553 | 46.242729 | 50.937828 |
Let’s create a combined dataframe with the two forecasts and plot the values to compare the forecasts.
timegpt_fcst_ex_vars_df = timegpt_fcst_ex_vars_df.rename(columns={'TimeGPT':'TimeGPT-provided_exogenous'})
timegpt_fcst_ex_vars_df_new = timegpt_fcst_ex_vars_df_new.rename(columns={'TimeGPT':'TimeGPT-forecasted_exogenous'})
forecasts = timegpt_fcst_ex_vars_df[['unique_id', 'ds', 'TimeGPT-provided_exogenous']].merge(timegpt_fcst_ex_vars_df_new[['unique_id', 'ds', 'TimeGPT-forecasted_exogenous']])
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']],
    forecasts,
    max_insample_length=365,
)
As you can see, we obtain a slightly different forecast if we use our forecasted exogenous variables.