
# Exogenous features

> Use exogenous regressors for training and predicting

```python theme={null}
import lightgbm as lgb
import pandas as pd
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.utils import generate_daily_series, generate_prices_for_series
```

## Data setup

```python theme={null}
series = generate_daily_series(
    100, equal_ends=True, n_static_features=2
).rename(columns={'static_1': 'product_id'})
series.head()
```

|   | unique\_id | ds         | y          | static\_0 | product\_id |
| - | ---------- | ---------- | ---------- | --------- | ----------- |
| 0 | id\_00     | 2000-10-05 | 39.811983  | 79        | 45          |
| 1 | id\_00     | 2000-10-06 | 103.274013 | 79        | 45          |
| 2 | id\_00     | 2000-10-07 | 176.574744 | 79        | 45          |
| 3 | id\_00     | 2000-10-08 | 258.987900 | 79        | 45          |
| 4 | id\_00     | 2000-10-09 | 344.940404 | 79        | 45          |

## Use existing exogenous features

In mlforecast the required columns are the series identifier, the
timestamp and the target. By default, any extra columns, like `static_0`
and `product_id` here, are treated as static and are replicated when
constructing the features for the next timestamp. To change this, pass
`static_features` to `MLForecast.preprocess` or `MLForecast.fit`; only
the columns listed there are then treated as static. Keep in mind that
all features in your input dataframe are used for training, so you’ll
have to provide the future values of the dynamic exogenous features to
`MLForecast.predict` through the `X_df` argument.
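Before choosing what to pass as `static_features`, it can help to check which extra columns really are constant within each series. The following is a minimal pandas sketch on toy data; `find_static_columns` is a hypothetical helper written for this illustration and is not part of mlforecast.

```python
import pandas as pd


def find_static_columns(df, id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical helper: list extra columns that are constant within every series."""
    extra = [c for c in df.columns if c not in (id_col, time_col, target_col)]
    if not extra:
        return []
    # Number of distinct values of each extra column, per series.
    distinct = df.groupby(id_col)[extra].nunique(dropna=False)
    # A column is static only if no series holds more than one distinct value.
    return [c for c in extra if (distinct[c] <= 1).all()]


# Toy data (an assumption for this example): two series, two timestamps each.
toy = pd.DataFrame(
    {
        "unique_id": ["a", "a", "b", "b"],
        "ds": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-01", "2000-01-02"]),
        "y": [1.0, 2.0, 3.0, 4.0],
        "product_id": [10, 10, 20, 20],
        "price": [0.5, 0.7, 0.4, 0.4],
    }
)
static_cols = find_static_columns(toy)
print(static_cols)
```

On this toy frame only `product_id` qualifies, because `price` changes within series `a`. Columns that fail the check are candidates for dynamic features whose future values must be supplied through `X_df`.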

Consider the following example. Suppose that we have a prices catalog
for each id and date.

```python theme={null}
prices_catalog = generate_prices_for_series(series)
prices_catalog.head()
```

|   | ds         | unique\_id | price    |
| - | ---------- | ---------- | -------- |
| 0 | 2000-10-05 | id\_00     | 0.548814 |
| 1 | 2000-10-06 | id\_00     | 0.715189 |
| 2 | 2000-10-07 | id\_00     | 0.602763 |
| 3 | 2000-10-08 | id\_00     | 0.544883 |
| 4 | 2000-10-09 | id\_00     | 0.423655 |

Suppose also that you have already merged these prices into your series
dataframe.

```python theme={null}
series_with_prices = series.merge(prices_catalog, how='left')
series_with_prices.head()
```

|   | unique\_id | ds         | y          | static\_0 | product\_id | price    |
| - | ---------- | ---------- | ---------- | --------- | ----------- | -------- |
| 0 | id\_00     | 2000-10-05 | 39.811983  | 79        | 45          | 0.548814 |
| 1 | id\_00     | 2000-10-06 | 103.274013 | 79        | 45          | 0.715189 |
| 2 | id\_00     | 2000-10-07 | 176.574744 | 79        | 45          | 0.602763 |
| 3 | id\_00     | 2000-10-08 | 258.987900 | 79        | 45          | 0.544883 |
| 4 | id\_00     | 2000-10-09 | 344.940404 | 79        | 45          | 0.423655 |
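A left merge keeps every row of the series and leaves `NaN` wherever the catalog has no matching id and date, so it may be worth verifying the result before training. The sketch below uses toy frames (assumed for this example, not the generated data above) and pandas' `validate` argument to also guard against duplicated keys.

```python
import pandas as pd

# Toy series and price catalog (assumptions for this illustration).
series_toy = pd.DataFrame(
    {
        "unique_id": ["a", "a", "b"],
        "ds": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-01"]),
        "y": [1.0, 2.0, 3.0],
    }
)
prices_toy = pd.DataFrame(
    {
        "unique_id": ["a", "a"],
        "ds": pd.to_datetime(["2000-01-01", "2000-01-02"]),
        "price": [0.5, 0.6],
    }
)

# validate="one_to_one" raises if either side repeats an (id, date) pair.
merged = series_toy.merge(
    prices_toy, how="left", on=["unique_id", "ds"], validate="one_to_one"
)
# Rows whose price could not be found in the catalog.
n_missing = int(merged["price"].isna().sum())
print(n_missing)
```

Here series `b` has no price in the toy catalog, so one row ends up without a price. The same kind of check can be run on `series_with_prices` before fitting.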

The `series_with_prices` dataframe will be passed to `MLForecast.fit`
(or `MLForecast.preprocess`). However, since the price is dynamic, we
have to tell that method that only `static_0` and `product_id` are static.

```python theme={null}
fcst = MLForecast(
    models=lgb.LGBMRegressor(n_jobs=1, random_state=0, verbosity=-1),
    freq='D',
    lags=[7],
    lag_transforms={
        1: [ExpandingMean()],
        7: [RollingMean(window_size=14)],
    },
    date_features=['dayofweek', 'month'],
    num_threads=2,
)
fcst.fit(series_with_prices, static_features=['static_0', 'product_id'])
```

```text theme={null}
MLForecast(models=[LGBMRegressor], freq=D, lag_features=['lag7', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size14'], date_features=['dayofweek', 'month'], num_threads=2)
```

The features used for training are stored in
`MLForecast.ts.features_order_`. As you can see, `price` was used for
training.

```python theme={null}
fcst.ts.features_order_
```

```text theme={null}
['static_0',
 'product_id',
 'price',
 'lag7',
 'expanding_mean_lag1',
 'rolling_mean_lag7_window_size14',
 'dayofweek',
 'month']
```

To update the price at each timestep, we call `MLForecast.predict` with
our forecast horizon and pass the prices catalog through `X_df`.

```python theme={null}
preds = fcst.predict(h=7, X_df=prices_catalog)
preds.head()
```

|   | unique\_id | ds         | LGBMRegressor |
| - | ---------- | ---------- | ------------- |
| 0 | id\_00     | 2001-05-15 | 418.930093    |
| 1 | id\_00     | 2001-05-16 | 499.487368    |
| 2 | id\_00     | 2001-05-17 | 20.321885     |
| 3 | id\_00     | 2001-05-18 | 102.310778    |
| 4 | id\_00     | 2001-05-19 | 185.340281    |
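For this to work, `X_df` needs a row for every series and every timestamp in the forecast horizon. The following pandas-only sketch shows one way to check that coverage up front. `missing_future_rows` and its arguments are hypothetical names introduced for this illustration, and the toy data is an assumption.

```python
import pandas as pd


def missing_future_rows(last_dates, X_df, h, freq="D", id_col="unique_id", time_col="ds"):
    """Hypothetical helper: (id, timestamp) pairs in the horizon that X_df lacks."""
    frames = []
    for uid, last in last_dates.items():
        # The h timestamps that follow the last training date of this series.
        future = pd.date_range(last, periods=h + 1, freq=freq)[1:]
        frames.append(pd.DataFrame({id_col: uid, time_col: future}))
    expected = pd.concat(frames, ignore_index=True)
    available = X_df[[id_col, time_col]].drop_duplicates()
    # indicator=True marks expected rows without a match as "left_only".
    merged = expected.merge(available, on=[id_col, time_col], how="left", indicator=True)
    mask = merged["_merge"] == "left_only"
    return merged.loc[mask, [id_col, time_col]].reset_index(drop=True)


# Toy inputs: both series end on 2000-01-02, and we want a 2-step horizon.
last_dates = pd.Series({"a": pd.Timestamp("2000-01-02"), "b": pd.Timestamp("2000-01-02")})
toy_X = pd.DataFrame(
    {
        "unique_id": ["a", "a", "b"],
        "ds": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-03"]),
        "price": [0.5, 0.6, 0.4],
    }
)
missing = missing_future_rows(last_dates, toy_X, h=2)
print(missing)
```

In the toy data, series `b` has no price for its second future step, so that single id and date pair is reported. With real data, `last_dates` could be derived with something like `series.groupby('unique_id')['ds'].max()`.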

## Generating exogenous features

Nixtla provides some utilities to generate exogenous features for both
training and forecasting, such as [statsforecast’s
mstl\_decomposition](https://nixtlaverse.nixtla.io/statsforecast/docs/how-to-guides/generating_features.html)
or the [transform\_exog function](transforming_exog.html). We also have
[utilsforecast’s fourier
function](https://nixtlaverse.nixtla.io/utilsforecast/feature_engineering.html#fourier),
which we’ll demonstrate here.

```python theme={null}
from sklearn.linear_model import LinearRegression
from utilsforecast.feature_engineering import fourier
```

Suppose you start with some data like the one above where we have a
couple of static features.

```python theme={null}
series.head()
```

|   | unique\_id | ds         | y          | static\_0 | product\_id |
| - | ---------- | ---------- | ---------- | --------- | ----------- |
| 0 | id\_00     | 2000-10-05 | 39.811983  | 79        | 45          |
| 1 | id\_00     | 2000-10-06 | 103.274013 | 79        | 45          |
| 2 | id\_00     | 2000-10-07 | 176.574744 | 79        | 45          |
| 3 | id\_00     | 2000-10-08 | 258.987900 | 79        | 45          |
| 4 | id\_00     | 2000-10-09 | 344.940404 | 79        | 45          |

Now we’d like to add some Fourier terms to model the seasonality. We can
do that with the following:

```python theme={null}
transformed_df, future_df = fourier(series, freq='D', season_length=7, k=2, h=7)
```

This provides an extended training dataset.

```python theme={null}
transformed_df.head()
```

|   | unique\_id | ds         | y          | static\_0 | product\_id | sin1\_7   | sin2\_7   | cos1\_7   | cos2\_7   |
| - | ---------- | ---------- | ---------- | --------- | ----------- | --------- | --------- | --------- | --------- |
| 0 | id\_00     | 2000-10-05 | 39.811983  | 79        | 45          | 0.781832  | 0.974928  | 0.623490  | -0.222521 |
| 1 | id\_00     | 2000-10-06 | 103.274013 | 79        | 45          | 0.974928  | -0.433884 | -0.222521 | -0.900969 |
| 2 | id\_00     | 2000-10-07 | 176.574744 | 79        | 45          | 0.433884  | -0.781831 | -0.900969 | 0.623490  |
| 3 | id\_00     | 2000-10-08 | 258.987900 | 79        | 45          | -0.433884 | 0.781832  | -0.900969 | 0.623490  |
| 4 | id\_00     | 2000-10-09 | 344.940404 | 79        | 45          | -0.974928 | 0.433884  | -0.222521 | -0.900969 |

`fourier` also returns the future values of the features.

```python theme={null}
future_df.head()
```

|   | unique\_id | ds         | sin1\_7   | sin2\_7   | cos1\_7   | cos2\_7   |
| - | ---------- | ---------- | --------- | --------- | --------- | --------- |
| 0 | id\_00     | 2001-05-15 | -0.781828 | -0.974930 | 0.623494  | -0.222511 |
| 1 | id\_00     | 2001-05-16 | 0.000006  | 0.000011  | 1.000000  | 1.000000  |
| 2 | id\_00     | 2001-05-17 | 0.781835  | 0.974925  | 0.623485  | -0.222533 |
| 3 | id\_00     | 2001-05-18 | 0.974927  | -0.433895 | -0.222527 | -0.900963 |
| 4 | id\_00     | 2001-05-19 | 0.433878  | -0.781823 | -0.900972 | 0.623500  |
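The first rows of `transformed_df` are consistent with the usual definition of Fourier terms, `sin(2πkt / season_length)` and `cos(2πkt / season_length)`, with the time index `t` starting at 1 for each series. The NumPy sketch below reproduces those values under that assumption. It is an illustration only, not utilsforecast's implementation, and `fourier_terms` is a hypothetical name.

```python
import numpy as np


def fourier_terms(n, season_length, k, start=1):
    """Hypothetical helper: sine and cosine terms for t = start, ..., start + n - 1."""
    t = np.arange(start, start + n)
    terms = {}
    for i in range(1, k + 1):
        terms[f"sin{i}_{season_length}"] = np.sin(2 * np.pi * i * t / season_length)
    for i in range(1, k + 1):
        terms[f"cos{i}_{season_length}"] = np.cos(2 * np.pi * i * t / season_length)
    return terms


# Same settings as the call above: weekly seasonality with two harmonics.
terms = fourier_terms(n=5, season_length=7, k=2)
for name, values in terms.items():
    print(name, np.round(values, 6))
```

Each harmonic `k` adds one sine and one cosine column, so `k=2` yields the four columns `sin1_7`, `sin2_7`, `cos1_7` and `cos2_7`.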

We can now train using only these features (and the static ones).

```python theme={null}
fcst2 = MLForecast(models=LinearRegression(), freq='D')
fcst2.fit(transformed_df, static_features=['static_0', 'product_id'])
```

```text theme={null}
MLForecast(models=[LinearRegression], freq=D, lag_features=[], date_features=[], num_threads=1)
```

We then provide the future values to the `predict` method.

```python theme={null}
fcst2.predict(h=7, X_df=future_df).head()
```

|   | unique\_id | ds         | LinearRegression |
| - | ---------- | ---------- | ---------------- |
| 0 | id\_00     | 2001-05-15 | 275.822342       |
| 1 | id\_00     | 2001-05-16 | 262.258117       |
| 2 | id\_00     | 2001-05-17 | 238.195850       |
| 3 | id\_00     | 2001-05-18 | 240.997814       |
| 4 | id\_00     | 2001-05-19 | 262.247123       |

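```python theme={null}
# Sanity check on the first model (`fcst`): repeated calls to predict return
# the same forecasts, and so does passing the training data again via `new_df`.
preds2 = fcst.predict(7, X_df=prices_catalog)
preds3 = fcst.predict(7, new_df=series_with_prices, X_df=prices_catalog)

pd.testing.assert_frame_equal(preds, preds2)
pd.testing.assert_frame_equal(preds, preds3)
```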
