> DeepAR: Probabilistic autoregressive RNN for forecasting. Uses Monte Carlo sampling with distribution outputs for uncertainty quantification in time series.

# DeepAR

The DeepAR model produces probabilistic forecasts based on an
autoregressive recurrent neural network optimized on panel data using
cross-learning. DeepAR obtains its forecast distribution using a Markov
Chain Monte Carlo sampler with the following conditional probability:
$\mathbb{P}(\mathbf{y}_{[t+1:t+H]}|\;\mathbf{y}_{[:t]},\; \mathbf{x}^{(f)}_{[:t+H]},\; \mathbf{x}^{(s)})$

where $\mathbf{x}^{(s)}$ are static exogenous inputs and
$\mathbf{x}^{(f)}_{[:t+H]}$ are future exogenous inputs available at
prediction time. The predictions are obtained by transforming the
hidden states $\mathbf{h}_{t}$ into predictive distribution parameters
$\theta_{t}$, and then generating samples $\mathbf{\hat{y}}_{[t+1:t+H]}$
by sampling Monte Carlo trajectories.

$$
\begin{align}
\mathbf{h}_{t} &= \textrm{RNN}([\mathbf{y}_{t},\mathbf{x}^{(f)}_{t+1},\mathbf{x}^{(s)}], \mathbf{h}_{t-1})\\
\mathbf{\theta}_{t}&=\textrm{Linear}(\mathbf{h}_{t}) \\
\hat{y}_{t+1}&=\textrm{sample}(\;\mathrm{P}(y_{t+1}\;|\;\mathbf{\theta}_{t})\;)
\end{align}
$$
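
The recursion above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the library's implementation: a scalar `tanh` cell stands in for the LSTM, the output distribution is assumed Gaussian, and the names `rnn_cell`, `linear_head`, `sample_trajectories` and the weight dictionaries `w` and `v` are hypothetical.

```python
import math
import random


def rnn_cell(y_t, x_futr_next, x_stat, h_prev, w):
    """Toy scalar recurrent cell (assumption: stands in for the LSTM)."""
    return math.tanh(
        w["y"] * y_t + w["f"] * x_futr_next + w["s"] * x_stat + w["h"] * h_prev
    )


def linear_head(h_t, v):
    """Map the hidden state to assumed Gaussian parameters theta_t = (mu, sigma)."""
    mu = v["mu"] * h_t + v["mu_bias"]
    sigma = math.log1p(math.exp(v["sigma"] * h_t))  # softplus keeps sigma > 0
    return mu, sigma


def sample_trajectories(y_hist, x_futr, x_stat, horizon, n_samples, w, v, seed=0):
    """Draw n_samples Monte Carlo trajectories of length horizon.

    x_futr must cover the history and the horizon: len(y_hist) + horizon values.
    """
    rng = random.Random(seed)
    n_hist = len(y_hist)

    # Warm up the hidden state on the observed history.
    h_last = 0.0
    for t in range(n_hist):
        h_last = rnn_cell(y_hist[t], x_futr[t + 1], x_stat, h_last, w)

    trajectories = []
    for _ in range(n_samples):
        h, path = h_last, []
        for k in range(horizon):
            mu, sigma = linear_head(h, v)      # theta_t = Linear(h_t)
            y_next = rng.gauss(mu, sigma)      # y_{t+1} ~ P(y | theta_t)
            path.append(y_next)
            if k < horizon - 1:
                # Feed the sample back in as the next autoregressive input.
                h = rnn_cell(y_next, x_futr[n_hist + k + 1], x_stat, h, w)
        trajectories.append(path)
    return trajectories


# Hypothetical weights and data, chosen only to make the sketch runnable.
w = {"y": 0.5, "f": 0.1, "s": 0.2, "h": 0.3}
v = {"mu": 1.0, "mu_bias": 0.0, "sigma": 1.0}
y_hist = [0.1, 0.3, 0.2, 0.4]
x_futr = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # 4 historic + 3 future steps
trajectories = sample_trajectories(
    y_hist, x_futr, x_stat=1.0, horizon=3, n_samples=100, w=w, v=v
)
```

All trajectories share the hidden state computed from the history and diverge only through the sampled values that are fed back into the cell, which is why the spread of the samples typically widens with the forecast step.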

**References**

* [David Salinas, Valentin Flunkert, Jan Gasthaus,
  Tim Januschowski (2020). “DeepAR: Probabilistic forecasting with
  autoregressive recurrent networks”. International Journal of
  Forecasting.](https://www.sciencedirect.com/science/article/pii/S0169207019301888)
* [Alexander Alexandrov et al. (2020). “GluonTS: Probabilistic and Neural
  Time Series Modeling in Python”. Journal of Machine Learning
  Research.](https://www.jmlr.org/papers/v21/19-820.html)

> **Exogenous Variables, Losses, and Parameters Availability**
>
> Given the sampling procedure during inference, DeepAR only supports
> [`DistributionLoss`](./losses.pytorch.html#distributionloss)
> as training loss.
>
> Note that DeepAR generates a non-parametric forecast distribution
> using Monte Carlo. We use this sampling procedure also during
> validation to make it closer to the inference procedure. Therefore,
> only the
> [`MQLoss`](./losses.pytorch.html#mqloss)
> is available for validation.
>
> Additionally, the Monte Carlo sampling procedure implies that historic
> exogenous variables are not available to the model.
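
To illustrate why a quantile-based loss fits a sampled forecast, the sketch below turns a set of Monte Carlo trajectories into a median and prediction intervals. It is a minimal illustration, not the library's code: the helper names `empirical_quantile`, `level_to_quantiles` and `summarize` are hypothetical, the trajectories are synthetic Gaussian draws, and the linear interpolation rule is one of several common quantile definitions.

```python
import random


def empirical_quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def level_to_quantiles(level):
    """Convert a central interval level (e.g. 90) into its two quantiles."""
    alpha = (100 - level) / 200
    return alpha, 1 - alpha


def summarize(trajectories, levels):
    """Per forecast step, compute the median and one interval per level."""
    horizon = len(trajectories[0])
    summary = []
    for step in range(horizon):
        values = sorted(path[step] for path in trajectories)
        row = {"median": empirical_quantile(values, 0.5)}
        for level in levels:
            q_lo, q_hi = level_to_quantiles(level)
            row[f"lo-{level}"] = empirical_quantile(values, q_lo)
            row[f"hi-{level}"] = empirical_quantile(values, q_hi)
        summary.append(row)
    return summary


# Synthetic trajectories: 100 samples, 3 steps, spread growing with the step.
rng = random.Random(1)
trajectories = [[rng.gauss(0.0, 1.0 + step) for step in range(3)] for _ in range(100)]
summary = summarize(trajectories, levels=[80, 90])
```

Because the summary consists of quantiles rather than distribution parameters, it can be scored directly by a multi-quantile loss, which matches the validation restriction described in the note.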

<img src="https://mintcdn.com/nixtla/ldwvWbCUC65OBWwN/neuralforecast/imgs_models/deepar.jpeg?fit=max&auto=format&n=ldwvWbCUC65OBWwN&q=85&s=983c54524911982f115ca4a72009414c" alt="Figure 1. DeepAR model, during training the optimization signal comes from likelihood of observations, during inference a recurrent multi-step strategy is used to generate predictive distributions." width="1600" height="909" data-path="neuralforecast/imgs_models/deepar.jpeg" />

*Figure 1. DeepAR model. During training,
the optimization signal comes from the likelihood of observations; during
inference, a recurrent multi-step strategy is used to generate predictive
distributions.*

## 1. DeepAR

### `DeepAR`

```python theme={null}
DeepAR(
    h,
    input_size=-1,
    h_train=1,
    lstm_n_layers=2,
    lstm_hidden_size=128,
    lstm_dropout=0.1,
    decoder_hidden_layers=0,
    decoder_hidden_size=0,
    trajectory_samples=100,
    stat_exog_list=None,
    hist_exog_list=None,
    futr_exog_list=None,
    exclude_insample_y=False,
    loss=DistributionLoss(
        distribution="StudentT", level=[80, 90], return_params=False
    ),
    valid_loss=MAE(),
    max_steps=1000,
    learning_rate=0.001,
    num_lr_decays=3,
    early_stop_patience_steps=-1,
    val_monitor="ptl/val_loss",
    val_check_steps=100,
    batch_size=32,
    valid_batch_size=None,
    windows_batch_size=1024,
    inference_windows_batch_size=-1,
    start_padding_enabled=False,
    training_data_availability_threshold=0.0,
    step_size=1,
    scaler_type="identity",
    random_seed=1,
    drop_last_loader=False,
    alias=None,
    optimizer=None,
    optimizer_kwargs=None,
    lr_scheduler=None,
    lr_scheduler_kwargs=None,
    dataloader_kwargs=None,
    **trainer_kwargs
)
```

Bases: <code>[BaseModel](#neuralforecast.common._base_model.BaseModel)</code>

DeepAR

**Parameters:**

| Name                                   | Type                                                                            | Description                                                                                                                                                                                                                                                                    | Default                                                                                                                                          |
| -------------------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `h`                                    | <code>[int](#int)</code>                                                        | Forecast horizon.                                                                                                                                                                                                                                                              | *required*                                                                                                                                       |
| `input_size`                           | <code>[int](#int)</code>                                                        | maximum sequence length for truncated train backpropagation. Default -1 uses 3 \* horizon                                                                                                                                                                                      | <code>-1</code>                                                                                                                                  |
| `h_train`                              | <code>[int](#int)</code>                                                        | maximum sequence length for truncated train backpropagation. Default 1.                                                                                                                                                                                                        | <code>1</code>                                                                                                                                   |
| `lstm_n_layers`                        | <code>[int](#int)</code>                                                        | number of LSTM layers.                                                                                                                                                                                                                                                         | <code>2</code>                                                                                                                                   |
| `lstm_hidden_size`                     | <code>[int](#int)</code>                                                        | LSTM hidden size.                                                                                                                                                                                                                                                              | <code>128</code>                                                                                                                                 |
| `lstm_dropout`                         | <code>[float](#float)</code>                                                    | LSTM dropout.                                                                                                                                                                                                                                                                  | <code>0.1</code>                                                                                                                                 |
| `decoder_hidden_layers`                | <code>[int](#int)</code>                                                        | number of decoder MLP hidden layers. Default: 0 for linear layer.                                                                                                                                                                                                              | <code>0</code>                                                                                                                                   |
| `decoder_hidden_size`                  | <code>[int](#int)</code>                                                        | decoder MLP hidden size. Default: 0 for linear layer.                                                                                                                                                                                                                          | <code>0</code>                                                                                                                                   |
| `trajectory_samples`                   | <code>[int](#int)</code>                                                        | number of Monte Carlo trajectories during inference.                                                                                                                                                                                                                           | <code>100</code>                                                                                                                                 |
| `stat_exog_list`                       | <code>str list</code>                                                           | static exogenous columns.                                                                                                                                                                                                                                                      | <code>None</code>                                                                                                                                |
| `hist_exog_list`                       | <code>str list</code>                                                           | historic exogenous columns.                                                                                                                                                                                                                                                    | <code>None</code>                                                                                                                                |
| `futr_exog_list`                       | <code>str list</code>                                                           | future exogenous columns.                                                                                                                                                                                                                                                      | <code>None</code>                                                                                                                                |
| `exclude_insample_y`                   | <code>[bool](#bool)</code>                                                      | the model skips the autoregressive features y\[t-input\_size:t] if True.                                                                                                                                                                                                       | <code>False</code>                                                                                                                               |
| `loss`                                 | <code>PyTorch module</code>                                                     | instantiated train loss class from [losses collection](./losses.pytorch.html).                                                                                                                                                                                                 | <code>[DistributionLoss](#neuralforecast.losses.pytorch.DistributionLoss)(distribution='StudentT', level=\[80, 90], return\_params=False)</code> |
| `valid_loss`                           | <code>PyTorch module</code>                                                     | instantiated valid loss class from [losses collection](./losses.pytorch.html).                                                                                                                                                                                                 | <code>[MAE](#neuralforecast.losses.pytorch.MAE)()</code>                                                                                         |
| `max_steps`                            | <code>[int](#int)</code>                                                        | maximum number of training steps.                                                                                                                                                                                                                                              | <code>1000</code>                                                                                                                                |
| `learning_rate`                        | <code>[float](#float)</code>                                                    | Learning rate between (0, 1).                                                                                                                                                                                                                                                  | <code>0.001</code>                                                                                                                               |
| `num_lr_decays`                        | <code>[int](#int)</code>                                                        | Number of learning rate decays, evenly distributed across max\_steps.                                                                                                                                                                                                          | <code>3</code>                                                                                                                                   |
| `early_stop_patience_steps`            | <code>[int](#int)</code>                                                        | Number of validation iterations before early stopping.                                                                                                                                                                                                                         | <code>-1</code>                                                                                                                                  |
| `val_monitor`                          | <code>[str](#str)</code>                                                        | metric to monitor for early stopping. Valid options: "ptl/val\_loss", "valid\_loss", "train\_loss". Default: "ptl/val\_loss".                                                                                                                                                  | <code>'ptl/val\_loss'</code>                                                                                                                     |
| `val_check_steps`                      | <code>[int](#int)</code>                                                        | Number of training steps between every validation loss check.                                                                                                                                                                                                                  | <code>100</code>                                                                                                                                 |
| `batch_size`                           | <code>[int](#int)</code>                                                        | number of different series in each batch.                                                                                                                                                                                                                                      | <code>32</code>                                                                                                                                  |
| `valid_batch_size`                     | <code>[int](#int)</code>                                                        | number of different series in each validation and test batch, if None uses batch\_size.                                                                                                                                                                                        | <code>None</code>                                                                                                                                |
| `windows_batch_size`                   | <code>[int](#int)</code>                                                        | number of windows to sample in each training batch, default uses all.                                                                                                                                                                                                          | <code>1024</code>                                                                                                                                |
| `inference_windows_batch_size`         | <code>[int](#int)</code>                                                        | number of windows to sample in each inference batch, -1 uses all.                                                                                                                                                                                                              | <code>-1</code>                                                                                                                                  |
| `start_padding_enabled`                | <code>[bool](#bool)</code>                                                      | if True, the model will pad the time series with zeros at the beginning, by input size.                                                                                                                                                                                        | <code>False</code>                                                                                                                               |
| `training_data_availability_threshold` | <code>[Union](#Union)\[[float](#float), [List](#List)\[[float](#float)]]</code> | minimum fraction of valid data points required for training windows. Single float applies to both insample and outsample; list of two floats specifies \[insample\_fraction, outsample\_fraction]. Default 0.0 allows windows with only 1 valid data point (current behavior). | <code>0.0</code>                                                                                                                                 |
| `step_size`                            | <code>[int](#int)</code>                                                        | step size between each window of temporal data.                                                                                                                                                                                                                                | <code>1</code>                                                                                                                                   |
| `scaler_type`                          | <code>[str](#str)</code>                                                        | type of scaler for temporal inputs normalization see [temporal scalers](https://github.com/Nixtla/neuralforecast/blob/main/neuralforecast/common/_scalers.py).                                                                                                                 | <code>'identity'</code>                                                                                                                          |
| `random_seed`                          | <code>[int](#int)</code>                                                        | random\_seed for pytorch initializer and numpy generators.                                                                                                                                                                                                                     | <code>1</code>                                                                                                                                   |
| `drop_last_loader`                     | <code>[bool](#bool)</code>                                                      | if True `TimeSeriesDataLoader` drops last non-full batch.                                                                                                                                                                                                                      | <code>False</code>                                                                                                                               |
| `alias`                                | <code>[str](#str)</code>                                                        | optional, Custom name of the model.                                                                                                                                                                                                                                            | <code>None</code>                                                                                                                                |
| `optimizer`                            | <code>Subclass of 'torch.optim.Optimizer'</code>                                | optional, user specified optimizer instead of the default choice (Adam).                                                                                                                                                                                                       | <code>None</code>                                                                                                                                |
| `optimizer_kwargs`                     | <code>[dict](#dict)</code>                                                      | optional, list of parameters used by the user specified `optimizer`.                                                                                                                                                                                                           | <code>None</code>                                                                                                                                |
| `lr_scheduler`                         | <code>Subclass of 'torch.optim.lr\_scheduler.LRScheduler'</code>                | optional, user specified lr\_scheduler instead of the default choice (StepLR).                                                                                                                                                                                                 | <code>None</code>                                                                                                                                |
| `lr_scheduler_kwargs`                  | <code>[dict](#dict)</code>                                                      | optional, list of parameters used by the user specified `lr_scheduler`.                                                                                                                                                                                                        | <code>None</code>                                                                                                                                |
| `dataloader_kwargs`                    | <code>[dict](#dict)</code>                                                      | optional, list of parameters passed into the PyTorch Lightning dataloader by the `TimeSeriesDataLoader`.                                                                                                                                                                       | <code>None</code>                                                                                                                                |
| `**trainer_kwargs`                     | <code>[int](#int)</code>                                                        | keyword trainer arguments inherited from [PyTorch Lightning's trainer](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.trainer.trainer.Trainer.html?highlight=trainer).                                                                               | <code>{}</code>                                                                                                                                  |
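
As a rough illustration of how `learning_rate`, `max_steps` and `num_lr_decays` interact, the sketch below computes a step-wise schedule with decays spread evenly over training. The helper `lr_schedule` is hypothetical, and the decay factor `gamma=0.5` is an assumption made for the example; the table above only states that the decays are evenly distributed across `max_steps`.

```python
def lr_schedule(learning_rate, max_steps, num_lr_decays, gamma=0.5):
    """Learning rate per training step under evenly spaced step decays.

    gamma is an assumed decay factor, used here for illustration only.
    """
    if num_lr_decays <= 0:
        # No decays requested: constant learning rate.
        return [learning_rate] * max_steps
    decay_every = max(max_steps // num_lr_decays, 1)
    return [learning_rate * gamma ** (step // decay_every) for step in range(max_steps)]


# Defaults from the signature above: 1000 steps, 3 decays.
schedule = lr_schedule(learning_rate=0.001, max_steps=1000, num_lr_decays=3)
decay_steps = [s for s in range(1, len(schedule)) if schedule[s] != schedule[s - 1]]
```

With the default values, the sketch places the decays at steps 333, 666 and 999. A custom `lr_scheduler` passed to the constructor replaces this default behavior.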


#### `DeepAR.fit`

```python theme={null}
fit(
    dataset, val_size=0, test_size=0, random_seed=None, distributed_config=None
)
```

Fit.

The `fit` method optimizes the neural network's weights using the
initialization parameters (`learning_rate`, `windows_batch_size`, ...)
and the `loss` function defined during initialization.
Within `fit` we use a PyTorch Lightning `Trainer` that
inherits the initialization's `self.trainer_kwargs`; to customize
its inputs, see [PL's trainer arguments](https://pytorch-lightning.readthedocs.io/en/stable/api/pytorch_lightning.trainer.trainer.Trainer.html?highlight=trainer).

The method is designed to be compatible with SKLearn-like classes
and, in particular, with the StatsForecast library.

By default the `model` does not save training checkpoints, in order to
save disk space; to enable them, set `enable_checkpointing=True` in `__init__`.

**Parameters:**

| Name          | Type                                                 | Description                                                                            | Default           |
| ------------- | ---------------------------------------------------- | -------------------------------------------------------------------------------------- | ----------------- |
| `dataset`     | <code>[TimeSeriesDataset](#TimeSeriesDataset)</code> | NeuralForecast's `TimeSeriesDataset`, see [documentation](./tsdataset.html).           | *required*        |
| `val_size`    | <code>[int](#int)</code>                             | Validation size for temporal cross-validation.                                         | <code>0</code>    |
| `random_seed` | <code>[int](#int)</code>                             | Random seed for pytorch initializer and numpy generators, overwrites `model.__init__`'s. | <code>None</code> |
| `test_size`   | <code>[int](#int)</code>                             | Test size for temporal cross-validation.                                               | <code>0</code>    |

**Returns:**

| Type | Description |
| ---- | ----------- |
| None |             |
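
The roles of `val_size` and `test_size` can be pictured with a small sketch of a temporal split on a single series. This is an illustration under an assumption, not the library's code: it assumes the test block is taken from the end of the series and the validation block sits immediately before it, and the helper `temporal_split` is hypothetical.

```python
def temporal_split(series, val_size=0, test_size=0):
    """Split one series into train / validation / test blocks, in time order."""
    n = len(series)
    if val_size + test_size >= n:
        raise ValueError("val_size + test_size must be smaller than the series length")
    test_start = n - test_size
    val_start = test_start - val_size
    return series[:val_start], series[val_start:test_start], series[test_start:]


# A series of 144 monthly observations, indexed 0..143.
y = list(range(144))
train, val, test = temporal_split(y, val_size=12, test_size=12)
```

No shuffling takes place: every validation observation comes after every training observation, which is what keeps the validation loss an honest estimate of forecasting performance.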

#### `DeepAR.predict`

```python theme={null}
predict(
    dataset,
    test_size=None,
    step_size=1,
    random_seed=None,
    quantiles=None,
    h=None,
    explainer_config=None,
    **data_module_kwargs
)
```

Predict.

Neural network prediction with PL's `Trainer` execution of `predict_step`.

**Parameters:**

| Name                   | Type                                                 | Description                                                                                                                                            | Default           |
| ---------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------- |
| `dataset`              | <code>[TimeSeriesDataset](#TimeSeriesDataset)</code> | NeuralForecast's `TimeSeriesDataset`, see [documentation](./tsdataset.html).                                                                           | *required*        |
| `test_size`            | <code>[int](#int)</code>                             | Test size for temporal cross-validation.                                                                                                               | <code>None</code> |
| `step_size`            | <code>[int](#int)</code>                             | Step size between each window.                                                                                                                         | <code>1</code>    |
| `random_seed`          | <code>[int](#int)</code>                             | Random seed for pytorch initializer and numpy generators, overwrites `model.__init__`'s.                                                               | <code>None</code> |
| `quantiles`            | <code>[list](#list)</code>                           | Target quantiles to predict.                                                                                                                           | <code>None</code> |
| `h`                    | <code>[int](#int)</code>                             | Prediction horizon, if None, uses the model's fitted horizon. Defaults to None.                                                                        | <code>None</code> |
| `explainer_config`     | <code>[dict](#dict)</code>                           | configuration for explanations.                                                                                                                        | <code>None</code> |
| `**data_module_kwargs` | <code>[dict](#dict)</code>                           | PL's TimeSeriesDataModule args, see [documentation](https://pytorch-lightning.readthedocs.io/en/1.6.1/extensions/datamodules.html#using-a-datamodule). | <code>{}</code>   |

**Returns:**

| Type | Description |
| ---- | ----------- |
| None |             |

### Usage Example

```python theme={null}
import pandas as pd
import matplotlib.pyplot as plt

from neuralforecast import NeuralForecast
from neuralforecast.models import DeepAR
from neuralforecast.losses.pytorch import DistributionLoss, MQLoss
from neuralforecast.utils import AirPassengersPanel, AirPassengersStatic

# Hold out the last 12 timestamps of each series as the test set
Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]] # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test

nf = NeuralForecast(
    models=[DeepAR(h=12,
                   input_size=24,
                   lstm_n_layers=1,
                   trajectory_samples=100,
                   loss=DistributionLoss(distribution='StudentT', level=[80, 90], return_params=True),
                   valid_loss=MQLoss(level=[80, 90]),
                   learning_rate=0.005,
                   stat_exog_list=['airline1'],
                   futr_exog_list=['trend'],
                   max_steps=100,
                   val_check_steps=10,
                   early_stop_patience_steps=-1,
                   scaler_type='standard',
                   enable_progress_bar=True,
                   ),
    ],
    freq='ME'
)
nf.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
Y_hat_df = nf.predict(futr_df=Y_test_df)

# Plot quantile predictions
Y_hat_df = Y_hat_df.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
plt.plot(plot_df['ds'], plot_df['DeepAR-median'], c='blue', label='median')
plt.fill_between(x=plot_df['ds'][-12:],
                 y1=plot_df['DeepAR-lo-90'][-12:].values,
                 y2=plot_df['DeepAR-hi-90'][-12:].values,
                 alpha=0.4, label='level 90')
plt.legend()
plt.grid()
plt.show()  # render the figure
```

## 2. Auxiliary functions