Prerequisites

This guide assumes basic familiarity with NeuralForecast. For a minimal example, visit the Quick Start.

Follow this article for a step-by-step guide to building a production-ready forecasting pipeline for multiple time series.

During this guide you will gain familiarity with the core NeuralForecast class and some relevant methods like NeuralForecast.fit, NeuralForecast.predict, and NeuralForecast.cross_validation.

We will use a classical benchmarking dataset from the M4 competition. The dataset includes time series from different domains like finance, economy and sales. In this example, we will use a subset of the Hourly dataset.

We will model the series globally. Therefore, you will train a set of models for the whole dataset, and then select the best model for each individual time series. NeuralForecast focuses on speed, simplicity, and scalability, which makes it ideal for this task.

Outline:

  1. Install packages.
  2. Read the data.
  3. Explore the data.
  4. Train many models globally for the entire dataset.
  5. Evaluate the models’ performance using cross-validation.
  6. Select the best model for every unique time series.


Tip

You can use Colab to run this notebook interactively.

Warning

To reduce the computation time, we recommend using a GPU. If you are using Colab, do not forget to activate it: go to Runtime > Change runtime type and select GPU as the hardware accelerator.
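
You can quickly verify that PyTorch detects a GPU (an optional sanity check, not part of the pipeline):

import torch

# True if a CUDA-capable GPU is visible to PyTorch
print(torch.cuda.is_available())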

1. Install libraries

We assume you have NeuralForecast already installed. Check this guide for instructions on how to install NeuralForecast.

! pip install neuralforecast

2. Read the data

We will use pandas to read the M4 Hourly dataset, stored in a parquet file for efficiency. You can use ordinary pandas operations to read your data in other formats, like .csv.

The input to NeuralForecast is always a data frame in long format with three columns: unique_id, ds and y:

  • The unique_id (string, int or category) represents an identifier for the series.

  • The ds (datestamp or int) column should be either an integer indexing time or a datestamp, ideally formatted as YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.

  • The y (numeric) column represents the measurement we wish to forecast.

This dataset already satisfies these requirements.
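
For illustration, a minimal valid input frame could be built like this (hypothetical values, not the M4 data):

import pandas as pd

# Two toy series in long format with an integer time index
toy_df = pd.DataFrame({
    'unique_id': ['series_a'] * 3 + ['series_b'] * 3,
    'ds': [1, 2, 3, 1, 2, 3],
    'y': [10.0, 12.0, 11.5, 100.0, 98.0, 101.0],
})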

Depending on your internet connection, this step should take around 10 seconds.

import pandas as pd
Y_df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')
Y_df.head()
|   | unique_id | ds | y     |
|---|-----------|----|-------|
| 0 | H1        | 1  | 605.0 |
| 1 | H1        | 2  | 586.0 |
| 2 | H1        | 3  | 586.0 |
| 3 | H1        | 4  | 559.0 |
| 4 | H1        | 5  | 511.0 |

This dataset contains 414 unique series with 900 observations on average. For this example and for reproducibility’s sake, we will select only 10 unique IDs. Depending on your processing infrastructure, feel free to select more or fewer series.

Note

Processing time is dependent on the available computing resources. Running this example with the complete dataset takes around 10 minutes in a c5d.24xlarge (96 cores) instance from AWS.

uids = Y_df['unique_id'].unique()[:10] # Select 10 ids to make the example faster
Y_df = Y_df.query('unique_id in @uids').reset_index(drop=True)

3. Explore Data with the plot_series function

Plot some series using the plot_series function from the utilsforecast library. By default, this function plots 8 random series from the dataset and is useful for basic EDA.

Note

The plot_series function uses matplotlib as the default engine. You can change to plotly by setting engine="plotly".

from utilsforecast.plotting import plot_series
plot_series(Y_df)
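
If you prefer an interactive plot, the same call works with the plotly engine:

# Render an interactive plot instead of a static one
plot_series(Y_df, engine="plotly")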

4. Train multiple models for many series

NeuralForecast can train many models on many time series globally and efficiently.

import logging

import optuna
import ray.tune as tune
import torch

from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoLSTM
from neuralforecast.losses.pytorch import MQLoss
optuna.logging.set_verbosity(optuna.logging.WARNING)
logging.getLogger('pytorch_lightning').setLevel(logging.ERROR)
torch.set_float32_matmul_precision('high')

Each Auto model contains a default search space that was extensively tested on multiple large-scale datasets. Additionally, users can define specific search spaces tailored for particular datasets and tasks.

First, we create a custom search space for the AutoNHITS and AutoLSTM models. Search spaces are specified with functions that take an optuna trial and return a configuration dictionary, where keys correspond to the model’s hyperparameters and values specify how each hyperparameter will be sampled. For example, use trial.suggest_int to sample integers uniformly, and trial.suggest_categorical to sample values from a list (the ray tune equivalents, used in section 5, are tune.randint and tune.choice).

def config_nhits(trial):
    return {
        "input_size": trial.suggest_categorical(          # Length of input window
            "input_size", (48, 48*2, 48*3)                
        ),                                                
        "start_padding_enabled": True,                                          
        "n_blocks": 5 * [1],                              # Length of input window
        "mlp_units": 5 * [[64, 64]],                      # Length of input window
        "n_pool_kernel_size": trial.suggest_categorical(  # MaxPooling Kernel size
            "n_pool_kernel_size",
            (5*[1], 5*[2], 5*[4], [8, 4, 2, 1, 1])
        ),     
        "n_freq_downsample": trial.suggest_categorical(   # Interpolation expressivity ratios
            "n_freq_downsample",
            ([8, 4, 2, 1, 1],  [1, 1, 1, 1, 1])
        ),     
        "learning_rate": trial.suggest_float(             # Initial Learning rate
            "learning_rate",
            low=1e-4,
            high=1e-2,
            log=True,
        ),            
        "scaler_type": None,                              # Scaler type
        "max_steps": 1000,                                # Max number of training iterations
        "batch_size": trial.suggest_categorical(          # Number of series in batch
            "batch_size",
            (1, 4, 10),
        ),                   
        "windows_batch_size": trial.suggest_categorical(  # Number of windows in batch
            "windows_batch_size",
            (128, 256, 512),
        ),      
        "random_seed": trial.suggest_int(                 # Random seed   
            "random_seed",
            low=1,
            high=20,
        ),                      
    }

def config_lstm(trial):
    return {
        "input_size": trial.suggest_categorical(           # Length of input window
            "input_size",
            (48, 48*2, 48*3)
        ),   
        "encoder_hidden_size": trial.suggest_categorical(  # Hidden size of LSTM cells
            "encoder_hidden_size",
            (64, 128),
        ),  
        "encoder_n_layers": trial.suggest_categorical(     # Number of layers in LSTM
            "encoder_n_layers",
            (2, 4),
        ),        
        "learning_rate": trial.suggest_float(              # Initial Learning rate
            "learning_rate",
            low=1e-4,
            high=1e-2,
            log=True,
        ),   
        "scaler_type": 'robust',                           # Scaler type
        "max_steps": trial.suggest_categorical(           # Max number of training iterations
            "max_steps",
            (500, 1000)
        ),          
        "batch_size": trial.suggest_categorical(           # Number of series in batch
            "batch_size",
            (1, 4)
        ),              
        "random_seed": trial.suggest_int(                  # Random seed
            "random_seed",
            low=1,
            high=20
        ),             
    }

To instantiate an Auto model you need to define:

  • h: forecasting horizon.
  • loss: training and validation loss from neuralforecast.losses.pytorch.
  • config: hyperparameter search space. If None, the Auto class will use a pre-defined suggested hyperparameter space.
  • search_alg: search algorithm used to explore the hyperparameter search space.
  • num_samples: number of configurations explored.

In this example we set the horizon h to 48, use the MQLoss distribution loss for training and validation, and keep the default search algorithm of the optuna backend.

nf = NeuralForecast(
    models=[
        AutoNHITS(h=48, config=config_nhits, loss=MQLoss(), backend='optuna', num_samples=5),
        AutoLSTM(h=48, config=config_lstm, loss=MQLoss(), backend='optuna', num_samples=2),
    ],
    freq=1,
)

Tip

The number of samples, num_samples, is a crucial parameter! Larger values will usually produce better results as we explore more configurations in the search space, but it will increase training times. Larger search spaces will usually require more samples. As a general rule, we recommend setting num_samples higher than 20.

Next, we use the fit method of the NeuralForecast class to train the Auto models. In this step, each Auto model automatically performs hyperparameter tuning: it trains multiple models with different hyperparameters, produces forecasts on the validation set, and evaluates them. The best configuration is selected based on the validation error. Only the best model is stored and used during inference.

nf.fit(df=Y_df)

Next, we use the predict method to forecast the next 48 hours using the optimal hyperparameters.

fcst_df = nf.predict()
fcst_df.columns = fcst_df.columns.str.replace('-median', '')
fcst_df.head()
|   | unique_id | ds  | AutoNHITS  | AutoNHITS-lo-90 | AutoNHITS-lo-80 | AutoNHITS-hi-80 | AutoNHITS-hi-90 | AutoLSTM   | AutoLSTM-lo-90 | AutoLSTM-lo-80 | AutoLSTM-hi-80 | AutoLSTM-hi-90 |
|---|-----------|-----|------------|-----------------|-----------------|-----------------|-----------------|------------|----------------|----------------|----------------|----------------|
| 0 | H1        | 749 | 598.767883 | 514.016663      | 540.870117      | 642.218750      | 662.944946      | 622.685608 | 348.235046     | 394.057556     | 852.252502     | 908.859009     |
| 1 | H1        | 750 | 528.400696 | 385.861420      | 424.128540      | 608.396118      | 634.586304      | 624.733154 | 347.051666     | 393.736053     | 851.637329     | 908.909668     |
| 2 | H1        | 751 | 461.096741 | 329.577850      | 350.306732      | 565.697876      | 579.242432      | 620.866211 | 343.146179     | 389.550720     | 847.222046     | 906.901062     |
| 3 | H1        | 752 | 403.170471 | 295.051971      | 308.777893      | 495.823914      | 541.085754      | 625.414551 | 347.815918     | 391.334167     | 851.686157     | 910.480469     |
| 4 | H1        | 753 | 366.044464 | 261.056061      | 285.326935      | 478.635132      | 523.788086      | 621.209900 | 351.657196     | 392.890259     | 856.152710     | 909.783447     |

plot_series(Y_df, fcst_df, plot_random=False, max_insample_length=48 * 3, level=[80, 90])

The plot_series function allows for further customization. For example, plot the results of the different models and unique ids.

# Plot two unique_ids and one selected model
plot_series(Y_df, fcst_df, models=["AutoLSTM"], ids=["H107", "H104"], level=[80, 90])

# Explore other models 
plot_series(Y_df, fcst_df, models=["AutoNHITS"], ids=["H10", "H105"], level=[80, 90])

5. Evaluate the models’ performance

In the previous steps, we used our historical data to predict the future. However, to assess their accuracy, we would also like to know how the models would have performed in the past. To assess the accuracy and robustness of your models on your data, perform cross-validation.

With time series data, Cross Validation is done by defining a sliding window across the historical data and predicting the period following it. This form of cross-validation allows us to arrive at a better estimation of our model’s predictive abilities across a wider range of temporal instances while also keeping the data in the training set contiguous as is required by our models.
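
As a minimal sketch of the window layout (assuming h=48, n_windows=2, and the default step_size=1, consistent with the cutoff values shown in the output below), each window trains on all data up to its cutoff and is evaluated on the following h observations:

T = 748          # observations in series H1 (its forecasts above start at ds=749)
h = 48           # forecast horizon
n_windows = 2    # number of validation windows
step_size = 1    # assumed spacing between consecutive cutoffs
cutoffs = [T - h - (n_windows - 1 - i) * step_size for i in range(n_windows)]
print(cutoffs)   # [699, 700]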

The following graph depicts such a cross-validation strategy:

Tip

Setting n_windows=1 mirrors a traditional train-test split with our historical data serving as the training set and the last 48 hours serving as the testing set.

The cross_validation method from the NeuralForecast class takes the following arguments.

  • df: training data frame

  • step_size (int): step size between each window. In other words: how often you want to run the forecasting process.

  • n_windows (int): number of windows used for cross-validation. In other words: how many forecasting processes in the past you want to evaluate.

from neuralforecast.auto import AutoNHITS, AutoLSTM
config_nhits = {
    "input_size": tune.choice([48, 48*2, 48*3]),                    # Length of input window
    "start_padding_enabled": True,  
    "n_blocks": 5*[1],                                              # Length of input window
    "mlp_units": 5 * [[64, 64]],                                    # Length of input window
    "n_pool_kernel_size": tune.choice([5*[1], 5*[2], 5*[4],         
                                      [8, 4, 2, 1, 1]]),            # MaxPooling Kernel size
    "n_freq_downsample": tune.choice([[8, 4, 2, 1, 1],
                                      [1, 1, 1, 1, 1]]),            # Interpolation expressivity ratios
    "learning_rate": tune.loguniform(1e-4, 1e-2),                   # Initial Learning rate
    "scaler_type": tune.choice([None]),                             # Scaler type
    "max_steps": tune.choice([1000]),                               # Max number of training iterations
    "batch_size": tune.choice([1, 4, 10]),                          # Number of series in batch
    "windows_batch_size": tune.choice([128, 256, 512]),             # Number of windows in batch
    "random_seed": tune.randint(1, 20),                             # Random seed
}

config_lstm = {
    "input_size": tune.choice([48, 48*2, 48*3]),              # Length of input window
    "encoder_hidden_size": tune.choice([64, 128]),            # Hidden size of LSTM cells
    "encoder_n_layers": tune.choice([2,4]),                   # Number of layers in LSTM
    "learning_rate": tune.loguniform(1e-4, 1e-2),             # Initial Learning rate
    "scaler_type": tune.choice(['robust']),                   # Scaler type
    "max_steps": tune.choice([500, 1000]),                    # Max number of training iterations
    "batch_size": tune.choice([1, 4]),                        # Number of series in batch
    "random_seed": tune.randint(1, 20),                       # Random seed
}
nf = NeuralForecast(
    models=[
        AutoNHITS(h=48, config=config_nhits, loss=MQLoss(), num_samples=5),
        AutoLSTM(h=48, config=config_lstm, loss=MQLoss(), num_samples=2), 
    ],
    freq=1,
)
cv_df = nf.cross_validation(Y_df, n_windows=2)

The cv_df object is a new data frame that includes the following columns:

  • unique_id: identifies each time series
  • ds: datestamp or temporal index
  • cutoff: the last datestamp or temporal index of the training data in each window. There is one unique cutoff value per window: with n_windows=1 you get one unique cutoff value, with n_windows=2 you get two.
  • y: true value
  • "model": columns with the model’s name and fitted value.
cv_df.columns = cv_df.columns.str.replace('-median', '')
cv_df.head()
|   | unique_id | ds  | cutoff | AutoNHITS  | AutoNHITS-lo-90 | AutoNHITS-lo-80 | AutoNHITS-hi-80 | AutoNHITS-hi-90 | AutoLSTM   | AutoLSTM-lo-90 | AutoLSTM-lo-80 | AutoLSTM-hi-80 | AutoLSTM-hi-90 | y     |
|---|-----------|-----|--------|------------|-----------------|-----------------|-----------------|-----------------|------------|----------------|----------------|----------------|----------------|-------|
| 0 | H1        | 700 | 699    | 681.888367 | 636.549561      | 643.658691      | 702.102844      | 711.914185      | 682.982483 | 614.895813     | 644.472107     | 746.966064     | 760.720764     | 684.0 |
| 1 | H1        | 701 | 699    | 653.331055 | 603.969421      | 616.532288      | 676.536743      | 685.295105      | 623.179443 | 524.802490     | 553.547241     | 712.007690     | 738.519165     | 619.0 |
| 2 | H1        | 702 | 699    | 586.374207 | 534.213074      | 555.465637      | 620.347168      | 629.695801      | 555.048889 | 448.712463     | 473.406036     | 656.280823     | 697.658386     | 565.0 |
| 3 | H1        | 703 | 699    | 534.878662 | 482.983459      | 501.141296      | 571.394897      | 582.617981      | 497.977661 | 376.078644     | 403.215302     | 623.380188     | 670.032471     | 532.0 |
| 4 | H1        | 704 | 699    | 508.891663 | 454.501068      | 461.194580      | 536.077026      | 547.211914      | 457.796600 | 327.437347     | 352.698944     | 588.427917     | 645.391785     | 495.0 |

from IPython.display import display
for cutoff in cv_df['cutoff'].unique():
    display(
        plot_series(
            Y_df,
            cv_df.query('cutoff == @cutoff').drop(columns=['y', 'cutoff']),
            max_insample_length=48 * 4, 
            ids=['H102'],
        )
    )

Now, let’s evaluate the models’ performance.

from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mse, mae, rmse

Warning

You can also use the Mean Absolute Percentage Error (MAPE); however, for granular forecasts, MAPE values are extremely hard to judge and not useful for assessing forecasting quality.
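
If you still want to compute it, a mape metric is available in utilsforecast.losses and can be passed to evaluate like any other metric (a minimal sketch):

from utilsforecast.losses import mape

# MAPE per series and model; values of y close to zero inflate this metric
mape_df = evaluate(cv_df.drop(columns='cutoff'), metrics=[mape])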

Create the data frame with the results of the evaluation of your cross-validation data frame using the MSE, MAE, and RMSE metrics.

evaluation_df = evaluate(cv_df.drop(columns='cutoff'), metrics=[mse, mae, rmse])
evaluation_df['best_model'] = evaluation_df.drop(columns=['metric', 'unique_id']).idxmin(axis=1)
evaluation_df.head()
|   | unique_id | metric | AutoNHITS    | AutoLSTM     | best_model |
|---|-----------|--------|--------------|--------------|------------|
| 0 | H1        | mse    | 1194.938315  | 2078.551001  | AutoNHITS  |
| 1 | H10       | mse    | 177.231943   | 221.795192   | AutoNHITS  |
| 2 | H100      | mse    | 75231.397646 | 34166.798331 | AutoLSTM   |
| 3 | H101      | mse    | 61851.711972 | 13609.076270 | AutoLSTM   |
| 4 | H102      | mse    | 32519.021906 | 86900.750591 | AutoNHITS  |

Create a summary table with a model column and the number of series where that model performs best.

summary_df = evaluation_df.groupby(['metric', 'best_model']).size().sort_values().to_frame()
summary_df = summary_df.reset_index()
summary_df.columns = ['metric', 'model', 'nr. of unique_ids']
summary_df
|   | metric | model     | nr. of unique_ids |
|---|--------|-----------|-------------------|
| 0 | mae    | AutoNHITS | 4                 |
| 1 | mse    | AutoNHITS | 4                 |
| 2 | rmse   | AutoNHITS | 4                 |
| 3 | mae    | AutoLSTM  | 6                 |
| 4 | mse    | AutoLSTM  | 6                 |
| 5 | rmse   | AutoLSTM  | 6                 |

summary_df.query('metric == "mse"')
|   | metric | model     | nr. of unique_ids |
|---|--------|-----------|-------------------|
| 1 | mse    | AutoNHITS | 4                 |
| 4 | mse    | AutoLSTM  | 6                 |

You can further explore your results by plotting the unique_ids where a specific model wins.

nhits_ids = evaluation_df.query('best_model == "AutoNHITS" and metric == "mse"')['unique_id'].unique()

plot_series(Y_df, fcst_df, ids=nhits_ids)

6. Select the best model for every unique series

Define a utility function that takes your forecast’s data frame with the predictions and the evaluation data frame and returns a data frame with the best possible forecast for every unique_id.

def get_best_model_forecast(forecasts_df, evaluation_df, metric):
    # Winning model per series for the chosen metric
    metric_eval = evaluation_df.loc[evaluation_df['metric'] == metric, ['unique_id', 'best_model']]
    # Attach the winning model's name to every forecast row
    with_best = forecasts_df.merge(metric_eval)
    res = with_best[['unique_id', 'ds']].copy()
    # Pick the winning model's point forecast and 90% interval for each row
    for suffix in ('', '-lo-90', '-hi-90'):
        res[f'best_model{suffix}'] = with_best.apply(lambda row: row[row['best_model'] + suffix], axis=1)
    return res

Create your production-ready data frame with the best forecast for every unique_id.

prod_forecasts_df = get_best_model_forecast(fcst_df, evaluation_df, metric='mse')
prod_forecasts_df
|     | unique_id | ds  | best_model  | best_model-lo-90 | best_model-hi-90 |
|-----|-----------|-----|-------------|------------------|------------------|
| 0   | H1        | 749 | 598.767883  | 514.016663       | 662.944946       |
| 1   | H1        | 750 | 528.400696  | 385.861420       | 634.586304       |
| 2   | H1        | 751 | 461.096741  | 329.577850       | 579.242432       |
| 3   | H1        | 752 | 403.170471  | 295.051971       | 541.085754       |
| 4   | H1        | 753 | 366.044464  | 261.056061       | 523.788086       |
| …   | …         | …   | …           | …                | …                |
| 475 | H107      | 792 | 3559.957520 | 2496.569824      | 4672.525391      |
| 476 | H107      | 793 | 3557.774170 | 2493.142578      | 4671.074219      |
| 477 | H107      | 794 | 3555.861816 | 2487.265625      | 4681.086914      |
| 478 | H107      | 795 | 3560.636230 | 2487.065430      | 4678.455078      |
| 479 | H107      | 796 | 3558.523193 | 2489.494629      | 4695.519531      |

Plot the results.

plot_series(Y_df, prod_forecasts_df, level=[90])