In this notebook, we compare the performance of TimeGPT against three forecasting models: a classical model (ARIMA), a machine learning model (LightGBM), and a deep learning model (N-HiTS), using a subset of data from the M5 Forecasting competition. We want to highlight three top benefits our users love about TimeGPT:

🎯 Accuracy: TimeGPT consistently outperforms traditional models by capturing complex patterns with precision.

⚡ Speed: Generate forecasts faster without needing extensive training or tuning for each series.

🚀 Ease of Use: Minimal setup and no complex preprocessing make TimeGPT accessible and ready to use right out of the box!

Before diving into the notebook, please visit our dashboard to generate your TimeGPT api_key and give it a try yourself!

Table of Contents

  1. Data Introduction
  2. Model Fitting
    1. Fitting TimeGPT
    2. Fitting ARIMA
    3. Fitting LightGBM
    4. Fitting N-HiTS
  3. Results and Evaluation
  4. Conclusion

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from nixtla import NixtlaClient
from utilsforecast.plotting import plot_series
from utilsforecast.losses import mae, rmse, smape
from utilsforecast.evaluation import evaluate
nixtla_client = NixtlaClient(
    # api_key = 'my_api_key_provided_by_nixtla'
)
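
If you prefer not to hard-code the key, a minimal alternative sketch is shown below; it assumes the key is stored in the NIXTLA_API_KEY environment variable, which the client reads by default, and uses validate_api_key as an optional sanity check.

# Read the API key from the environment instead of passing it explicitly.
os.environ['NIXTLA_API_KEY'] = 'my_api_key_provided_by_nixtla'  # or export it in your shell
nixtla_client = NixtlaClient()

# Optional: quick check that the key is accepted (returns True/False).
nixtla_client.validate_api_key()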

1. Data Introduction

In this notebook, we’re working with an aggregated dataset from the M5 Forecasting - Accuracy competition. This dataset includes 7 daily time series, each with 1,941 data points. The last 28 data points of each series are set aside as the test set, allowing us to evaluate model performance on unseen data.

df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/demand_example.csv', parse_dates=['ds'])
df.groupby('unique_id').agg({"ds": ["min", "max", "count"],
                             "y": ["min", "mean", "median", "max"]})
| unique_id | ds min | ds max | count | y min | y mean | y median | y max |
|---|---|---|---|---|---|---|---|
| FOODS_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 2674.085523 | 2665.0 | 5493.0 |
| FOODS_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 4015.984029 | 3894.0 | 9069.0 |
| FOODS_3 | 2011-01-29 | 2016-05-22 | 1941 | 10.0 | 16969.089129 | 16548.0 | 28663.0 |
| HOBBIES_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 2936.122617 | 2908.0 | 5009.0 |
| HOBBIES_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 279.053065 | 248.0 | 871.0 |
| HOUSEHOLD_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 6039.594539 | 5984.0 | 11106.0 |
| HOUSEHOLD_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 1566.840289 | 1520.0 | 2926.0 |
df_train = df.query('ds <= "2016-04-24"')
df_test = df.query('ds > "2016-04-24"')

print(df_train.shape, df_test.shape)
(13391, 3) (196, 3)

2. Model Fitting (TimeGPT, ARIMA, LightGBM, N-HiTS)

2.1 TimeGPT

TimeGPT offers a powerful, streamlined solution for time series forecasting, delivering state-of-the-art results with minimal effort. With TimeGPT, there’s no need for data preprocessing or feature engineering – simply initialize the Nixtla client and call nixtla_client.forecast to produce accurate, high-performance forecasts tailored to your unique time series.

# Forecast with TimeGPT
fcst_timegpt = nixtla_client.forecast(
    df=df_train,
    target_col='y',
    h=28,                             # Forecast horizon: predict the next 28 time steps
    model='timegpt-1-long-horizon',   # Use the model for long-horizon forecasting
    finetune_steps=10,                # Number of fine-tuning steps
    level=[90],                       # Generate a 90% prediction interval
)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: D
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
# Evaluate performance and plot forecast
fcst_timegpt['ds'] = pd.to_datetime(fcst_timegpt['ds'])
test_df = pd.merge(df_test, fcst_timegpt, 'left', ['unique_id', 'ds'])
evaluation_timegpt = evaluate(test_df, metrics=[rmse, smape], models=["TimeGPT"])
evaluation_timegpt.groupby(['metric'])['TimeGPT'].mean()
metric
rmse     592.607378
smape      0.049403
Name: TimeGPT, dtype: float64
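
To visualize the forecast against the history, the plot_series helper imported above can be used; a minimal sketch, assuming the fcst_timegpt DataFrame produced in the previous cell:

# Plot the last 90 days of history together with the TimeGPT forecast
# and its 90% prediction interval.
plot_series(
    df_train,
    forecasts_df=fcst_timegpt,
    level=[90],
    max_insample_length=90,
)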

2.2 Classical Models (ARIMA)

Next, we applied ARIMA, a traditional statistical model, to the same forecasting task. Classical models rely on linear assumptions and use historical trend and seasonality to make predictions. In our experiment, ARIMA struggled to capture the complex, non-linear patterns in the data, resulting in lower accuracy than the other approaches. It was also slower, since its iterative parameter estimation process becomes computationally intensive on larger datasets.

📘 Why Use TimeGPT over Classical Models?

  • Complex Patterns: TimeGPT captures non-linear trends classical models miss.

  • Minimal Preprocessing: TimeGPT requires little to no data preparation.

  • Scalability: TimeGPT scales efficiently across multiple series without retraining.

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
# Initialize the AutoARIMA model
sf = StatsForecast(
    models=[AutoARIMA(season_length=7)],
    freq='D'
)
# Fit and forecast
fcst_arima = sf.forecast(h=28, df=df_train)
fcst_arima.reset_index(inplace=True)
test_df = pd.merge(df_test, fcst_arima, 'left', ['unique_id', 'ds'])
evaluation_arima = evaluate(test_df, metrics=[rmse, smape], models=["AutoARIMA"])
evaluation_arima.groupby(['metric'])['AutoARIMA'].mean()
metric
rmse     724.957364
smape      0.055018
Name: AutoARIMA, dtype: float64

2.3 Machine Learning Models (LightGBM)

Thirdly, we used a machine learning model, LightGBM, for the same forecasting task, implemented through the automated pipeline provided by our mlforecast library. While LightGBM can capture seasonality and patterns, achieving the best performance often requires detailed feature engineering, careful hyperparameter tuning, and domain knowledge. You can try our mlforecast library to simplify this process and get started quickly!

📘 Why Use TimeGPT over Machine Learning Models?

  • Automatic Pattern Recognition: Captures complex patterns from raw data, bypassing the need for feature engineering.

  • Minimal Tuning: Works well without extensive tuning.

  • Scalability: Forecasts across multiple series without retraining.

import optuna
from mlforecast.auto import AutoMLForecast, AutoLightGBM

# Suppress Optuna's logging output
optuna.logging.set_verbosity(optuna.logging.ERROR)
# Initialize an automated forecasting pipeline using AutoMLForecast.
mlf = AutoMLForecast(
    models=[AutoLightGBM()],
    freq='D',
    season_length=7,            
    fit_config=lambda trial: {'static_features': ['unique_id']}
)

# Fit the model to the training dataset.
mlf.fit(
    df=df_train.astype({'unique_id': 'category'}),
    n_windows=1,
    h=28,
    num_samples=10,
)
fcst_lgbm = mlf.predict(28)
test_df = pd.merge(df_test, fcst_lgbm, 'left', ['unique_id', 'ds'])
evaluation_lgbm = evaluate(test_df, metrics=[rmse, smape], models=["AutoLightGBM"])
evaluation_lgbm.groupby(['metric'])['AutoLightGBM'].mean()
metric
rmse     687.773744
smape      0.051448
Name: AutoLightGBM, dtype: float64

2.4 N-HiTS

Lastly, we used N-HiTS, a state-of-the-art deep learning model designed for time series forecasting. The model produced accurate results, demonstrating its ability to capture complex, non-linear patterns within the data. However, setting up and tuning N-HiTS required significantly more time and computational resources compared to TimeGPT.

📘 Why Use TimeGPT over Deep Learning Models?

  • Faster Setup: Quick setup and forecasting, unlike the lengthy configuration and training times of neural networks.

  • Less Tuning: Performs well with minimal tuning and preprocessing, while neural networks often need extensive adjustments.

  • Ease of Use: Simple deployment with high accuracy, making it accessible without deep technical expertise.

from neuralforecast.core import NeuralForecast
from neuralforecast.models import NHITS
# Initialize the N-HiTS model.
models = [NHITS(h=28, 
                input_size=28, 
                max_steps=100)]

# Fit the model using training data
nf = NeuralForecast(models=models, freq='D')
nf.fit(df=df_train)
fcst_nhits = nf.predict()
test_df = pd.merge(df_test,fcst_nhits, 'left', ['unique_id', 'ds'])
evaluation_nhits = evaluate(test_df, metrics=[rmse, smape], models=["NHITS"])
evaluation_nhits.groupby(['metric'])['NHITS'].mean()
metric
rmse     605.011948
smape      0.053446
Name: NHITS, dtype: float64

3. Results and Evaluation

The performance of each model is evaluated using RMSE (Root Mean Squared Error) and SMAPE (Symmetric Mean Absolute Percentage Error). While RMSE emphasizes the models’ ability to control significant errors, SMAPE provides a relative performance perspective by normalizing errors as percentages. Below, we present a snapshot of performance across all groups. The results demonstrate that TimeGPT outperforms other models on both metrics.
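For reference, the two metrics follow the standard definitions below (written with the convention in which SMAPE is a fraction rather than a percentage and without the extra factor of 2 that appears in some variants, which is an assumption about the convention used here):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}
\qquad
\mathrm{SMAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|y_t - \hat{y}_t\right|}{\left|y_t\right| + \left|\hat{y}_t\right|}
$$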

🌟 For a deeper dive into benchmarking, check out our benchmark repository. The summarized results are displayed below:

Overall Performance Metrics

| Model | RMSE | SMAPE |
|---|---|---|
| ARIMA | 724.9 | 5.50% |
| LightGBM | 687.8 | 5.14% |
| N-HiTS | 605.0 | 5.34% |
| TimeGPT | 592.6 | 4.94% |
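
The table above can be assembled from the per-model evaluations computed in the previous sections; a minimal sketch, assuming evaluation_timegpt, evaluation_arima, evaluation_lgbm, and evaluation_nhits are still in memory:

# Collect each model's mean RMSE and SMAPE into a single comparison table.
summary = pd.concat(
    [
        evaluation_timegpt.groupby('metric')['TimeGPT'].mean(),
        evaluation_arima.groupby('metric')['AutoARIMA'].mean(),
        evaluation_lgbm.groupby('metric')['AutoLightGBM'].mean(),
        evaluation_nhits.groupby('metric')['NHITS'].mean(),
    ],
    axis=1,
)
summary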

Breakdown for Each Time-series

Below are the metrics for each individual time series group. TimeGPT consistently delivers accurate forecasts across all groups; in many cases, it performs as well as or better than data-specific models, showing its versatility and reliability across different datasets.
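
This per-series breakdown can be obtained by merging the same evaluation DataFrames (each has one row per unique_id and metric); a sketch under that assumption:

# Merge the four per-series evaluations on unique_id and metric.
per_series = (
    evaluation_timegpt
    .merge(evaluation_arima, on=['unique_id', 'metric'])
    .merge(evaluation_lgbm, on=['unique_id', 'metric'])
    .merge(evaluation_nhits, on=['unique_id', 'metric'])
)
per_series.sort_values(['metric', 'unique_id'])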

Benchmark Results

For a more comprehensive dive into model accuracy and performance, explore our Time Series Model Arena! TimeGPT continues to lead the pack with exceptional performance across benchmarks! 🌟

4. Conclusion

At the end of this notebook, we’ve put together a handy table to show you exactly where TimeGPT shines brightest compared to other forecasting models. ☀️ Think of it as your quick guide to choosing the best model for your unique project needs. We’re confident that TimeGPT will be a valuable tool in your forecasting journey. Don’t forget to visit our dashboard to generate your TimeGPT api_key and get started today! Happy forecasting, and enjoy the insights ahead!

| Scenario | TimeGPT | Classical Models (e.g., ARIMA) | Machine Learning Models (e.g., XGB, LGBM) | Deep Learning Models (e.g., N-HiTS) |
|---|---|---|---|---|
| Seasonal Patterns | ✅ Performs well with minimal setup | ✅ Handles seasonality with adjustments (e.g., SARIMA) | ✅ Performs well with feature engineering | ✅ Captures seasonal patterns effectively |
| Non-Linear Patterns | ✅ Excels, especially with complex non-linear patterns | ❌ Limited performance | ❌ Struggles without extensive feature engineering | ✅ Performs well with non-linear relationships |
| Large Dataset | ✅ Highly scalable across many series | ❌ Slow and resource-intensive | ✅ Scalable with optimized implementations | ❌ Requires significant resources for large datasets |
| Small Dataset | ✅ Performs well; requires only one data point to start | ✅ Performs well; may struggle with very sparse data | ✅ Performs adequately if enough features are extracted | ❌ May need a minimum data size to learn effectively |
| Preprocessing Required | ✅ Minimal preprocessing needed | ❌ Requires scaling, log-transform, etc., to meet model assumptions | ❌ Requires extensive feature engineering for complex patterns | ❌ Needs data normalization and preprocessing |
| Accuracy Requirement | ✅ Achieves high accuracy with minimal tuning | ❌ May struggle with complex accuracy requirements | ✅ Can achieve good accuracy with tuning | ✅ High accuracy possible but with significant resource use |
| Scalability | ✅ Highly scalable with minimal task-specific configuration | ❌ Not easily scalable | ✅ Moderate scalability, with feature engineering and tuning per task | ❌ Limited scalability due to resource demands |
| Computational Resources | ✅ Highly efficient, operates seamlessly on CPU, no GPU needed | ✅ Light to moderate, scales poorly with large datasets | ❌ Moderate, depends on feature complexity | ❌ High resource consumption, often requires GPU |
| Memory Requirement | ✅ Efficient memory usage for large datasets | ✅ Moderate memory requirements | ❌ High memory usage for larger datasets or many series | ❌ High memory consumption for larger datasets and multiple series |
| Technical Requirements & Domain Knowledge | ✅ Low; minimal technical setup and no domain expertise needed | ✅ Low to moderate; needs understanding of stationarity | ❌ Moderate to high; requires feature engineering and tuning | ❌ High; complex architecture and tuning |