StatsForecast follows the sklearn model API. For this minimal example, you will create an instance of the StatsForecast class and then call its fit and predict methods. We recommend this option if speed is not paramount and you want to explore the fitted values and parameters.

Tip

If you want to forecast many series, we recommend using the forecast method. Check this Getting Started with multiple time series guide.

The input to StatsForecast is always a data frame in long format with three columns: unique_id, ds and y:

  • The unique_id (string, int or category) represents an identifier for the series.

  • The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp.

  • The y (numeric) represents the measurement we wish to forecast.
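If your data starts out in wide format (one column per series), it has to be reshaped before it matches the layout described above. The following is a minimal sketch using pandas; the column names store_a and store_b and all values are made up for illustration and are not part of StatsForecast.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per series.
wide = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "store_a": [10.0, 12.0, 11.0],
    "store_b": [20.0, 19.0, 23.0],
})

# melt stacks the series columns into (unique_id, y) pairs.
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")

# Reorder the columns and sort so each series is contiguous and chronological.
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```

The result has one row per (series, timestamp) pair, which is the long format expected as input.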

As an example, let’s look at the US Air Passengers dataset. This time series consists of monthly totals of US airline passengers from 1949 to 1960. The CSV is available here.

We assume you have StatsForecast already installed. Check this guide for instructions on how to install StatsForecast.

First, we’ll import the data:

# uncomment the following line to install the library
# %pip install statsforecast
import pandas as pd
df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])
df.head()
       unique_id         ds    y
0  AirPassengers 1949-01-01  112
1  AirPassengers 1949-02-01  118
2  AirPassengers 1949-03-01  132
3  AirPassengers 1949-04-01  129
4  AirPassengers 1949-05-01  121

We fit the model by instantiating a new StatsForecast object with its two required parameters:

  • models: a list of models. Select the models you want from models (https://nixtla.github.io/statsforecast/src/core/models.html) and import them.

  • freq: a string indicating the frequency of the data. Here it is 'MS', the Pandas alias for month start.

For this example, we will use an AutoARIMA model. We set season_length to 12 because we expect seasonal effects every 12 months. (See: Seasonal periods)

Any settings are passed into the constructor. Then you call its fit method and pass in the historical data frame.
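To see what the freq='MS' setting used in this example implies, the sketch below uses pandas alone to generate the month-start timestamps that follow the last observation. It assumes the last observed date is 1960-12-01 (inferred from the dataset description) and only illustrates the frequency alias; it is not how StatsForecast builds its dates internally.

```python
import pandas as pd

# Assumption: the series ends at the first day of December 1960.
last_observed = pd.Timestamp("1960-12-01")
h = 12  # number of future periods

# Generate h + 1 month-start dates beginning at the last observation,
# then drop the first one so only future dates remain.
future_ds = pd.date_range(start=last_observed, periods=h + 1, freq="MS")[1:]
print(future_ds)
```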

Note

StatsForecast achieves its blazing speed using JIT compilation through Numba. The first time you call the fit method, it should take around 5 seconds. The second time, once Numba has compiled your settings, it should take less than 0.2s.
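If you want to observe the compilation overhead yourself, a small timing helper is enough. The sketch below is generic Python; time_call and slow_square are hypothetical names introduced here, and the stand-in workload exists only so the snippet runs without StatsForecast installed.

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload used only to demonstrate the helper.
def slow_square(x):
    time.sleep(0.01)
    return x * x

result, elapsed = time_call(slow_square, 4)
print(f"result={result}, elapsed={elapsed:.3f}s")
```

Once the StatsForecast object and the data frame from this guide exist, calling time_call(sf.fit, df) twice in a row should show the second call finishing much faster than the first.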

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(
    models=[AutoARIMA(season_length = 12)],
    freq='MS',
)
sf.fit(df)
StatsForecast(models=[AutoARIMA])

The predict method takes two arguments: h (the forecast horizon) and level.

  • h (int): the number of steps to forecast into the future. In this case, 12 months ahead.

  • level (list of floats): this optional parameter is used for probabilistic forecasting. Set the level (or confidence percentile) of your prediction interval. For example, level=[90] means that the model expects the real value to fall inside that interval 90% of the time.
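One way to make the meaning of level concrete is to measure, after the fact, how often the actual values fell inside the interval. The sketch below does this with pandas; empirical_coverage is a hypothetical helper, and the numbers are toy values for illustration, not real forecasts.

```python
import pandas as pd

def empirical_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = (actual >= lower) & (actual <= upper)
    return float(inside.mean())

# Toy numbers for illustration only; they are not model output.
check = pd.DataFrame({
    "y":     [100.0, 110.0, 95.0, 130.0],
    "lo-90": [90.0, 100.0, 96.0, 120.0],
    "hi-90": [110.0, 120.0, 105.0, 140.0],
})

coverage = empirical_coverage(check["y"], check["lo-90"], check["hi-90"])
print(coverage)
```

For a well-calibrated model and level=[90], this fraction should be close to 0.9 over a large number of forecasts. With only a handful of points, as here, it can differ substantially.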

The forecast object here is a new data frame that includes a column with the name of the model and the y hat values, as well as columns for the uncertainty intervals.

forecast_df = sf.predict(h=12, level=[90])
forecast_df.tail()
        unique_id         ds   AutoARIMA  AutoARIMA-lo-90  AutoARIMA-hi-90
7   AirPassengers 1961-08-01  633.236389       590.009033       676.463745
8   AirPassengers 1961-09-01  535.236389       489.558899       580.913940
9   AirPassengers 1961-10-01  488.236389       440.233795       536.239014
10  AirPassengers 1961-11-01  417.236389       367.016205       467.456604
11  AirPassengers 1961-12-01  459.236389       406.892456       511.580322

You can plot the forecast by calling the StatsForecast.plot method and passing in your forecast dataframe.

sf.plot(df, forecast_df, level=[90])

Next Steps