StatsForecast works on top of Spark, Dask, and Ray through Fugue. StatsForecast will read the input DataFrame and use the corresponding engine. For example, if the input is a Spark DataFrame, StatsForecast will use the existing Spark session to run the forecast.

Installation

As long as Dask is installed and configured, StatsForecast will be able to use it. If executing on a distributed Dask cluster, make use the statsforecast library is installed across all the workers.

StatsForecast on Pandas

Before running on Dask, it’s recommended to test on a smaller Pandas dataset to make sure everything is working. This example also helps show the small differences when using Dask.

from statsforecast.core import StatsForecast
from statsforecast.models import ( 
    AutoARIMA,
    AutoETS,
)
from statsforecast.utils import generate_series

n_series = 4
horizon = 7

series = generate_series(n_series)

sf = StatsForecast(
    models=[AutoETS(season_length=7)],
    freq='D',
)
sf.forecast(df=series, h=horizon).head()
dsAutoETS
unique_id
02000-08-105.261609
02000-08-116.196357
02000-08-120.282309
02000-08-131.264195
02000-08-142.262453

Executing on Dask

To run the forecasts distributed on Dask, just pass in a Dask DataFrame instead. Instead of having the unique_id as an index, it needs to be a column because Dask handles the index differently.

import dask.dataframe as dd

# Make unique_id a column
series = series.reset_index()
series['unique_id'] = series['unique_id'].astype(str)

ddf = dd.from_pandas(series, npartitions=4)
sf.forecast(df=ddf, h=horizon).compute().head()
unique_iddsAutoETS
002000-08-105.261609
102000-08-116.196357
202000-08-120.282309
302000-08-131.264195
402000-08-142.262453