Documentation Index
Fetch the complete documentation index at: https://nixtlaverse.nixtla.io/llms.txt
Use this file to discover all available pages before exploring further.
Run StatsForecast distributedly on top of Spark.
StatsForecast works on top of Spark, Dask, and Ray through
Fugue. StatsForecast will
read the input DataFrame and use the corresponding engine. For example,
if the input is a Spark DataFrame, StatsForecast will use the existing
Spark session to run the forecast.
A benchmark (with older syntax) can be found
here
where we forecasted one million timeseries in under 15 minutes.
Installation
As long as Spark is installed and configured, StatsForecast will be able
to use it. If executing on a distributed Spark cluster, make use the
statsforecast library is installed across all the workers.
StatsForecast on Pandas
Before running on Spark, it’s recommended to test on a smaller Pandas
dataset to make sure everything is working. This example also helps show
the small differences when using Spark.
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS
from statsforecast.utils import generate_series
n_series = 4
horizon = 7
series = generate_series(n_series)
sf = StatsForecast(
models=[AutoETS(season_length=7)],
freq='D',
)
sf.forecast(df=series, h=horizon).head()
| unique_id | ds | AutoETS |
|---|
| 0 | 0 | 2000-08-10 | 5.261609 |
| 1 | 0 | 2000-08-11 | 6.196357 |
| 2 | 0 | 2000-08-12 | 0.282309 |
| 3 | 0 | 2000-08-13 | 1.264195 |
| 4 | 0 | 2000-08-14 | 2.262453 |
Executing on Spark
To run the forecasts distributed on Spark, just pass in a Spark
DataFrame instead.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
series['unique_id'] = series['unique_id'].astype(str)
# Convert to Spark
sdf = spark.createDataFrame(series)
# Returns a Spark DataFrame
sf.forecast(df=sdf, h=horizon, level=[90]).show(5)
+---------+-------------------+----------+-------------+-------------+
|unique_id| ds| AutoETS|AutoETS-lo-90|AutoETS-hi-90|
+---------+-------------------+----------+-------------+-------------+
| 0|2000-08-10 00:00:00| 5.261609| 5.0255513| 5.4976664|
| 0|2000-08-11 00:00:00| 6.1963573| 5.9603| 6.432415|
| 0|2000-08-12 00:00:00|0.28230855| 0.04625102| 0.5183661|
| 0|2000-08-13 00:00:00| 1.2641948| 1.0281373| 1.5002524|
| 0|2000-08-14 00:00:00| 2.2624528| 2.0263953| 2.4985104|
+---------+-------------------+----------+-------------+-------------+
only showing top 5 rows