Dask
Run StatsForecast in a distributed fashion on top of Dask.
StatsForecast works on top of Spark, Dask, and Ray through Fugue. StatsForecast will read the input DataFrame and use the corresponding engine. For example, if the input is a Spark DataFrame, StatsForecast will use the existing Spark session to run the forecast.
Installation
As long as Dask is installed and configured, StatsForecast will be able to use it. If executing on a distributed Dask cluster, make sure the `statsforecast` library is installed across all the workers.
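One quick way to confirm the requirement above is to check that the relevant packages are importable in a given environment. The sketch below is only an illustration: the helper name `missing_packages` and the package list are assumptions, not part of StatsForecast.

```python
from importlib.util import find_spec

# Packages this page relies on (an assumed list; adjust to your setup).
REQUIRED = ("statsforecast", "dask", "fugue")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", missing if missing else "none")
```

On a cluster, the same function could be sent to every worker, for example with `Client.run(missing_packages)` from `dask.distributed`, which returns one result per worker.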
StatsForecast on Pandas
Before running on Dask, it’s recommended to test on a smaller Pandas dataset to make sure everything is working. This example also helps show the small differences when using Dask.
| unique_id (index) | ds | AutoETS |
|---|---|---|
| 0 | 2000-08-10 | 5.261609 |
| 0 | 2000-08-11 | 6.196357 |
| 0 | 2000-08-12 | 0.282309 |
| 0 | 2000-08-13 | 1.264195 |
| 0 | 2000-08-14 | 2.262453 |
Executing on Dask
To run the forecasts distributed on Dask, pass in a Dask DataFrame instead of a Pandas one. Unlike the Pandas case, `unique_id` must be a column rather than the index, because Dask handles the index differently.
|   | unique_id | ds | AutoETS |
|---|---|---|---|
| 0 | 0 | 2000-08-10 | 5.261609 |
| 1 | 0 | 2000-08-11 | 6.196357 |
| 2 | 0 | 2000-08-12 | 0.282309 |
| 3 | 0 | 2000-08-13 | 1.264195 |
| 4 | 0 | 2000-08-14 | 2.262453 |