Run StatsForecast in a distributed fashion on top of Ray.
StatsForecast works on top of Spark, Dask, and Ray through Fugue. StatsForecast reads the input DataFrame and selects the corresponding engine. For example, if the input is a Ray Dataset, StatsForecast uses the existing Ray instance to run the forecast. A benchmark (with older syntax) can be found here, where we forecasted one million time series in under half an hour.
Installation
As long as Ray is installed and configured, StatsForecast will be able to use it. If executing on a distributed Ray cluster, make sure the statsforecast library is installed across all the workers.
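A quick way to confirm that both libraries are importable is a small check like the sketch below. The helper name `missing_packages` is hypothetical (it is not part of StatsForecast or Ray), and the sketch relies only on the standard library.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be imported here.

    Hypothetical helper for illustration; not part of StatsForecast or Ray.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["ray", "statsforecast"]
missing = missing_packages(required)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are importable.")
```

This only inspects the interpreter it runs in. On a distributed cluster, the same check would need to run on every worker, which this sketch does not attempt.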
StatsForecast on Pandas
Before running on Ray, it’s recommended to test on a smaller Pandas dataset to make sure everything is working. This example also helps show the small differences when using Ray.

| | unique_id | ds | AutoETS |
|---|---|---|---|
| 0 | 0 | 2000-08-10 | 5.261609 |
| 1 | 0 | 2000-08-11 | 6.196357 |
| 2 | 0 | 2000-08-12 | 0.282309 |
| 3 | 0 | 2000-08-13 | 1.264195 |
| 4 | 0 | 2000-08-14 | 2.262453 |

