FugueBackend
*FugueBackend for Distributed Computation. Source code.
This class uses the Fugue backend, which can distribute computation on Spark, Dask, and Ray without any rewrites.*
Name | Type | Default | Details |
---|---|---|---|
engine | Any | None | A selection between Spark, Dask, and Ray. |
conf | Any | None | Engine configuration. |
transform_kwargs | Any |
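As a minimal sketch of how these constructor parameters fit together, the helper below builds a FugueBackend from an engine and an optional configuration. The helper name `build_fugue_backend` is hypothetical, and the import path `statsforecast.distributed.fugue` is an assumption about the installed statsforecast version. The import is deferred so the snippet still loads where the package is missing.

```python
from typing import Any, Optional


def build_fugue_backend(
    engine: Any = None,
    conf: Optional[dict] = None,
    **transform_kwargs: Any,
):
    """Hypothetical helper: create a FugueBackend for the given engine."""
    # Deferred import (assumed path); needs statsforecast with Fugue installed.
    from statsforecast.distributed.fugue import FugueBackend

    # engine selects Spark, Dask, or Ray; conf carries the engine configuration;
    # any remaining keyword arguments are passed through as transform_kwargs.
    return FugueBackend(engine=engine, conf=conf, **transform_kwargs)
```

A call such as `build_fugue_backend(engine="dask")` assumes that Fugue resolves the string name to a Dask engine, which requires the Dask extras of Fugue to be installed.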
FugueBackend.forecast
*Memory Efficient core.StatsForecast predictions with FugueBackend.
This method uses Fugue’s transform function, in combination with core.StatsForecast’s forecast, to efficiently fit a list of StatsForecast models.*
Name | Type | Details |
---|---|---|
df | AnyDataFrame | DataFrame with ids, times, targets and exogenous. |
freq | Union | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
models | List | List of instantiated StatsForecast model objects. |
fallback_model | Optional | Model to be used if a model fails (default: None). Only works with the forecast and cross_validation methods. |
X_df | Optional | DataFrame with ids, times and future exogenous. |
h | int | Forecast horizon. |
level | Optional | Confidence levels between 0 and 100 for prediction intervals. |
fitted | bool | Store in-sample predictions. |
prediction_intervals | Optional | Configuration to calibrate prediction intervals (Conformal Prediction). |
id_col | str | Column that identifies each series. |
time_col | str | Column that identifies each timestep, its values can be timestamps or integers. |
target_col | str | Column that contains the target. |
Returns | Any | DataFrame with one column per model for point predictions, plus columns for probabilistic predictions, for all fitted models. |
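The parameter table above can be read as one keyword-only call. The sketch below spells that call out, assuming the backend object exposes a `forecast` method that accepts exactly the parameters listed in the table. The wrapper name `forecast_with_backend` and the default values chosen here (`level=[90]`, `"unique_id"`, `"ds"`, `"y"`) are illustrative assumptions, not library defaults.

```python
from typing import Any, List, Optional


def forecast_with_backend(
    backend: Any,
    df: Any,
    models: List[Any],
    h: int,
    freq: Any = "D",
    X_df: Optional[Any] = None,
    level: Optional[List[int]] = None,
) -> Any:
    """Hypothetical wrapper: call backend.forecast with every documented argument."""
    return backend.forecast(
        df=df,                      # ids, times, targets and exogenous
        freq=freq,                  # offset alias or integer
        models=models,              # instantiated model objects
        fallback_model=None,        # no fallback in this sketch
        X_df=X_df,                  # future exogenous values, if any
        h=h,                        # forecast horizon
        level=level if level is not None else [90],
        fitted=False,               # do not store in-sample predictions
        prediction_intervals=None,  # no conformal calibration
        id_col="unique_id",
        time_col="ds",
        target_col="y",
    )
```

Passing every argument by keyword keeps the call valid whether or not the installed version defines defaults for them.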
FugueBackend.cross_validation
*Temporal Cross-Validation with core.StatsForecast and FugueBackend.
This method uses Fugue’s transform function, in combination with core.StatsForecast’s cross-validation, to efficiently fit a list of StatsForecast models through multiple training windows, in either a chained or rolled manner.
The speed of StatsForecast.models, along with Fugue’s distributed computation, makes it possible to overcome the high computational cost of this evaluation technique. Temporal cross-validation provides better measurements of a model’s generalization by increasing the length and diversity of the test set.*
Name | Type | Details |
---|---|---|
df | AnyDataFrame | DataFrame with ids, times, targets and exogenous. |
freq | Union | Frequency of the data. Must be a valid pandas or polars offset alias, or an integer. |
models | List | List of instantiated StatsForecast model objects. |
fallback_model | Optional | Model to be used if a model fails (default: None). Only works with the forecast and cross_validation methods. |
h | int | Forecast horizon. |
n_windows | int | Number of windows used for cross validation. |
step_size | int | Step size between each window. |
test_size | int | Length of the test set. If passed, set n_windows=None. |
input_size | int | Input size for each window. If not None, rolled windows are used. |
level | Optional | Confidence levels between 0 and 100 for prediction intervals. |
refit | bool | Whether or not to refit the model for each window. If int, train the models every refit windows. |
fitted | bool | Store in-sample predictions. |
prediction_intervals | Optional | Configuration to calibrate prediction intervals (Conformal Prediction). |
id_col | str | Column that identifies each series. |
time_col | str | Column that identifies each timestep, its values can be timestamps or integers. |
target_col | str | Column that contains the target. |
Returns | Any | DataFrame with one column per model for point predictions, plus columns for probabilistic predictions, for all fitted models. |
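To make the interplay of h, n_windows, step_size, test_size and input_size concrete, here is a small standard-library sketch of a rolling-origin window layout. It is an illustration under stated assumptions, not the library's implementation: the function name `cv_windows` is hypothetical, and it assumes the last window ends at the last observation and that test_size, when given, replaces n_windows.

```python
from typing import List, Optional, Tuple


def cv_windows(
    n_obs: int,
    h: int,
    n_windows: Optional[int] = 1,
    step_size: int = 1,
    test_size: Optional[int] = None,
    input_size: Optional[int] = None,
) -> List[Tuple[int, int, int]]:
    """Return (train_start, train_end, test_end) index triples, one per window.

    Each window trains on [train_start, train_end) and is evaluated on the
    next h observations, [train_end, test_end).
    """
    if test_size is None:
        if n_windows is None or n_windows < 1:
            raise ValueError("pass a positive n_windows or a test_size")
        test_size = h + step_size * (n_windows - 1)
    else:
        # test_size takes over: derive the number of windows from it.
        if test_size < h or (test_size - h) % step_size != 0:
            raise ValueError("test_size - h must be a non-negative multiple of step_size")
        n_windows = (test_size - h) // step_size + 1

    first_cutoff = n_obs - test_size
    if first_cutoff < 1:
        raise ValueError("series too short for the requested windows")

    windows = []
    for i in range(n_windows):
        train_end = first_cutoff + i * step_size
        # Chained windows grow from index 0; rolled windows keep input_size points.
        train_start = 0 if input_size is None else max(0, train_end - input_size)
        windows.append((train_start, train_end, train_end + h))
    return windows
```

For a series of 100 observations with h=12, n_windows=3 and step_size=6, this yields the windows (0, 76, 88), (0, 82, 94) and (0, 88, 100). Passing test_size=24 with n_windows=None gives the same layout, and input_size=50 moves each training start to 26, 32 and 38.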
Dask Distributed Predictions
Here we provide an example of distributing StatsForecast predictions, using Fugue to execute the code in a Dask cluster.
To do so, we instantiate the FugueBackend class with a DaskExecutionEngine.
Apart from creating this class, the StatsForecast instantiation is the usual one.
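A minimal sketch of that setup might look as follows. It assumes a local Dask cluster started with `Client()`, the `fugue_dask.DaskExecutionEngine` class, the import path `statsforecast.distributed.fugue`, and a `Naive` model with daily frequency chosen only for illustration. The function name `setup_dask_backend` is hypothetical, and the imports are deferred so the snippet loads even without those packages installed.

```python
def setup_dask_backend():
    """Hypothetical helper: create a Dask-backed FugueBackend and a StatsForecast."""
    # Deferred imports: need dask[distributed], fugue[dask] and statsforecast.
    from dask.distributed import Client
    from fugue_dask import DaskExecutionEngine
    from statsforecast import StatsForecast
    from statsforecast.distributed.fugue import FugueBackend
    from statsforecast.models import Naive

    # Start (or connect to) a Dask cluster; Client() spins up a local one.
    dask_client = Client()

    # Wrap the client in Fugue's Dask execution engine.
    engine = DaskExecutionEngine(dask_client=dask_client)

    # The backend that distributes the StatsForecast computation.
    backend = FugueBackend(engine=engine)

    # The usual StatsForecast instantiation.
    sf = StatsForecast(models=[Naive()], freq="D")

    return dask_client, backend, sf
```

How the backend is attached to the StatsForecast object depends on the installed version, so the sketch only returns the pieces and leaves that wiring to the caller.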
Distributed Forecast
For extremely fast distributed predictions, we use FugueBackend as a backend that operates like the original StatsForecast.forecast method.
It receives as input a pandas.DataFrame with columns [unique_id, ds, y] and exogenous variables, where the ds (datestamp) column should be in a format expected by pandas. The y column must be numeric and represents the measurement we wish to forecast. The unique_id column uniquely identifies each series in the panel data.
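The expected long (panel) layout can be illustrated with plain Python records. The sketch below builds a tiny synthetic panel and checks the three required columns. The function names, the series labels, and the synthetic values are all made up for illustration.

```python
from datetime import date, timedelta
from numbers import Real
from typing import Dict, List

REQUIRED_COLUMNS = ("unique_id", "ds", "y")


def make_panel(n_series: int = 2, n_obs: int = 3) -> List[Dict[str, object]]:
    """Build synthetic long-format rows: one row per (series, timestamp)."""
    start = date(2024, 1, 1)
    rows = []
    for s in range(n_series):
        for t in range(n_obs):
            rows.append(
                {
                    "unique_id": f"series_{s}",       # identifies the series
                    "ds": start + timedelta(days=t),  # datestamp
                    "y": float(10 * s + t),           # numeric target
                }
            )
    return rows


def check_panel(rows: List[Dict[str, object]]) -> None:
    """Raise if a row lacks a required column or has a non-numeric y."""
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        if isinstance(row["y"], bool) or not isinstance(row["y"], Real):
            raise TypeError("y must be numeric")
```

Passing the rows from `make_panel()` to `pandas.DataFrame` produces a frame with exactly the unique_id, ds and y columns described above.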
Distributed Cross-Validation
For extremely fast distributed temporal cross-validation, we use the cross_validation method, which operates like the original StatsForecast.cross_validation method.
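As a hedged sketch of such a call, the wrapper below forwards the main cross-validation arguments to an object that behaves like a StatsForecast instance. The wrapper name `distributed_cross_validation` and its default values are hypothetical. It assumes `sf` exposes `cross_validation(df=..., h=..., n_windows=..., step_size=...)` and that `df` is a distributed DataFrame (for example a Dask DataFrame).

```python
from typing import Any


def distributed_cross_validation(
    sf: Any,
    df: Any,
    h: int = 12,
    n_windows: int = 2,
    step_size: int = 1,
) -> Any:
    """Hypothetical wrapper: run temporal cross-validation on a distributed df."""
    # sf is assumed to be a StatsForecast-like object; with a distributed
    # DataFrame, the result is expected to be distributed as well.
    return sf.cross_validation(
        df=df,
        h=h,                  # forecast horizon of each window
        n_windows=n_windows,  # number of validation windows
        step_size=step_size,  # spacing between consecutive windows
    )
```

With a Dask input, the returned object would typically need a `.compute()` call to materialize the results locally. That is standard Dask behavior rather than something stated in the text above.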