Distributed Forecast

DistributedMLForecast

 DistributedMLForecast (models, freq:Union[int,str],
                        lags:Optional[Iterable[int]]=None, lag_transforms:
                        Optional[Dict[int,List[Union[Callable,Tuple[Callab
                        le,Any]]]]]=None, date_features:Optional[Iterable[
                        Union[str,Callable]]]=None, num_threads:int=1, tar
                        get_transforms:Optional[List[Union[mlforecast.targ
                        et_transforms.BaseTargetTransform,mlforecast.targe
                        t_transforms._BaseGroupedArrayTargetTransform]]]=N
                        one, engine=None,
                        num_partitions:Optional[int]=None,
                        lag_transforms_namer:Optional[Callable]=None)

Multi backend distributed pipeline

source

DistributedMLForecast.fit

 DistributedMLForecast.fit (df:~AnyDataFrame, id_col:str='unique_id',
                            time_col:str='ds', target_col:str='y',
                            static_features:Optional[List[str]]=None,
                            dropna:bool=True,
                            keep_last_n:Optional[int]=None)

Apply the feature engineering and train the models.

	Type	Default	Details
df	AnyDataFrame		Series data in long format.
id_col	str	unique_id	Column that identifies each serie.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only these many records from each serie for the forecasting step. Can save time and memory if your features allow it.
Returns	DistributedMLForecast		Forecast object with series values and trained models.

source

DistributedMLForecast.predict

 DistributedMLForecast.predict (h:int,
                                before_predict_callback:Optional[Callable]
                                =None, after_predict_callback:Optional[Cal
                                lable]=None, X_df:Optional[pandas.core.fra
                                me.DataFrame]=None,
                                new_df:Optional[~AnyDataFrame]=None,
                                ids:Optional[List[str]]=None)

Compute the predictions for the next horizon steps.

	Type	Default	Details
h	int		Forecast horizon.
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
X_df	Optional	None	Dataframe with the future exogenous features. Should have the id column and the time column.
new_df	Optional	None	Series data of new observations for which forecasts are to be generated. This dataframe should have the same structure as the one used to fit the model, including any features and time series data. If `new_df` is not None, the method will generate forecasts for the new observations.
ids	Optional	None	List with subset of ids seen during training for which the forecasts should be computed.
Returns	AnyDataFrame		Predictions for each serie and timestep, with one column per model.

source

DistributedMLForecast.save

 DistributedMLForecast.save (path:str)

Save forecast object

	Type	Details
path	str	Directory where artifacts will be stored.
Returns	None

source

DistributedMLForecast.load

 DistributedMLForecast.load (path:str, engine)

Load forecast object

	Type	Details
path	str	Directory with saved artifacts.
engine	fugue execution engine	Dask Client, Spark Session, etc to use for the distributed computation.
Returns	DistributedMLForecast

source

DistributedMLForecast.update

 DistributedMLForecast.update (df:pandas.core.frame.DataFrame)

Update the values of the stored series.

	Type	Details
df	DataFrame	Dataframe with new observations.
Returns	None

source

DistributedMLForecast.to_local

 DistributedMLForecast.to_local ()

*Convert this distributed forecast object into a local one This pulls all the data from the remote machines, so you have to be sure that it fits in the scheduler/driver. If you’re not sure use the save method instead.*

source

DistributedMLForecast.preprocess

 DistributedMLForecast.preprocess (df:~AnyDataFrame,
                                   id_col:str='unique_id',
                                   time_col:str='ds', target_col:str='y', 
                                   static_features:Optional[List[str]]=Non
                                   e, dropna:bool=True,
                                   keep_last_n:Optional[int]=None)

Add the features to data.

	Type	Default	Details
df	AnyDataFrame		Series data in long format.
id_col	str	unique_id	Column that identifies each serie.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only these many records from each serie for the forecasting step. Can save time and memory if your features allow it.
Returns	AnyDataFrame		`df` with added features.

source

DistributedMLForecast.cross_validation

 DistributedMLForecast.cross_validation (df:~AnyDataFrame, n_windows:int,
                                         h:int, id_col:str='unique_id',
                                         time_col:str='ds',
                                         target_col:str='y',
                                         step_size:Optional[int]=None, sta
                                         tic_features:Optional[List[str]]=
                                         None, dropna:bool=True,
                                         keep_last_n:Optional[int]=None,
                                         refit:bool=True, before_predict_c
                                         allback:Optional[Callable]=None, 
                                         after_predict_callback:Optional[C
                                         allable]=None,
                                         input_size:Optional[int]=None)

Perform time series cross validation. Creates n_windows splits where each window has h test periods, trains the models, computes the predictions and merges the actuals.

	Type	Default	Details
df	AnyDataFrame		Series data in long format.
n_windows	int		Number of windows to evaluate.
h	int		Number of test periods in each window.
id_col	str	unique_id	Column that identifies each serie.
time_col	str	ds	Column that identifies each timestep, its values can be timestamps or integers.
target_col	str	y	Column that contains the target.
step_size	Optional	None	Step size between each cross validation window. If None it will be equal to `h`.
static_features	Optional	None	Names of the features that are static and will be repeated when forecasting.
dropna	bool	True	Drop rows with missing values produced by the transformations.
keep_last_n	Optional	None	Keep only these many records from each serie for the forecasting step. Can save time and memory if your features allow it.
refit	bool	True	Retrain model for each cross validation window. If False, the models are trained at the beginning and then used to predict each window.
before_predict_callback	Optional	None	Function to call on the features before computing the predictions. This function will take the input dataframe that will be passed to the model for predicting and should return a dataframe with the same structure. The series identifier is on the index.
after_predict_callback	Optional	None	Function to call on the predictions before updating the targets. This function will take a pandas Series with the predictions and should return another one with the same structure. The series identifier is on the index.
input_size	Optional	None	Maximum training samples per serie in each window. If None, will use an expanding window.
Returns	AnyDataFrame		Predictions for each window with the series id, timestamp, target value and predictions from each model.

Getting Started

How-to guides

Tutorials

API Reference

Distributed Forecast

DistributedMLForecast

DistributedMLForecast.fit

DistributedMLForecast.predict

DistributedMLForecast.save

DistributedMLForecast.load

DistributedMLForecast.update

DistributedMLForecast.to_local

DistributedMLForecast.preprocess

DistributedMLForecast.cross_validation

Getting Started

How-to guides

Tutorials

API Reference

​DistributedMLForecast

​DistributedMLForecast.fit

​DistributedMLForecast.predict

​DistributedMLForecast.save

​DistributedMLForecast.load

​DistributedMLForecast.update

​DistributedMLForecast.to_local

​DistributedMLForecast.preprocess

​DistributedMLForecast.cross_validation

DistributedMLForecast

DistributedMLForecast.fit

DistributedMLForecast.predict

DistributedMLForecast.save

DistributedMLForecast.load

DistributedMLForecast.update

DistributedMLForecast.to_local

DistributedMLForecast.preprocess

DistributedMLForecast.cross_validation