Wrapper of xgboost.spark.SparkXGBRegressor that adds an extract_local_model method to get a local version of the trained model and broadcast it to the workers.

SparkXGBForecast

 SparkXGBForecast (features_col:Union[str,List[str]]='features',
                   label_col:str='label', prediction_col:str='prediction',
                   pred_contrib_col:Optional[str]=None,
                   validation_indicator_col:Optional[str]=None,
                   weight_col:Optional[str]=None,
                   base_margin_col:Optional[str]=None, num_workers:int=1,
                   use_gpu:Optional[bool]=None, device:Optional[str]=None,
                   force_repartition:bool=False,
                   repartition_random_shuffle:bool=False,
                   enable_sparse_data_optim:bool=False, **kwargs:Any)

*SparkXGBRegressor is a PySpark ML estimator. It implements the XGBoost regression algorithm based on the XGBoost Python library, and it can be used in a PySpark Pipeline and with PySpark ML meta-algorithms such as pyspark.ml.tuning.CrossValidator, pyspark.ml.tuning.TrainValidationSplit, and pyspark.ml.classification.OneVsRest.

SparkXGBRegressor automatically supports most of the parameters of the xgboost.XGBRegressor constructor and most of the parameters used in the xgboost.XGBRegressor.fit and xgboost.XGBRegressor.predict methods.

To enable GPU support, set device to cuda or gpu.
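A minimal sketch of how the constructor arguments might be assembled, assuming the class is importable from mlforecast.distributed.models.spark.xgb (an assumption, the import path is not stated on this page). The helper names forecast_kwargs and make_model are hypothetical, and the n_estimators, max_depth and learning_rate values are arbitrary xgboost.XGBRegressor constructor parameters forwarded through **kwargs.

```python
from typing import Any, Dict


def forecast_kwargs(use_gpu: bool, num_workers: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for SparkXGBForecast (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        # Named parameter from the signature above.
        "num_workers": num_workers,
        # XGBRegressor constructor parameters, forwarded through **kwargs.
        # The values are illustrative only.
        "n_estimators": 100,
        "max_depth": 6,
        "learning_rate": 0.1,
    }
    if use_gpu:
        # The text above says to set device to cuda or gpu for GPU support.
        kwargs["device"] = "cuda"
    return kwargs


def make_model(use_gpu: bool = False, num_workers: int = 1):
    """Instantiate the estimator; requires mlforecast, xgboost and pyspark."""
    # Imported lazily so forecast_kwargs works without Spark installed.
    # The import path is an assumption, not taken from this page.
    from mlforecast.distributed.models.spark.xgb import SparkXGBForecast

    return SparkXGBForecast(**forecast_kwargs(use_gpu, num_workers))
```

Calling make_model(use_gpu=True, num_workers=2) would then request two workers training on GPU, provided the cluster actually exposes GPUs to the Spark tasks.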

SparkXGBRegressor doesn’t support setting base_margin explicitly; instead, it supports a separate param called base_margin_col. See the doc below for more details.

SparkXGBRegressor doesn’t support the validate_features and output_margin params.

SparkXGBRegressor doesn’t support setting the nthread xgboost param; instead, the nthread param for each xgboost worker is set equal to the spark.task.cpus config value.*
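When porting parameters from a local xgboost.XGBRegressor setup, the restrictions above can be handled before the estimator is constructed. The following is a sketch under stated assumptions: split_params and spark_conf_for_threads are hypothetical helpers, not part of xgboost or mlforecast, and the UNSUPPORTED set contains only the names listed in the text above.

```python
from typing import Any, Dict, Tuple

# Parameters the text above lists as unsupported by SparkXGBRegressor.
UNSUPPORTED = {"base_margin", "validate_features", "output_margin", "nthread"}


def split_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate params that can be forwarded from those that cannot."""
    supported = {k: v for k, v in params.items() if k not in UNSUPPORTED}
    dropped = {k: v for k, v in params.items() if k in UNSUPPORTED}
    return supported, dropped


def spark_conf_for_threads(threads_per_worker: int) -> Dict[str, str]:
    """Express the desired per-worker thread count as a Spark config entry.

    nthread follows spark.task.cpus, so the thread count is configured on
    the Spark side rather than on the estimator.
    """
    return {"spark.task.cpus": str(threads_per_worker)}


# Hypothetical parameters carried over from a local XGBRegressor.
local_params = {"max_depth": 4, "nthread": 8, "output_margin": False}
supported, dropped = split_params(local_params)
conf = spark_conf_for_threads(dropped.get("nthread", 1))

print(supported)  # {'max_depth': 4}
print(conf)       # {'spark.task.cpus': '8'}
```

Each entry of conf can be applied with SparkSession.builder.config(key, value) when the session is created, and supported can be passed to the estimator as keyword arguments.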