Wrapper of ``xgboost.spark.SparkXGBRegressor`` that adds an ``extract_local_model`` method, which gets a local version of the trained model and broadcasts it to the workers.



SparkXGBForecast

``SparkXGBForecast(**kwargs)``

*SparkXGBRegressor is a PySpark ML estimator. It implements the XGBoost regression algorithm on top of the XGBoost Python library, and it can be used in PySpark Pipelines and in PySpark ML meta-algorithms such as :py:class:`~pyspark.ml.tuning.CrossValidator`, :py:class:`~pyspark.ml.tuning.TrainValidationSplit`, and :py:class:`~pyspark.ml.classification.OneVsRest`.

SparkXGBRegressor automatically supports most of the parameters of the :py:class:`xgboost.XGBRegressor` constructor and most of the parameters of its ``fit`` and ``predict`` methods.

SparkXGBRegressor does not support setting ``gpu_id``; use the ``use_gpu`` param instead (see below for details).

SparkXGBRegressor also does not support setting ``base_margin`` explicitly; use the ``base_margin_col`` param instead (see below for details).

SparkXGBRegressor does not support the ``validate_features`` and ``output_margin`` params.

SparkXGBRegressor does not support setting the ``nthread`` XGBoost param. Instead, ``nthread`` for each XGBoost worker is set to the value of the ``spark.task.cpus`` config.
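The restrictions above can be summarized as a pre-flight check on an ``xgboost.XGBRegressor``-style parameter dict. This is a minimal sketch, assuming plain dicts for both the XGBoost params and the Spark config. ``check_spark_xgb_params`` and ``effective_nthread`` are hypothetical helpers written for illustration; they are not part of ``xgboost.spark`` or mlforecast, which perform their own validation internally.

```python
from typing import Any, Dict

# Params documented as unsupported by SparkXGBRegressor, mapped to the
# suggested replacement param (None means there is no direct replacement).
UNSUPPORTED = {
    "gpu_id": "use_gpu",
    "base_margin": "base_margin_col",
    "validate_features": None,
    "output_margin": None,
    "nthread": None,  # derived from spark.task.cpus instead
}


def check_spark_xgb_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: raise on unsupported keys, else return a copy."""
    for key, replacement in UNSUPPORTED.items():
        if key in params:
            hint = f"; use {replacement!r} instead" if replacement else ""
            raise ValueError(f"{key!r} is not supported by SparkXGBRegressor{hint}")
    return dict(params)


def effective_nthread(spark_conf: Dict[str, str]) -> int:
    """Hypothetical helper: per-worker nthread equals spark.task.cpus (Spark default: 1)."""
    return int(spark_conf.get("spark.task.cpus", "1"))


params = check_spark_xgb_params({"max_depth": 6, "learning_rate": 0.1, "use_gpu": False})
print(effective_nthread({"spark.task.cpus": "4"}))  # 4
```

Raising early, rather than silently dropping a key, mirrors the documented behaviour: these params are rejected, not ignored.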

Parameters:

- ``callbacks``: The export and import of the callback functions are best effort. For details, see the :py:attr:`xgboost.spark.SparkXGBRegressor.callbacks` param doc.
- ``validation_indicator_col``: For params related to :py:class:`xgboost.XGBRegressor` training with evaluation-dataset supervision, set :py:attr:`xgboost.spark.SparkXGBRegressor.validation_indicator_col` instead of setting the ``eval_set`` parameter of the :py:class:`xgboost.XGBRegressor` ``fit`` method.
- ``weight_col``: To specify the weights of the training and validation datasets, set :py:attr:`xgboost.spark.SparkXGBRegressor.weight_col` instead of setting the ``sample_weight`` and ``sample_weight_eval_set`` parameters of the :py:class:`xgboost.XGBRegressor` ``fit`` method.
- ``xgb_model``: Set the value to the instance returned by :func:`xgboost.spark.SparkXGBRegressorModel.get_booster`.
- ``num_workers``: Integer that specifies the number of XGBoost workers to use. Each XGBoost worker corresponds to one Spark task.
- ``use_gpu``: Boolean that specifies whether the executors are running on GPU instances.
- ``base_margin_col``: To specify the base margins of the training and validation datasets, set :py:attr:`xgboost.spark.SparkXGBRegressor.base_margin_col` instead of setting ``base_margin`` and ``base_margin_eval_set`` in the :py:class:`xgboost.XGBRegressor` ``fit`` method. Note: this isn't available for distributed training.

.. Note:: The Parameters chart above contains parameters that need special handling. For a full list of parameters, see entries with Param(parent=... below.

.. Note:: This API is experimental.*
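The column-based parameters ``validation_indicator_col``, ``weight_col``, and ``base_margin_col`` replace the per-array arguments of ``XGBRegressor.fit``. The sketch below shows the correspondence on an in-memory list of rows: rows flagged by the indicator column become the evaluation set (``eval_set``), and the weight and base-margin columns supply ``sample_weight``/``sample_weight_eval_set`` and ``base_margin``/``base_margin_eval_set``. It assumes rows are plain dictionaries. ``split_by_indicator`` and the column names ``x1``, ``y``, ``w``, ``is_val`` are hypothetical; this is not how ``xgboost.spark`` partitions data internally.

```python
from typing import Any, Dict, List, Optional


def split_by_indicator(
    rows: List[Dict[str, Any]],
    features: List[str],
    label_col: str,
    validation_indicator_col: str,
    weight_col: Optional[str] = None,
    base_margin_col: Optional[str] = None,
) -> Dict[str, Dict[str, list]]:
    """Hypothetical helper: split rows into train/eval parts shaped like fit() inputs."""
    out = {name: {"X": [], "y": [], "weight": [], "base_margin": []} for name in ("train", "eval")}
    for row in rows:
        # A truthy indicator value sends the row to the evaluation set.
        part = out["eval" if row[validation_indicator_col] else "train"]
        part["X"].append([row[f] for f in features])
        part["y"].append(row[label_col])
        if weight_col is not None:
            part["weight"].append(row[weight_col])
        if base_margin_col is not None:
            part["base_margin"].append(row[base_margin_col])
    return out


rows = [
    {"x1": 1.0, "y": 2.0, "w": 1.0, "is_val": False},
    {"x1": 2.0, "y": 4.1, "w": 0.5, "is_val": False},
    {"x1": 3.0, "y": 5.9, "w": 1.0, "is_val": True},
]
parts = split_by_indicator(rows, ["x1"], "y", "is_val", weight_col="w")
# With a local XGBRegressor, the equivalent call would be roughly:
# fit(train X, train y, sample_weight=train weight,
#     eval_set=[(eval X, eval y)], sample_weight_eval_set=[eval weight])
print(parts["eval"]["X"], parts["train"]["weight"])  # [[3.0]] [1.0, 0.5]
```

Keeping the split in a column means each Spark task can route its own rows without the driver materializing separate evaluation arrays.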