SparkXGBForecast
Spark XGBoost forecaster
Wrapper of `xgboost.spark.SparkXGBRegressor` that adds an `extract_local_model` method to get a local version of the trained model and broadcast it to the workers.
SparkXGBForecast
SparkXGBForecast (**kwargs)
*SparkXGBRegressor is a PySpark ML estimator. It implements the XGBoost regression algorithm based on the XGBoost Python library, and it can be used in PySpark Pipeline and PySpark ML meta algorithms such as :py:class:`~pyspark.ml.tuning.CrossValidator`, :py:class:`~pyspark.ml.tuning.TrainValidationSplit`, and :py:class:`~pyspark.ml.classification.OneVsRest`.

SparkXGBRegressor automatically supports most of the parameters in the :py:class:`xgboost.XGBRegressor` constructor and most of the parameters used in the :py:class:`xgboost.XGBRegressor` fit and predict methods.

SparkXGBRegressor doesn’t support setting `gpu_id` but supports another param, `use_gpu`; see the doc below for more details.

SparkXGBRegressor doesn’t support setting `base_margin` explicitly either, but supports another param called `base_margin_col`; see the doc below for more details.

SparkXGBRegressor doesn’t support the `validate_features` and `output_margin` params.

SparkXGBRegressor doesn’t support setting the `nthread` xgboost param; instead, the `nthread` param for each xgboost worker will be set equal to the `spark.task.cpus` config value.
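The restrictions above can be summarised as a small pre-flight check on the keyword arguments. This is only a sketch: `UNSUPPORTED` and `find_unsupported` are hypothetical names that are not part of xgboost or mlforecast, and the table encodes nothing beyond what the paragraphs above state.

```python
# Hypothetical helper, not part of xgboost or mlforecast. The table only
# restates the restrictions listed in the docstring text above.
UNSUPPORTED = {
    "gpu_id": "use `use_gpu` instead",
    "base_margin": "use `base_margin_col` instead",
    "validate_features": "not supported",
    "output_margin": "not supported",
    "nthread": "set from the `spark.task.cpus` config value",
}


def find_unsupported(kwargs):
    """Return {param: hint} for every kwarg the docstring lists as unsupported."""
    return {name: UNSUPPORTED[name] for name in kwargs if name in UNSUPPORTED}


# Example: two of these three keyword arguments are flagged.
candidate = {"max_depth": 6, "nthread": 4, "gpu_id": 0}
problems = find_unsupported(candidate)
for name, hint in problems.items():
    print(f"{name}: {hint}")
```

Running a check like this before building the estimator reports the issue on the driver, with a hint about the replacement parameter.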

callbacks:
    The export and import of the callback functions are at best effort.
    For details, see :py:attr:`xgboost.spark.SparkXGBRegressor.callbacks` param doc.
validation_indicator_col:
    For params related to `xgboost.XGBRegressor` training with
    evaluation dataset’s supervision, set the
    :py:attr:`xgboost.spark.SparkXGBRegressor.validation_indicator_col`
    parameter instead of setting the `eval_set` parameter in the
    `xgboost.XGBRegressor` fit method.
weight_col:
    To specify the weight of the training and validation dataset, set the
    :py:attr:`xgboost.spark.SparkXGBRegressor.weight_col` parameter instead
    of setting the `sample_weight` and `sample_weight_eval_set` parameters
    in the `xgboost.XGBRegressor` fit method.
xgb_model:
    Set the value to be the instance returned by
    :func:`xgboost.spark.SparkXGBRegressorModel.get_booster`.
num_workers:
    Integer that specifies the number of XGBoost workers to use. Each
    XGBoost worker corresponds to one spark task.
use_gpu:
    Boolean that specifies whether the executors are running on GPU instances.
base_margin_col:
    To specify the base margins of the training and validation dataset, set the
    :py:attr:`xgboost.spark.SparkXGBRegressor.base_margin_col` parameter
    instead of setting `base_margin` and `base_margin_eval_set` in the
    `xgboost.XGBRegressor` fit method. Note: this isn’t available for
    distributed training.

.. Note:: The Parameters chart above contains parameters that need special handling. For a full list of parameters, see entries with `Param(parent=...` below.

.. Note:: This API is experimental.*
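
As a rough illustration of the parameter notes, the sketch below maps the `xgboost.XGBRegressor` fit arguments named in the docstring to the column parameters that replace them. `FIT_ARG_TO_COLUMN_PARAM` and `column_params_for` are hypothetical names introduced for this example only; the mapping itself is taken from the text above.

```python
# Left: xgboost.XGBRegressor fit arguments named in the docstring.
# Right: the SparkXGBRegressor column parameter the docstring says to set instead.
FIT_ARG_TO_COLUMN_PARAM = {
    "eval_set": "validation_indicator_col",
    "sample_weight": "weight_col",
    "sample_weight_eval_set": "weight_col",
    "base_margin": "base_margin_col",
    "base_margin_eval_set": "base_margin_col",
}


def column_params_for(fit_args):
    """Hypothetical helper: sorted column params that replace the given fit args."""
    return sorted(
        {FIT_ARG_TO_COLUMN_PARAM[arg] for arg in fit_args if arg in FIT_ARG_TO_COLUMN_PARAM}
    )


# Both weight arguments collapse to the single `weight_col` parameter.
needed = column_params_for(["eval_set", "sample_weight", "sample_weight_eval_set"])
print(needed)
```

The point of the mapping is that data which a local fit call receives as arrays is instead expected as columns of the Spark DataFrame, identified by name.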