SparkXGBForecast
Spark XGBoost forecaster
Wrapper of `xgboost.spark.SparkXGBRegressor` that adds an `extract_local_model` method to get a local version of the trained model and broadcast it to the workers.
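The call order implied by that description can be sketched as follows. This is a minimal illustration, not the library's internal code: the helper name `fit_and_extract_local`, the `demo` function, the column names `features` and `label`, and the import path are assumptions made for the example, and it is assumed that `extract_local_model` takes the fitted Spark model as its only argument.

```python
def fit_and_extract_local(estimator, train_df):
    """Fit `estimator` on a Spark DataFrame and return a local copy of the model.

    `estimator` is expected to be a SparkXGBForecast instance (any object with
    `fit` and `extract_local_model` methods works); `train_df` is a Spark
    DataFrame holding the columns the estimator was configured with.
    """
    trained = estimator.fit(train_df)  # distributed training on the cluster
    return estimator.extract_local_model(trained)  # local model, usable outside Spark ML


def demo(train_df):
    """Hypothetical usage; requires pyspark, xgboost and mlforecast."""
    # Assumed import path; adjust it if your installed version differs.
    from mlforecast.distributed.models.spark.xgb import SparkXGBForecast

    # `features` and `label` are placeholder column names for this sketch.
    estimator = SparkXGBForecast(
        features_col="features",
        label_col="label",
        n_estimators=50,
    )
    return fit_and_extract_local(estimator, train_df)
```

The third-party import sits inside `demo` so that the helper itself can be read and exercised without a Spark installation.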
SparkXGBForecast
*`SparkXGBRegressor` is a PySpark ML estimator. It implements the XGBoost regression algorithm based on the XGBoost Python library, and it can be used in a PySpark Pipeline and with PySpark ML meta-algorithms such as `pyspark.ml.tuning.CrossValidator`, `pyspark.ml.tuning.TrainValidationSplit`, and `pyspark.ml.classification.OneVsRest`.
`SparkXGBRegressor` automatically supports most of the parameters of the `xgboost.XGBRegressor` constructor and most of the parameters used in the `xgboost.XGBRegressor.fit` and `xgboost.XGBRegressor.predict` methods.
To enable GPU support, set `device` to `cuda` or `gpu`.
`SparkXGBRegressor` doesn’t support setting `base_margin` explicitly, but it supports another param called `base_margin_col`. See the doc below for more details.
`SparkXGBRegressor` doesn’t support the `validate_features` and `output_margin` params.
`SparkXGBRegressor` doesn’t support setting the `nthread` xgboost param; instead, the `nthread` param for each xgboost worker is set equal to the `spark.task.cpus` config value.*
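The parameter restrictions listed above can be checked before constructing the estimator. The following is a small sketch of such a check; `UNSUPPORTED_PARAMS` and `check_spark_xgb_params` are hypothetical names introduced here and are not part of xgboost or of this library. The check only encodes the four restrictions stated in the docstring and does not validate anything else.

```python
# Hypothetical helper: encodes only the restrictions stated in the docstring.
UNSUPPORTED_PARAMS = {
    "base_margin": "use base_margin_col to name a column instead",
    "validate_features": "not supported by SparkXGBRegressor",
    "output_margin": "not supported by SparkXGBRegressor",
    "nthread": "taken from the spark.task.cpus config value",
}


def check_spark_xgb_params(params):
    """Return a copy of `params`, or raise ValueError if any key is unsupported."""
    rejected = {name: UNSUPPORTED_PARAMS[name] for name in params if name in UNSUPPORTED_PARAMS}
    if rejected:
        details = "; ".join(f"{name} ({why})" for name, why in sorted(rejected.items()))
        raise ValueError(f"unsupported parameter(s): {details}")
    return dict(params)


# A GPU configuration that passes the check; "margin" is a placeholder column name.
gpu_params = check_spark_xgb_params(
    {"device": "cuda", "base_margin_col": "margin", "n_estimators": 100}
)
```

Failing early on the driver gives a readable message before any Spark job is launched.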