Wrapper around xgboost.dask.DaskXGBRegressor that adds a model_ property containing the fitted model, which is sent to the workers during the forecasting step.
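
For context, a minimal usage sketch with mlforecast's distributed interface is shown below. The import paths, the DistributedMLForecast arguments, and the data source are assumptions based on typical mlforecast usage and may need adjusting to your version.

.. code-block:: python

    import dask.dataframe as dd
    from dask.distributed import Client
    from mlforecast.distributed import DistributedMLForecast
    from mlforecast.distributed.models.dask.xgb import DaskXGBForecast

    client = Client()  # connect to (or start) a Dask cluster

    # long-format data with id, timestamp and target columns (unique_id, ds, y)
    series = dd.read_parquet("series.parquet")  # hypothetical path

    fcst = DistributedMLForecast(
        models=[DaskXGBForecast(n_estimators=100, tree_method="hist")],
        freq="D",
        lags=[7, 14],
    )
    fcst.fit(series)          # trains DaskXGBForecast on the cluster
    preds = fcst.predict(14)  # the fitted model_ is sent to the workers here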


source

DaskXGBForecast

 DaskXGBForecast (max_depth:Optional[int]=None, max_leaves:Optional[int]=None,
                  max_bin:Optional[int]=None, grow_policy:Optional[str]=None,
                  learning_rate:Optional[float]=None, n_estimators:int=100,
                  verbosity:Optional[int]=None,
                  objective:Union[str,Callable[[numpy.ndarray,numpy.ndarray],Tuple[numpy.ndarray,numpy.ndarray]],NoneType]=None,
                  booster:Optional[str]=None, tree_method:Optional[str]=None,
                  n_jobs:Optional[int]=None, gamma:Optional[float]=None,
                  min_child_weight:Optional[float]=None,
                  max_delta_step:Optional[float]=None,
                  subsample:Optional[float]=None,
                  sampling_method:Optional[str]=None,
                  colsample_bytree:Optional[float]=None,
                  colsample_bylevel:Optional[float]=None,
                  colsample_bynode:Optional[float]=None,
                  reg_alpha:Optional[float]=None,
                  reg_lambda:Optional[float]=None,
                  scale_pos_weight:Optional[float]=None,
                  base_score:Optional[float]=None,
                  random_state:Union[numpy.random.mtrand.RandomState,int,NoneType]=None,
                  missing:float=nan, num_parallel_tree:Optional[int]=None,
                  monotone_constraints:Union[Dict[str,int],str,NoneType]=None,
                  interaction_constraints:Union[str,Sequence[Sequence[str]],NoneType]=None,
                  importance_type:Optional[str]=None,
                  gpu_id:Optional[int]=None,
                  validate_parameters:Optional[bool]=None,
                  predictor:Optional[str]=None,
                  enable_categorical:bool=False,
                  feature_types:Sequence[str]=None,
                  max_cat_to_onehot:Optional[int]=None,
                  max_cat_threshold:Optional[int]=None,
                  eval_metric:Union[str,List[str],Callable,NoneType]=None,
                  early_stopping_rounds:Optional[int]=None,
                  callbacks:Optional[List[xgboost.callback.TrainingCallback]]=None,
                  **kwargs:Any)

Implementation of the Scikit-Learn API for XGBoost.

Parameters
----------
max_depth : Optional[int], default=None
    Maximum tree depth for base learners.
max_leaves : Optional[int], default=None
    Maximum number of leaves; 0 indicates no limit.
max_bin : Optional[int], default=None
    If using histogram-based algorithm, maximum number of bins per feature.
grow_policy : Optional[str], default=None
    Tree growing policy. 0: favor splitting at nodes closest to the node,
    i.e. grow depth-wise. 1: favor splitting at nodes with highest loss change.
learning_rate : Optional[float], default=None
    Boosting learning rate (xgb's "eta").
n_estimators : int, default=100
    Number of gradient boosted trees. Equivalent to the number of boosting rounds.
verbosity : Optional[int], default=None
    The degree of verbosity. Valid values are 0 (silent) to 3 (debug).
objective : Union[str, Callable, NoneType], default=None
    Specify the learning task and the corresponding learning objective or
    a custom objective function to be used (see note below).
booster : Optional[str], default=None
tree_method : Optional[str], default=None
n_jobs : Optional[int], default=None
    Number of parallel threads used to run xgboost. When used with other
    Scikit-Learn algorithms like grid search, you may choose which algorithm to
    parallelize and balance the threads. Creating thread contention will
    significantly slow down both algorithms.
gamma : Optional[float], default=None
    (min_split_loss) Minimum loss reduction required to make a further partition
    on a leaf node of the tree.
min_child_weight : Optional[float], default=None
    Minimum sum of instance weight (hessian) needed in a child.
max_delta_step : Optional[float], default=None
    Maximum delta step we allow each tree's weight estimation to be.
subsample : Optional[float], default=None
    Subsample ratio of the training instances.
sampling_method : Optional[str], default=None
    Sampling method. Used only by the gpu_hist tree method.

    - uniform: select random training instances uniformly.
    - gradient_based: select random training instances with higher probability
      when the gradient and hessian are larger. (cf. CatBoost)
colsample_bytree : Optional[float], default=None
    Subsample ratio of columns when constructing each tree.
colsample_bylevel : Optional[float], default=None
    Subsample ratio of columns for each level.
colsample_bynode : Optional[float], default=None
    Subsample ratio of columns for each split.
reg_alpha : Optional[float], default=None
    L1 regularization term on weights (xgb's alpha).
reg_lambda : Optional[float], default=None
    L2 regularization term on weights (xgb's lambda).
scale_pos_weight : Optional[float], default=None
    Balancing of positive and negative weights.
base_score : Optional[float], default=None
    The initial prediction score of all instances, global bias.
random_state : Union[numpy.random.RandomState, int, NoneType], default=None
    Random number seed.

    .. note::

       Using gblinear booster with shotgun updater is nondeterministic as
       it uses the Hogwild algorithm.
missing : float, default=nan
    Value in the data which needs to be present as a missing value.
num_parallel_tree : Optional[int], default=None
monotone_constraints : Union[Dict[str, int], str, NoneType], default=None
    Constraint of variable monotonicity. See :doc:`tutorial </tutorials/monotonic>`
    for more information.
interaction_constraints : Union[str, Sequence[Sequence[str]], NoneType], default=None
    Constraints for interaction representing permitted interactions. The
    constraints must be specified in the form of a nested list, e.g.
    [[0, 1], [2, 3, 4]], where each inner list is a group of indices of features
    that are allowed to interact with each other. See
    :doc:`tutorial </tutorials/feature_interaction_constraint>` for more information.
importance_type : Optional[str], default=None
gpu_id : Optional[int], default=None
    Device ordinal.
validate_parameters : Optional[bool], default=None
    Give warnings for unknown parameters.
predictor : Optional[str], default=None
    Force XGBoost to use a specific predictor; available choices are
    [cpu_predictor, gpu_predictor].
enable_categorical : bool, default=False
    .. versionadded:: 1.5.0

    .. note:: This parameter is experimental

    Experimental support for categorical data. When enabled, cudf/pandas.DataFrame
    should be used to specify categorical data type. Also, the JSON/UBJSON
    serialization format is required.
feature_types : Sequence[str], default=None
    .. versionadded:: 1.7.0

    Used for specifying feature types without constructing a dataframe. See
    :py:class:`DMatrix` for details.
max_cat_to_onehot : Optional[int], default=None
    .. versionadded:: 1.6.0

    .. note:: This parameter is experimental

    A threshold for deciding whether XGBoost should use one-hot encoding based
    split for categorical data. When the number of categories is smaller than
    the threshold, one-hot encoding is chosen; otherwise the categories will be
    partitioned into children nodes. Also, enable_categorical needs to be set
    to have categorical feature support. See :doc:`Categorical Data
    </tutorials/categorical>` and :ref:`cat-param` for details.
max_cat_threshold : Optional[int], default=None
    .. versionadded:: 1.7.0

    .. note:: This parameter is experimental

    Maximum number of categories considered for each split. Used only by
    partition-based splits for preventing over-fitting. Also, enable_categorical
    needs to be set to have categorical feature support. See :doc:`Categorical Data
    </tutorials/categorical>` and :ref:`cat-param` for details.
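    To make the categorical options above concrete, here is a minimal sketch
    using the single-node scikit-learn estimator (the same parameters apply to
    this Dask wrapper); the toy data and column names are illustrative only.

    .. code-block:: python

        import pandas as pd
        import xgboost as xgb

        # categorical features must use the pandas 'category' dtype
        X = pd.DataFrame({
            "store": pd.Categorical(["a", "b", "a", "c"]),
            "units": [1.0, 2.0, 3.0, 4.0],
        })
        y = [10.0, 20.0, 30.0, 40.0]

        reg = xgb.XGBRegressor(
            tree_method="hist",
            enable_categorical=True,  # required for categorical support
            max_cat_to_onehot=4,      # one-hot encode small category counts
        )
        reg.fit(X, y)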
eval_metric : Union[str, List[str], Callable, NoneType], default=None
    .. versionadded:: 1.6.0

    Metric used for monitoring the training result and early stopping. It can be
    a string or list of strings as names of predefined metrics in XGBoost (see
    doc/parameter.rst), one of the metrics in :py:mod:`sklearn.metrics`, or any
    other user defined metric that looks like sklearn.metrics.

    If a custom objective is also provided, then the custom metric should
    implement the corresponding reverse link function.

    Unlike the scoring parameter commonly used in scikit-learn, when a callable
    object is provided, it's assumed to be a cost function and by default XGBoost
    will minimize the result during early stopping.

    For advanced usage of early stopping, like directly choosing to maximize
    instead of minimize, see :py:obj:`xgboost.callback.EarlyStopping`.

    See :doc:`Custom Objective and Evaluation Metric </tutorials/custom_metric_obj>`
    for more.

    .. note::

       This parameter replaces eval_metric in the :py:meth:`fit` method. The old
       one receives un-transformed predictions regardless of whether a custom
       objective is being used.

    .. code-block:: python

        import xgboost as xgb
        from sklearn.datasets import load_diabetes
        from sklearn.metrics import mean_absolute_error

        X, y = load_diabetes(return_X_y=True)
        reg = xgb.XGBRegressor(
            tree_method="hist",
            eval_metric=mean_absolute_error,
        )
        reg.fit(X, y, eval_set=[(X, y)])
early_stopping_rounds : Optional[int], default=None
    .. versionadded:: 1.6.0

    Activates early stopping. The validation metric needs to improve at least
    once in every early_stopping_rounds round(s) to continue training. Requires
    at least one item in eval_set in :py:meth:`fit`.

    The method returns the model from the last iteration (not the best one). If
    there's more than one item in eval_set, the last entry will be used for
    early stopping. If there's more than one metric in eval_metric, the last
    metric will be used for early stopping.

    If early stopping occurs, the model will have three additional fields:
    :py:attr:`best_score`, :py:attr:`best_iteration` and
    :py:attr:`best_ntree_limit`.

    .. note::

       This parameter replaces early_stopping_rounds in the :py:meth:`fit` method.
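    A minimal sketch tying early_stopping_rounds and eval_metric together
    (assumes xgboost >= 1.6; the validation split is illustrative):

    .. code-block:: python

        import xgboost as xgb
        from sklearn.datasets import load_diabetes
        from sklearn.model_selection import train_test_split

        X, y = load_diabetes(return_X_y=True)
        X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

        reg = xgb.XGBRegressor(
            tree_method="hist",
            eval_metric="mae",
            early_stopping_rounds=10,  # stop after 10 rounds without improvement
        )
        reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
        print(reg.best_iteration, reg.best_score)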
callbacks : Optional[List[xgboost.callback.TrainingCallback]], default=None
    List of callback functions that are applied at the end of each iteration.
    It is possible to use predefined callbacks by using the
    :ref:`Callback API <callback_api>`.

    .. note::

       States in callback are not preserved during training, which means callback
       objects can not be reused for multiple training sessions without
       reinitialization or deepcopy.

    .. code-block:: python

        for params in parameters_grid:
            # be sure to (re)initialize the callbacks before each run
            callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
            xgboost.train(params, Xy, callbacks=callbacks)
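    Predefined callbacks can also be passed through the constructor. A hedged
    sketch (X_train, y_train, X_valid and y_valid as in the example above):

    .. code-block:: python

        import xgboost as xgb

        # EarlyStopping is one of the predefined callbacks in the Callback API
        es = xgb.callback.EarlyStopping(rounds=10, save_best=True)
        reg = xgb.XGBRegressor(eval_metric="rmse", callbacks=[es])
        reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])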
kwargs : Any
    Keyword arguments for the XGBoost Booster object. Full documentation of
    parameters can be found :doc:`here </parameter>`.
    Attempting to set a parameter via the constructor args and **kwargs
    dict simultaneously will result in a TypeError.

    .. note:: **kwargs unsupported by scikit-learn

       **kwargs is unsupported by scikit-learn. We do not guarantee
       that parameters passed via this argument will interact properly
       with scikit-learn.

Returns
-------
None