# Training with numpy arrays

> Convert your dataframes to arrays to use less memory and train faster

Most machine learning libraries work with numpy arrays internally, so
even when you provide a dataframe it ends up being converted into a
numpy array. By providing an array to those models ourselves we can make
the process faster, since the conversion only happens once.
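As a minimal sketch of that idea (the small `features` dataframe and its column names are made up for illustration and are not produced by mlforecast), the conversion can be done once up front, after which asking numpy for an array again does no further work:

```python
import numpy as np
import pandas as pd

# Hypothetical feature table; the column names only mimic lag features.
features = pd.DataFrame({'lag7': [1.0, 2.0, 3.0], 'lag14': [4.0, 5.0, 6.0]})

# Convert the dataframe to a numpy array a single time.
X = features.to_numpy()

# np.asarray on something that is already an ndarray returns the same
# object, so code that calls it internally does not convert or copy again.
X_again = np.asarray(X)
same_object = X_again is X
```

Here `same_object` is `True`, which is the saving the rest of this guide relies on.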

## Data setup

```python theme={null}
from mlforecast.utils import generate_daily_series
```

```python theme={null}
series = generate_daily_series(5)
```

## fit and cross\_validation methods

```python theme={null}
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast
```

```python theme={null}
fcst = MLForecast(
    models={'lr': LinearRegression(), 'lgbm': LGBMRegressor(verbosity=-1)},
    freq='D',
    lags=[7, 14],
    date_features=['dayofweek'],
)
```

If you’re using the fit/cross\_validation methods from `MLForecast`, all
you have to do to train with numpy arrays is set `as_numpy=True`, which
casts the features to an array before passing them to the models.

```python theme={null}
fcst.fit(series, as_numpy=True)
```

```text theme={null}
MLForecast(models=[lr, lgbm], freq=<Day>, lag_features=['lag7', 'lag14'], date_features=['dayofweek'], num_threads=1)
```

When predicting, the new features are also cast to arrays, so prediction
can be faster as well.

```python theme={null}
fcst.predict(1)
```

|   | unique\_id | ds         | lr       | lgbm     |
| - | ---------- | ---------- | -------- | -------- |
| 0 | id\_0      | 2000-08-10 | 5.268787 | 6.322262 |
| 1 | id\_1      | 2000-04-07 | 4.437316 | 5.213255 |
| 2 | id\_2      | 2000-06-16 | 3.246518 | 4.373904 |
| 3 | id\_3      | 2000-08-30 | 0.144860 | 1.285219 |
| 4 | id\_4      | 2001-01-08 | 2.211318 | 3.236700 |

For cross\_validation we also just need to specify `as_numpy=True`.

```python theme={null}
cv_res = fcst.cross_validation(series, n_windows=2, h=2, as_numpy=True)
```

## preprocess method

Having the features as a numpy array can also be helpful in cases where
you have categorical columns and the library doesn’t support them, for
example LightGBM with polars. In order to use categorical features with
LightGBM and polars we have to convert them to their integer
representation and tell LightGBM to treat those features as categorical,
which we can achieve in the following way:

```python theme={null}
series_pl = generate_daily_series(5, n_static_features=1, engine='polars')
series_pl.head(2)
```

| unique\_id | ds                  | y          | static\_0 |
| ---------- | ------------------- | ---------- | --------- |
| cat        | datetime\[ns]       | f64        | cat       |
| "id\_0"    | 2000-01-01 00:00:00 | 36.462689  | "84"      |
| "id\_0"    | 2000-01-02 00:00:00 | 121.008199 | "84"      |

```python theme={null}
fcst = MLForecast(
    models=[],
    freq='1d',
    lags=[7, 14],
    date_features=['weekday'],
)
```

To get the features as an array from the `preprocess` method we also
have to ask for the X, y tuple by setting `return_X_y=True`.

```python theme={null}
X, y = fcst.preprocess(series_pl, return_X_y=True, as_numpy=True)
X[:2]
```

```text theme={null}
array([[  0.        ,  20.30076749,  36.46268875,   6.        ],
       [  0.        , 119.51717097, 121.0081989 ,   7.        ]])
```
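The first column of `X` above holds `static_0`, whose category `"84"` now appears as `0.0`. The sketch below illustrates what an integer representation of a categorical column looks like, using `np.unique` on made-up values. It is not how polars or mlforecast compute the codes, and the exact integers they assign may differ (for example, `np.unique` orders categories by sorted value).

```python
import numpy as np

# Made-up categorical values, stored as strings.
values = np.array(['84', '84', '12', '84', '57'])

# categories: the sorted distinct values.
# codes: for each element, the position of its value in `categories`.
categories, codes = np.unique(values, return_inverse=True)

# The original values can be recovered by indexing categories with the codes.
recovered = categories[codes]
```

In this sketch `codes` is `[2, 2, 0, 2, 1]`: every distinct category gets one integer, which is the form a numeric array can hold.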

The feature names are available in `fcst.ts.features_order_`:

```python theme={null}
fcst.ts.features_order_
```

```text theme={null}
['static_0', 'lag7', 'lag14', 'weekday']
```
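Because the array itself carries no column names, this list is what ties each column back to a feature. As a small sketch, with the names and the two rows copied from the outputs above rather than recomputed, the positions of the categorical columns can be looked up by name:

```python
import numpy as np

# Copied from the outputs shown above, not recomputed here.
features_order = ['static_0', 'lag7', 'lag14', 'weekday']
X_head = np.array([
    [0.0, 20.30076749, 36.46268875, 6.0],
    [0.0, 119.51717097, 121.0081989, 7.0],
])

# Features we intend to treat as categorical.
categorical = ['static_0', 'weekday']

# Column positions of those features in the array.
cat_idx = [features_order.index(name) for name in categorical]

# Select only the categorical columns.
X_cat = X_head[:, cat_idx]
```

This gives `cat_idx == [0, 3]`. LightGBM's `categorical_feature` argument also accepts integer column indices, so these positions could be used in place of names when no `feature_name` is supplied.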

Now we can just train a LightGBM model specifying the feature names and
which features should be treated as categorical.

```python theme={null}
model = LGBMRegressor(verbosity=-1)
model.fit(
    X=X,
    y=y,
    feature_name=fcst.ts.features_order_,
    categorical_feature=['static_0', 'weekday'],
);
```

We can now add this model to our models dict, as described in the
[custom training guide](./custom_training.html).

```python theme={null}
fcst.models_ = {'lgbm': model}
```

And use it to predict.

```python theme={null}
fcst.predict(1)
```

| unique\_id | ds                  | lgbm       |
| ---------- | ------------------- | ---------- |
| cat        | datetime\[ns]       | f64        |
| "id\_0"    | 2000-08-10 00:00:00 | 448.796188 |
| "id\_1"    | 2000-04-07 00:00:00 | 81.058211  |
| "id\_2"    | 2000-06-16 00:00:00 | 4.450549   |
| "id\_3"    | 2000-08-30 00:00:00 | 14.219603  |
| "id\_4"    | 2001-01-08 00:00:00 | 87.361881  |