# Quick start (local)

> Minimal example of MLForecast

## Main concepts

The main component of mlforecast is the `MLForecast` class, which
abstracts away:

* Feature engineering and model training through `MLForecast.fit`
* Feature updates and multi-step-ahead predictions through
  `MLForecast.predict`

## Data format

The data is expected to be a pandas DataFrame in long format, that is,
each row represents an observation of a single series at a given time.
It must have at least three columns:

* `id_col`: column that identifies each series.
* `target_col`: column that has the series values at each timestamp.
* `time_col`: column that contains the time the series value was
  observed. These are usually timestamps, but can also be consecutive
  integers.
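As a minimal sketch of this layout, the frame below holds two toy series and uses consecutive integers as the time column. The series names and values are made up for illustration; the column names `unique_id`, `ds` and `y` are the ones used in the rest of this guide.

```python
import pandas as pd

# Hypothetical toy data: two short series in long format.
# Each row is one observation of one series at one time step.
df_int = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "b", "b", "b"],  # identifies the series
        "ds": [1, 2, 3, 1, 2, 3],  # consecutive integers instead of timestamps
        "y": [10.0, 12.0, 11.0, 100.0, 95.0, 98.0],  # series values
    }
)
print(df_int)
```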

Here we present an example using the classic Box & Jenkins airline data,
which measures monthly totals of international airline passengers from
1949 to 1960 \[1].

```python theme={null}
import pandas as pd
from utilsforecast.plotting import plot_series
```

```python theme={null}
df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])
df.head()
```

|   | unique\_id    | ds         | y   |
| - | ------------- | ---------- | --- |
| 0 | AirPassengers | 1949-01-01 | 112 |
| 1 | AirPassengers | 1949-02-01 | 118 |
| 2 | AirPassengers | 1949-03-01 | 132 |
| 3 | AirPassengers | 1949-04-01 | 129 |
| 4 | AirPassengers | 1949-05-01 | 121 |

```python theme={null}
df['unique_id'].value_counts()
```

```text theme={null}
AirPassengers    144
Name: unique_id, dtype: int64
```

Here the `unique_id` column has the same value for all rows because this
is a single time series. You can have multiple time series by stacking
them together and using this column to tell them apart.
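Stacking could look roughly like the sketch below, which uses `pd.concat` on two made-up series of different lengths. The helper `make_series` and the series names are hypothetical and exist only for this illustration.

```python
import pandas as pd

def make_series(name, start, values):
    """Build one monthly series in long format (hypothetical helper)."""
    return pd.DataFrame(
        {
            "unique_id": name,  # scalar is broadcast to every row
            "ds": pd.date_range(start, periods=len(values), freq="MS"),
            "y": values,
        }
    )

# Stack the individual series vertically; unique_id tells them apart.
stacked = pd.concat(
    [
        make_series("series_a", "2020-01-01", [1.0, 2.0, 3.0]),
        make_series("series_b", "2020-06-01", [10.0, 20.0]),
    ],
    ignore_index=True,
)
print(stacked["unique_id"].value_counts())
```

The series do not need to share a start date or a length; each row only has to carry its own identifier, time and value.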

We also have the `ds` column, which contains the timestamps (in this case
with a monthly frequency), and the `y` column, which contains the series
value at each timestamp.
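If you are unsure which frequency your timestamps follow, one option is to ask pandas to infer it. The sketch below uses synthetic month-start timestamps that cover the same span as the example data rather than the downloaded file; the result is the pandas offset alias that is later passed as `freq`.

```python
import pandas as pd

# Synthetic month-start timestamps from 1949-01 through 1960-12 (144 months).
ds = pd.Series(pd.date_range("1949-01-01", periods=144, freq="MS"))

# infer_freq returns a pandas offset alias; "MS" stands for month start.
inferred = pd.infer_freq(ds)
print(inferred)
```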

## Modeling

```python theme={null}
fig = plot_series(df)
```

<img src="https://mintcdn.com/nixtla/2ja1JTy8E1f3Ljix/mlforecast/figs/quick_start_local__eda.png?fit=max&auto=format&n=2ja1JTy8E1f3Ljix&q=85&s=ea1ebf1bc677dcd87d2d517b0738f5f1" alt="" width="2168" height="353" data-path="mlforecast/figs/quick_start_local__eda.png" />

We can see that the series has a clear trend, so we can take the first
difference, i.e. subtract from each value the value of the previous
month. This can be achieved by passing an
`mlforecast.target_transforms.Differences([1])` instance to
`target_transforms`.

We can then train a linear regression using the value from the same
month of the previous year (lag 12) as a feature. This is done by
passing `lags=[12]`.
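To make the two transformations concrete, here is a hand-rolled pandas sketch on a synthetic series. It is an illustration of what a first difference and a lag-12 feature compute, not mlforecast's internal code, and it assumes the lag is taken on the differenced series rather than on the raw values.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a yearly pattern.
t = np.arange(36)
y = pd.Series(100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12))

diffed = y.diff()         # first difference: y[t] - y[t-1]
lag12 = diffed.shift(12)  # differenced value from 12 steps (one year) earlier

features = pd.DataFrame({"y": y, "diff1": diffed, "lag12": lag12})

# The first 13 rows are incomplete: row 0 has no difference,
# and rows 0 to 12 have no lag-12 value.
train = features.dropna()
print(train.head())
```

After differencing, the trend in this toy series collapses to a constant, which is why the lagged difference becomes such an informative feature for a linear model.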

```python theme={null}
from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from sklearn.linear_model import LinearRegression
```

```python theme={null}
fcst = MLForecast(
    models=LinearRegression(),
    freq='MS',  # our series has a monthly frequency
    lags=[12],
    target_transforms=[Differences([1])],
)
fcst.fit(df)
```

```text theme={null}
MLForecast(models=[LinearRegression], freq=MS, lag_features=['lag12'], date_features=[], num_threads=1)
```

The call to `fit` computed the features and trained the model, so now
we’re ready to compute our forecasts.

## Forecasting

Compute the forecast for the next 12 months:

```python theme={null}
preds = fcst.predict(12)
preds
```

|    | unique\_id    | ds         | LinearRegression |
| -- | ------------- | ---------- | ---------------- |
| 0  | AirPassengers | 1961-01-01 | 444.656555       |
| 1  | AirPassengers | 1961-02-01 | 417.470734       |
| 2  | AirPassengers | 1961-03-01 | 446.903046       |
| 3  | AirPassengers | 1961-04-01 | 491.014130       |
| 4  | AirPassengers | 1961-05-01 | 502.622223       |
| 5  | AirPassengers | 1961-06-01 | 568.751465       |
| 6  | AirPassengers | 1961-07-01 | 660.044312       |
| 7  | AirPassengers | 1961-08-01 | 643.343323       |
| 8  | AirPassengers | 1961-09-01 | 540.666687       |
| 9  | AirPassengers | 1961-10-01 | 491.462708       |
| 10 | AirPassengers | 1961-11-01 | 417.095154       |
| 11 | AirPassengers | 1961-12-01 | 461.206238       |
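Conceptually, a multi-step forecast with a differenced target and a lag feature can be produced recursively: predict the next difference, append it to the history so later steps can use it as a feature, and finally undo the differencing. The NumPy sketch below illustrates that idea on synthetic data. The function `recursive_forecast` is hypothetical and is not how mlforecast is implemented; it also fits the line with `np.polyfit` instead of scikit-learn.

```python
import numpy as np

def recursive_forecast(y, horizon, lag=12):
    """Sketch: difference, regress on a lagged difference, forecast recursively."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)                 # first differences
    X, target = d[:-lag], d[lag:]  # pair each difference with the one `lag` steps earlier
    slope, intercept = np.polyfit(X, target, deg=1)  # least-squares line

    history = list(d)
    for _ in range(horizon):
        # history[-lag] is the difference `lag` steps before the one being predicted.
        # Once the horizon exceeds the lag, this value is itself a prediction.
        history.append(intercept + slope * history[-lag])
    future_diffs = np.array(history[-horizon:])

    # Undo the differencing: cumulative sum starting from the last observed value.
    return y[-1] + np.cumsum(future_diffs)

# Synthetic monthly series: linear trend plus a yearly pattern.
t = np.arange(48)
y = 100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)
forecast = recursive_forecast(y, horizon=12)
print(forecast.round(2))
```

Because the toy series repeats exactly every 12 steps after differencing, the sketch continues the pattern almost perfectly; real data will not be this well behaved.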

## Visualize the results

We can visualize what our prediction looks like.

```python theme={null}
fig = plot_series(df, preds)
```

<img src="https://mintcdn.com/nixtla/2ja1JTy8E1f3Ljix/mlforecast/figs/quick_start_local__predictions.png?fit=max&auto=format&n=2ja1JTy8E1f3Ljix&q=85&s=027230dd0203cc91bcd369b70598b041" alt="" width="2168" height="353" data-path="mlforecast/figs/quick_start_local__predictions.png" />

And that’s it! You’ve trained a linear regression to predict the air
passengers for 1961.

## References

\[1] Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) Time
Series Analysis, Forecasting and Control. Third Edition. Holden-Day.
Series G.
