Quick start (local)

Main concepts

The main component of mlforecast is the MLForecast class, which abstracts away:

Feature engineering and model training through MLForecast.fit
Feature updates and multi step ahead predictions through MLForecast.predict

Data format

The data is expected to be a pandas dataframe in long format, that is, each row represents an observation of a single serie at a given time, with at least three columns:

id_col: column that identifies each serie.
target_col: column that has the series values at each timestamp.
time_col: column that contains the time the series value was observed. These are usually timestamps, but can also be consecutive integers.

Here we present an example using the classic Box & Jenkins airline data, which measures monthly totals of international airline passengers from 1949 to 1960. Source: Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) Time Series Analysis, Forecasting and Control. Third Edition. Holden-Day. Series G.

import pandas as pd
from utilsforecast.plotting import plot_series

df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/air-passengers.csv', parse_dates=['ds'])
df.head()

	unique_id	ds	y
0	AirPassengers	1949-01-01	112
1	AirPassengers	1949-02-01	118
2	AirPassengers	1949-03-01	132
3	AirPassengers	1949-04-01	129
4	AirPassengers	1949-05-01	121

df['unique_id'].value_counts()

AirPassengers    144
Name: unique_id, dtype: int64

Here the unique_id column has the same value for all rows because this is a single time series, you can have multiple time series by stacking them together and having a column that differentiates them. We also have the ds column that contains the timestamps, in this case with a monthly frequency, and the y column that contains the series values in each timestamp.

Modeling

fig = plot_series(df)

We can see that the serie has a clear trend, so we can take the first difference, i.e. take each value and subtract the value at the previous month. This can be achieved by passing an mlforecast.target_transforms.Differences([1]) instance to target_transforms. We can then train a linear regression using the value from the same month at the previous year (lag 12) as a feature, this is done by passing lags=[12].

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from sklearn.linear_model import LinearRegression

fcst = MLForecast(
    models=LinearRegression(),
    freq='MS',  # our serie has a monthly frequency
    lags=[12],
    target_transforms=[Differences([1])],
)
fcst.fit(df)

MLForecast(models=[LinearRegression], freq=MS, lag_features=['lag12'], date_features=[], num_threads=1)

The previous line computed the features and trained the model, so now we’re ready to compute our forecasts.

Forecasting

Compute the forecast for the next 12 months

preds = fcst.predict(12)
preds

	unique_id	ds	LinearRegression
0	AirPassengers	1961-01-01	444.656555
1	AirPassengers	1961-02-01	417.470734
2	AirPassengers	1961-03-01	446.903046
3	AirPassengers	1961-04-01	491.014130
4	AirPassengers	1961-05-01	502.622223
5	AirPassengers	1961-06-01	568.751465
6	AirPassengers	1961-07-01	660.044312
7	AirPassengers	1961-08-01	643.343323
8	AirPassengers	1961-09-01	540.666687
9	AirPassengers	1961-10-01	491.462708
10	AirPassengers	1961-11-01	417.095154
11	AirPassengers	1961-12-01	461.206238

Visualize results

We can visualize what our prediction looks like.

fig = plot_series(df, preds)

And that’s it! You’ve trained a linear regression to predict the air passengers for 1961.

Getting Started

How-to guides

Tutorials

API Reference

Quick start (local)

Main concepts

Data format

Modeling

Forecasting

Visualize results

Getting Started

How-to guides

Tutorials

API Reference

​Main concepts

​Data format

​Modeling

​Forecasting

​Visualize results

Main concepts

Data format

Modeling

Forecasting

Visualize results