Data setup
| | unique_id | ds | y |
|---|---|---|---|
| 0 | id_0 | 2000-01-01 | 0.428973 | 
| 1 | id_0 | 2000-01-02 | 1.423626 | 
| 2 | id_0 | 2000-01-03 | 2.311782 | 
| 3 | id_0 | 2000-01-04 | 3.192191 | 
| 4 | id_0 | 2000-01-05 | 4.148767 | 
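The code that produced this frame is not shown here. As a minimal sketch, assuming only pandas and NumPy, a long-format frame with the same three columns (unique_id, ds, y) could be built as follows. The helper name `make_series`, its parameters, and the synthetic values are illustrative assumptions and do not reproduce the numbers in the table.

```python
import numpy as np
import pandas as pd


def make_series(n_series=5, min_length=50, max_length=100, seed=0):
    """Build synthetic daily series in long format (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    frames = []
    for i in range(n_series):
        # Each series gets its own length, so the series end on different dates
        length = int(rng.integers(min_length, max_length + 1))
        ds = pd.date_range("2000-01-01", periods=length, freq="D")
        # Weekly pattern (0..6) plus a little noise
        y = np.arange(length) % 7 + rng.uniform(0, 0.5, size=length)
        frames.append(pd.DataFrame({"unique_id": f"id_{i}", "ds": ds, "y": y}))
    return pd.concat(frames, ignore_index=True)


series = make_series()
print(series.head())
```

Each row holds one observation: the series identifier, the timestamp, and the target value.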
Pipelines definition
Suppose that you want to use a linear regression model with lag1 and the day of the week as features. mlforecast returns the day of the week as a single column. However, that's not the optimal format for a linear regression model, which benefits more from having an indicator column for each day of the week (dropping one to avoid collinearity). We can achieve this by encoding that column with scikit-learn's OneHotEncoder before fitting our linear regression model. The features computed for the model look like this:
| | lag1 | dayofweek |
|---|---|---|
| 1 | 0.428973 | 6 | 
| 2 | 1.423626 | 0 | 
| 3 | 2.311782 | 1 | 
| 4 | 3.192191 | 2 | 
| 5 | 4.148767 | 3 | 
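To make the meaning of these two columns concrete, here is a minimal pandas sketch that recomputes them from the five id_0 rows shown in the data setup table. mlforecast computes these features internally; this sketch only mimics the result and is not the library's implementation. Because only five observations are shown, it reproduces the first four rows of the feature table.

```python
import pandas as pd

# The five rows of id_0 shown in the data setup table
series = pd.DataFrame({
    "unique_id": "id_0",
    "ds": pd.date_range("2000-01-01", periods=5, freq="D"),
    "y": [0.428973, 1.423626, 2.311782, 3.192191, 4.148767],
})

features = pd.DataFrame({
    # Previous value of y within each series
    "lag1": series.groupby("unique_id")["y"].shift(1),
    # Monday=0, ..., Sunday=6
    "dayofweek": series["ds"].dt.dayofweek,
})
# The first row of each series has no lag1, so it is dropped
X = features.dropna()
y = series.loc[X.index, "y"]
print(X)
```

Note that 2000-01-02 is a Sunday, which is why the first feature row has dayofweek 6.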
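We would like to take the dayofweek column and one-hot encode it, leaving the lag1
column untouched. scikit-learn's ColumnTransformer supports this: it applies a transformer to selected columns and can pass the remaining columns through unchanged.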
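A minimal sketch of such a transformer, applied to the five feature rows from the table above, could look like this. The variable names (`ohe`, `X_transformed`) and the transformer label `"encoder"` are illustrative choices, and `sparse_threshold=0` is set only to force a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Feature rows copied from the table above
X = pd.DataFrame(
    {
        "lag1": [0.428973, 1.423626, 2.311782, 3.192191, 4.148767],
        "dayofweek": [6, 0, 1, 2, 3],
    },
    index=[1, 2, 3, 4, 5],
)

ohe = ColumnTransformer(
    transformers=[
        # One indicator column per observed day, minus the first category
        ("encoder", OneHotEncoder(drop="first"), ["dayofweek"]),
    ],
    remainder="passthrough",  # keep lag1 as is
    sparse_threshold=0,       # always return a dense array
)
X_transformed = ohe.fit_transform(X)
print(X_transformed.shape)
```

With these five rows, only five distinct days are observed, so the result has four indicator columns followed by lag1. On data covering a full week, there would be six indicator columns plus lag1, seven columns in total.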
Training
We can now build a pipeline that does this and then passes it to our linear regression model.
Forecasting
Finally, we compute the forecasts.
| | unique_id | ds | ohe_lr |
|---|---|---|---|
| 0 | id_0 | 2000-08-10 | 4.312748 | 
| 1 | id_1 | 2000-04-07 | 4.537019 | 
| 2 | id_2 | 2000-06-16 | 4.160505 | 
| 3 | id_3 | 2000-08-30 | 3.777040 | 
| 4 | id_4 | 2001-01-08 | 2.676933 | 
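The training and forecasting steps above can be sketched end to end with scikit-learn alone. This is a simplified illustration on a single synthetic series, not the mlforecast workflow itself: the data is made up, and the one-step-ahead features are assembled by hand, which mlforecast would normally handle for every series.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic daily series with a weekly pattern (illustrative data only)
rng = np.random.default_rng(0)
df = pd.DataFrame({"ds": pd.date_range("2000-01-01", periods=70, freq="D")})
df["y"] = np.arange(70) % 7 + rng.uniform(0, 0.5, size=70)

# Training features: lag1 and day of week; the first row has no lag
df["lag1"] = df["y"].shift(1)
df["dayofweek"] = df["ds"].dt.dayofweek
train = df.dropna()

ohe = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(drop="first"), ["dayofweek"])],
    remainder="passthrough",
    sparse_threshold=0,
)
# The encoder runs first, then its output feeds the regression
model = make_pipeline(ohe, LinearRegression())
model.fit(train[["lag1", "dayofweek"]], train["y"])

# One-step-ahead forecast: features for the day after the last observation
next_ds = df["ds"].iloc[-1] + pd.Timedelta(days=1)
X_next = pd.DataFrame(
    {"lag1": [df["y"].iloc[-1]], "dayofweek": [next_ds.dayofweek]}
)
forecast = pd.DataFrame({"ds": [next_ds], "ohe_lr": model.predict(X_next)})
print(forecast)
```

In mlforecast, a pipeline like `model` would be supplied through the `models` argument of `MLForecast`. The `ohe_lr` column in the table above suggests the pipeline was registered under that name, although the original call is not shown.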

