Data setup
| | unique_id | ds | y |
|---|---|---|---|
| 0 | id_0 | 2000-01-01 | 0.428973 |
| 1 | id_0 | 2000-01-02 | 1.423626 |
| 2 | id_0 | 2000-01-03 | 2.311782 |
| 3 | id_0 | 2000-01-04 | 3.192191 |
| 4 | id_0 | 2000-01-05 | 4.148767 |
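To experiment with the steps below, here is a minimal sketch of a synthetic panel in the same long format (one row per series and date, with unique_id, ds and y columns). make_series is a hypothetical helper, not mlforecast's own data generator. It assumes every series starts on 2000-01-01 and follows a weekly pattern (position in the week plus noise in [0, 0.5)), loosely mimicking the values in the table above, which climb by about one per day.

```python
import numpy as np
import pandas as pd


def make_series(n_series=5, min_length=50, max_length=500, seed=0):
    """Hypothetical helper: synthetic daily series in long format.

    Every series starts on 2000-01-01, has a random length, and its target is
    (position in the week) + uniform noise in [0, 0.5).
    """
    rng = np.random.default_rng(seed)
    frames = []
    for i in range(n_series):
        length = int(rng.integers(min_length, max_length + 1))
        ds = pd.date_range("2000-01-01", periods=length, freq="D")
        y = np.arange(length) % 7 + 0.5 * rng.random(length)
        frames.append(pd.DataFrame({"unique_id": f"id_{i}", "ds": ds, "y": y}))
    return pd.concat(frames, ignore_index=True)


series = make_series()
print(series.head())
```

Because the noise stays below 0.5, truncating y recovers the position in the week, which makes the weekly signal easy to check by eye.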
Pipelines definition
Suppose that you want to use a linear regression model with the lag1 and the day of the week as features. mlforecast returns the day of the week as a single integer column (0 for Monday through 6 for Sunday); however, that's not the optimal format for a linear regression model, which benefits more from having an indicator column for each day of the week (dropping one to avoid collinearity). We can achieve this by using scikit-learn's OneHotEncoder and then fitting our linear regression model. These are the features mlforecast generates for the first series; row 0 is absent because it has no earlier value, so its lag1 is undefined:
| | lag1 | dayofweek |
|---|---|---|
| 1 | 0.428973 | 6 |
| 2 | 1.423626 | 0 |
| 3 | 2.311782 | 1 |
| 4 | 3.192191 | 2 |
| 5 | 4.148767 | 3 |
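To make the table concrete, here is a minimal pandas sketch that builds the same two features by hand: lag1 is the previous value within each series, dayofweek follows the pandas convention, and rows without a lag are dropped. make_features and the tiny two-series frame are hypothetical illustrations, not mlforecast's implementation.

```python
import pandas as pd

# Tiny hypothetical panel: id_0 has 4 days, id_1 has 3 days.
series = pd.DataFrame({
    "unique_id": ["id_0"] * 4 + ["id_1"] * 3,
    "ds": list(pd.date_range("2000-01-01", periods=4, freq="D"))
    + list(pd.date_range("2000-01-01", periods=3, freq="D")),
    "y": [0.4, 1.4, 2.3, 3.2, 10.0, 11.0, 12.0],
})


def make_features(df):
    """Hypothetical helper: returns (X, y) with lag1 and dayofweek columns."""
    df = df.sort_values(["unique_id", "ds"])
    out = pd.DataFrame(index=df.index)
    # Previous value of the same series; the first row of each series is NaN.
    out["lag1"] = df.groupby("unique_id")["y"].shift(1)
    # pandas convention: Monday=0 ... Sunday=6 (2000-01-02 was a Sunday).
    out["dayofweek"] = df["ds"].dt.dayofweek
    keep = out["lag1"].notna()
    return out[keep], df.loc[keep, "y"]


X, y = make_features(series)
print(X)
```

Grouping before shifting matters: a plain shift(1) would leak the last value of id_0 into the first row of id_1.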
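This is what will be passed to our model, so we want to one-hot encode the dayofweek column while leaving the lag1 column untouched. One way to do this is to apply the OneHotEncoder to dayofweek only and pass lag1 through as is.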
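A minimal sketch of this step using scikit-learn's ColumnTransformer, assuming a hypothetical feature frame with the same two columns as the table above (two full weeks, so all seven days appear). The second part shows why one level is dropped: the seven full dummies always sum to one, duplicating the regression intercept.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical features: 14 days, so every dayofweek value 0..6 appears.
X = pd.DataFrame({
    "lag1": np.arange(14, dtype=float),
    "dayofweek": np.arange(14) % 7,
})

ohe = ColumnTransformer(
    # Encode only dayofweek; drop="first" removes the Monday (0) indicator.
    transformers=[("encoder", OneHotEncoder(drop="first"), ["dayofweek"])],
    # Pass lag1 through unchanged; remainder columns are appended last.
    remainder="passthrough",
    # Always return a dense array, however sparse the indicators are.
    sparse_threshold=0,
)
Xt = ohe.fit_transform(X)
print(Xt.shape)  # (14, 7): six indicators plus lag1

# Without dropping a level, the indicators sum to 1 in every row, so together
# with an intercept column the design matrix is rank deficient.
full = OneHotEncoder().fit_transform(X[["dayofweek"]]).toarray()
with_intercept = np.column_stack([np.ones(len(X)), full])
print(np.linalg.matrix_rank(with_intercept))  # 7, one less than its 8 columns
```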
Training
We can now build a pipeline that applies this encoding and then passes the result to our linear regression model.
Forecasting
Finally, we compute the forecasts. The result has one prediction per series, in a column named after the model (ohe_lr):
| | unique_id | ds | ohe_lr |
|---|---|---|---|
| 0 | id_0 | 2000-08-10 | 4.312748 |
| 1 | id_1 | 2000-04-07 | 4.537019 |
| 2 | id_2 | 2000-06-16 | 4.160505 |
| 3 | id_3 | 2000-08-30 | 3.777040 |
| 4 | id_4 | 2001-01-08 | 2.676933 |
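The training and forecasting steps above can be sketched end to end with pandas and scikit-learn alone. This is not mlforecast's API: it builds the features by hand on a hypothetical synthetic panel, fits make_pipeline(ColumnTransformer, LinearRegression()), and produces one forecast per series by using the last observed y as lag1 and the next date's day of week. In mlforecast, the pipeline would instead be handed over as a model, and the library builds the features for you.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical panel: three series of different lengths with a weekly pattern.
rng = np.random.default_rng(0)
frames = []
for i, length in enumerate([60, 75, 90]):
    ds = pd.date_range("2000-01-01", periods=length, freq="D")
    y = np.arange(length) % 7 + 0.5 * rng.random(length)
    frames.append(pd.DataFrame({"unique_id": f"id_{i}", "ds": ds, "y": y}))
series = pd.concat(frames, ignore_index=True)

# Features: previous value within each series, and the day of week.
feats = series.assign(
    lag1=series.groupby("unique_id")["y"].shift(1),
    dayofweek=series["ds"].dt.dayofweek,
).dropna(subset=["lag1"])

# Training: one-hot encode dayofweek, pass lag1 through, then fit OLS.
ohe = ColumnTransformer(
    [("encoder", OneHotEncoder(drop="first"), ["dayofweek"])],
    remainder="passthrough",
    sparse_threshold=0,
)
model = make_pipeline(ohe, LinearRegression())
model.fit(feats[["lag1", "dayofweek"]], feats["y"])

# Forecasting one step ahead: last observed y becomes lag1 for the next date.
last = series.groupby("unique_id").tail(1)
future = pd.DataFrame({
    "unique_id": last["unique_id"].to_numpy(),
    "ds": (last["ds"] + pd.Timedelta(days=1)).to_numpy(),
    "lag1": last["y"].to_numpy(),
})
future["dayofweek"] = future["ds"].dt.dayofweek
future["ohe_lr"] = model.predict(future[["lag1", "dayofweek"]])
print(future[["unique_id", "ds", "ohe_lr"]])
```

Forecasting further ahead would repeat the last step, feeding each prediction back in as the next lag1.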