Data Requirements

In this example we go through the dataset input requirements of the core.NeuralForecast class. The core.NeuralForecast methods operate as global models that receive a set of time series rather than a single series. The class uses a cross-learning technique to fit flexible, shared models such as neural networks, which improves their generalization capabilities, as shown by the M4 international forecasting competition (Smyl 2019, Semenoglou 2021). You can run these experiments on a GPU with Google Colab.

Long format

Multiple time series

Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp. If your series live in separate dataframes, you can stack them into one, for example Y_df = pd.concat([series1, series2, ...]). Let’s see an example using the datasetsforecast library.

!pip install datasetsforecast

import pandas as pd
from datasetsforecast.m3 import M3

Y_df, *_ = M3.load('./data', group='Yearly')

Y_df.groupby('unique_id').head(2)

	unique_id	ds	y
0	Y1	1975-12-31	940.66
1	Y1	1976-12-31	1084.86
20	Y10	1975-12-31	2160.04
21	Y10	1976-12-31	2553.48
40	Y100	1975-12-31	1424.70
…	…	…	…
18260	Y97	1976-12-31	1618.91
18279	Y98	1975-12-31	1164.97
18280	Y98	1976-12-31	1277.87
18299	Y99	1975-12-31	1870.00
18300	Y99	1976-12-31	1307.20

Y_df is a dataframe with three columns: unique_id, a unique identifier for each time series; ds, the datestamp of each observation; and y, the value of the series at that datestamp.
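To make the long format concrete, here is a minimal sketch that stacks two small series with pd.concat. The names series1, series2 and long_df, the identifiers 'A' and 'B', and all values are made up for illustration; they do not come from the M3 data.

```python
import pandas as pd

# Two hypothetical series with made-up values, each already holding
# the three columns unique_id, ds and y.
series1 = pd.DataFrame({
    'unique_id': 'A',
    'ds': pd.to_datetime(['2000-12-31', '2001-12-31', '2002-12-31']),
    'y': [10.0, 12.0, 15.0],
})
series2 = pd.DataFrame({
    'unique_id': 'B',
    'ds': pd.to_datetime(['2001-12-31', '2002-12-31']),
    'y': [3.0, 4.0],
})

# Stack the series row-wise: one row per (series, timestamp) observation.
long_df = pd.concat([series1, series2], ignore_index=True)
print(long_df)
```

Note that the two series do not need to share the same length or start date; in long format each series simply contributes its own rows.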

Single time series

If you have only one time series, you have to include the unique_id column. Consider, for example, the AirPassengers dataset.

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
Y_df

	timestamp	value
0	1949-01-01	112
1	1949-02-01	118
2	1949-03-01	132
3	1949-04-01	129
4	1949-05-01	121
…	…	…
139	1960-08-01	606
140	1960-09-01	508
141	1960-10-01	461
142	1960-11-01	390
143	1960-12-01	432

In this example Y_df contains only two columns: timestamp and value. To use NeuralForecast we have to add the unique_id column and rename the previous ones to ds and y.

Y_df['unique_id'] = 1. # Any constant works as the identifier of a single series (here the float 1.0)
Y_df = Y_df.rename(columns={'timestamp': 'ds', 'value': 'y'})
Y_df = Y_df[['unique_id', 'ds', 'y']]
Y_df

	unique_id	ds	y
0	1.0	1949-01-01	112
1	1.0	1949-02-01	118
2	1.0	1949-03-01	132
3	1.0	1949-04-01	129
4	1.0	1949-05-01	121
…	…	…	…
139	1.0	1960-08-01	606
140	1.0	1960-09-01	508
141	1.0	1960-10-01	461
142	1.0	1960-11-01	390
143	1.0	1960-12-01	432
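Before passing a dataframe on, it can help to check that it has the expected shape. The sketch below is one possible check, not part of NeuralForecast: check_long_format and REQUIRED_COLUMNS are hypothetical names, and parsing ds to datetimes and rejecting duplicate (unique_id, ds) pairs are assumptions about what a clean long-format frame should look like. The sample rows reuse the first three AirPassengers observations shown above, with a string identifier chosen for illustration.

```python
import pandas as pd

REQUIRED_COLUMNS = ['unique_id', 'ds', 'y']  # hypothetical constant

def check_long_format(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a tidy copy or raise ValueError."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f'missing columns: {missing}')
    out = df[REQUIRED_COLUMNS].copy()
    # read_csv leaves dates as strings, so parse them explicitly.
    out['ds'] = pd.to_datetime(out['ds'])
    # Assumption: each series has at most one observation per timestamp.
    if out.duplicated(subset=['unique_id', 'ds']).any():
        raise ValueError('duplicate (unique_id, ds) pairs found')
    return out.sort_values(['unique_id', 'ds']).reset_index(drop=True)

# First three AirPassengers rows, in the raw two-column layout.
sample = pd.DataFrame({
    'timestamp': ['1949-01-01', '1949-02-01', '1949-03-01'],
    'value': [112, 118, 132],
})
sample = sample.rename(columns={'timestamp': 'ds', 'value': 'y'})
sample['unique_id'] = 'AirPassengers'  # illustrative string identifier

checked = check_long_format(sample)
print(checked.dtypes)
```

Calling the helper on the raw two-column frame, before the rename, would raise a ValueError that lists the missing columns.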
