The architecture consists of an encoder-decoder structure with multiple
layers, each with residual connections and layer normalization. Finally,
a linear layer maps the decoder’s output to the forecasting window
dimension. The general intuition is that attention-based mechanisms can
capture the diversity of past events and correctly extrapolate potential
future distributions.
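
To make this concrete, here is a minimal sketch of such a structure in PyTorch. This is not TimeGPT’s actual implementation (which has not been released); the `ToyForecaster` name, layer sizes, and single-step decoder seed are assumptions used only to show how residual, layer-normalized encoder-decoder layers can feed a final linear projection onto the forecasting window.

```python
# Illustrative sketch only -- not TimeGPT's architecture or weights.
import torch
import torch.nn as nn

class ToyForecaster(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, horizon=12):
        super().__init__()
        self.embed = nn.Linear(1, d_model)       # project each observation to d_model
        self.transformer = nn.Transformer(       # each encoder/decoder layer already
            d_model=d_model,                     # includes residual connections and
            nhead=n_heads,                       # layer normalization
            num_encoder_layers=n_layers,
            num_decoder_layers=n_layers,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, horizon)  # map decoder output to the forecast window

    def forward(self, context):
        # context: (batch, lookback, 1)
        src = self.embed(context)                # (batch, lookback, d_model)
        tgt = src[:, -1:, :]                     # seed the decoder with the last step (an assumption)
        out = self.transformer(src, tgt)         # (batch, 1, d_model)
        return self.head(out[:, -1, :])          # (batch, horizon): the forecast window

model = ToyForecaster()
forecast = model(torch.randn(8, 48, 1))
print(forecast.shape)  # torch.Size([8, 12])
```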
To make predictions, TimeGPT “reads” the input series much as humans read
a sentence: from left to right. It looks at windows of past
data, which we can think of as “tokens”, and predicts what comes next.
This prediction is based on patterns the model identifies in past data
and extrapolates into the future.
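
One way to picture this left-to-right reading is an autoregressive loop over windows: slice the history into fixed-length windows (the “tokens”), predict the next value from the most recent window, append the prediction, and repeat until the horizon is filled. The sketch below uses a naive `predict_next` stand-in (a window mean) rather than TimeGPT itself, purely to illustrate the mechanics; the window and horizon sizes are arbitrary.

```python
# Illustrative mechanics only -- predict_next is a placeholder, not TimeGPT.
import numpy as np

def make_windows(series, window):
    """Slice a 1-D series into overlapping windows, analogous to tokens."""
    return np.stack([series[i:i + window] for i in range(len(series) - window + 1)])

def predict_next(window):
    """Stand-in for the model's next-step prediction."""
    return window.mean()

def autoregressive_forecast(series, window=24, horizon=12):
    """Read left to right: predict one step, append it, and repeat."""
    history = list(series)
    for _ in range(horizon):
        history.append(predict_next(np.array(history[-window:])))
    return np.array(history[-horizon:])

series = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * np.random.randn(200)
print(make_windows(series, 24).shape)   # (177, 24): windows of past data
print(autoregressive_forecast(series))  # 12 extrapolated values
```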