Spark
Run TimeGPT distributedly on top of Spark
Spark is an open-source distributed computing framework designed for large-scale data processing. In this guide, we will explain how to use TimeGPT on top of Spark.
Outline:
1. Installation
2. Load Data
3. Initialize Spark
4. Use TimeGPT on Spark
5. Stop Spark
1. Installation
Install Spark through Fugue. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Spark.
Note: You can install fugue with pip:
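A minimal install command, assuming you want the Spark extra of fugue (which also pulls in pyspark):

```bash
pip install "fugue[spark]"
```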
If executing on a distributed Spark cluster, ensure that the nixtla library is installed across all the workers.
2. Load Data
You can load your data as a pandas DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets.
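A minimal sketch of loading the data, assuming the hourly electricity prices dataset hosted in the Nixtla transfer-learning-time-series repository (the URL is taken from other Nixtla tutorials):

```python
import pandas as pd

# Hourly electricity prices from several markets, in long format (unique_id, ds, y)
df = pd.read_csv(
    "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv"
)
df.head()
```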
|   | unique_id | ds                  | y     |
|---|-----------|---------------------|-------|
| 0 | BE        | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE        | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE        | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE        | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE        | 2016-10-22 04:00:00 | 37.10 |
3. Initialize Spark
Initialize Spark and convert the pandas DataFrame to a Spark DataFrame.
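A minimal sketch, assuming a local Spark session and the pandas DataFrame `df` from the previous step:

```python
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()

# Convert the pandas DataFrame to a Spark DataFrame
spark_df = spark.createDataFrame(df)
spark_df.show(5)
```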
4. Use TimeGPT on Spark
Using TimeGPT on top of Spark is almost identical to the non-distributed case. The only difference is that you need to use a Spark DataFrame.
First, instantiate the NixtlaClient class.
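For example (the API key below is a placeholder; substitute your own):

```python
from nixtla import NixtlaClient

# Instantiate the client with your TimeGPT API key (placeholder value shown)
nixtla_client = NixtlaClient(api_key="my_api_key_provided_by_nixtla")
```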
👍 Use an Azure AI endpoint
To use an Azure AI endpoint, set the
base_url
argument:
nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")
Then use any method from the NixtlaClient class, such as forecast or cross_validation.
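A sketch of a forecast call, assuming the Spark DataFrame spark_df from step 3 and a 12-step forecast horizon:

```python
# Forecast the next 12 hours for each series; the result is returned as a Spark DataFrame
fcst_df = nixtla_client.forecast(df=spark_df, h=12)
fcst_df.show(5)
```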
📘 Available models in Azure AI
If you are using an Azure AI endpoint, please be sure to set model="azureai":
nixtla_client.forecast(..., model="azureai")
For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon. By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.
You can also use exogenous variables with TimeGPT on top of Spark. To do this, please refer to the Exogenous Variables tutorial. Just keep in mind that instead of a pandas DataFrame, you need to use a Spark DataFrame, as sketched below.
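A minimal sketch, assuming you already have a pandas DataFrame future_ex_vars_df with the future values of the exogenous variables (the DataFrame name and the 24-step horizon are hypothetical; X_df is the argument described in the Exogenous Variables tutorial):

```python
# Convert the (hypothetical) future exogenous values to a Spark DataFrame
spark_future_ex_vars_df = spark.createDataFrame(future_ex_vars_df)

# As in the non-distributed case, the training DataFrame should also contain
# the historical values of the exogenous variables
fcst_ex_df = nixtla_client.forecast(df=spark_df, X_df=spark_future_ex_vars_df, h=24)
fcst_ex_df.show(5)
```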
5. Stop Spark
When you are done, stop the Spark session.
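For example:

```python
# Release the resources held by the Spark session
spark.stop()
```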