
Interact with Delta Lake through a RESTful API, served from the driver of a Spark cluster.

Primary language: Python. License: Apache-2.0.

delta-rest [POC]



DeltaREST is a Python library that lets you launch a Flask-based server inside your Spark driver process, exposing a RESTful API for interacting with Delta tables.

(Help and pull requests are very welcome!)

Local example

Install

pip install deltarest

Run service

from deltarest import DeltaRESTService
from pyspark.sql import SparkSession

# Create local SparkSession
SparkSession \
    .builder \
    .appName("local_deltarest_test") \
    .master("local") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .getOrCreate()

# Start service on port 4444
DeltaRESTService(delta_root_path="/tmp/lakehouse-root") \
    .run("0.0.0.0", "4444")

Notes for cluster deployment:

  • delta_root_path can be a cloud storage path.
  • Deploy the Spark application in client deploy mode.

PUT

Create a Delta table with a given identifier (the schema can evolve)

curl -X PUT http://127.0.0.1:4444/tables/foo

Response code 201.

{
    "message":"Table foo created"
}

If the table identifier already exists:

curl -X PUT http://127.0.0.1:4444/tables/foo

Response code 200.

{
    "message":"Table foo already exists"
}
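
The same PUT call can be issued from Python. The following is a minimal sketch using only the standard library, not part of deltarest itself: the helper names `build_create_table_request` and `describe_put_status` are hypothetical, and the base URL is the one used in the curl examples. The request is built but not sent, so the snippet runs without a live service.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:4444"  # address used in the curl examples


def build_create_table_request(table: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates `table`."""
    return urllib.request.Request(f"{BASE_URL}/tables/{table}", method="PUT")


def describe_put_status(status: int) -> str:
    """Map the two documented status codes to a short outcome label."""
    outcomes = {201: "created", 200: "already exists"}
    return outcomes.get(status, f"unexpected status {status}")


create_request = build_create_table_request("foo")
print(create_request.get_method(), create_request.full_url)

# To actually send it (requires the service to be running):
# with urllib.request.urlopen(create_request) as response:
#     print(describe_put_status(response.status))
```

Because both outcomes return a 2xx code, a client that needs to know whether the table was newly created has to compare 201 against 200 rather than only checking for success.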

POST

Append JSON rows to a Delta table

curl -X POST http://127.0.0.1:4444/tables/foo --data '{"rows":[{"id":1,"collection":[1,2]},{"id":2,"collection":[3,4]}]}'

Response code 201.

{
    "message": "Rows created"
}
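
When the rows come from Python objects, serializing them with `json.dumps` avoids hand-writing the payload. This is a sketch under assumptions: `build_append_request` is a hypothetical helper, and no Content-Type header is set because the curl example does not set one either. Whether the service requires a particular Content-Type is not stated here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:4444"  # address used in the curl examples


def build_append_request(table: str, rows: list) -> urllib.request.Request:
    """Build (but do not send) a POST request appending `rows` to `table`."""
    # The body follows the documented shape: {"rows": [...]}
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tables/{table}", data=body, method="POST"
    )


rows = [{"id": 1, "collection": [1, 2]}, {"id": 2, "collection": [3, 4]}]
append_request = build_append_request("foo", rows)
print(append_request.data.decode("utf-8"))

# To actually send it (requires the service to be running):
# with urllib.request.urlopen(append_request) as response:
#     print(response.status)  # 201 is the documented success code
```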

GET

List available Delta tables

curl -G http://127.0.0.1:4444/tables

Response code 200.

{
  "tables":["foo"]
}

Get the content of a particular Delta table

curl -G http://127.0.0.1:4444/tables/foo

Response code 200.

{
    "rows":[
        {"id":1,"collection":[1,2]},
        {"id":2,"collection":[3,4]}
    ]
}

If the Delta table does not exist:

curl -G http://127.0.0.1:4444/tables/bar

Response code 404.

{
  "message":"Table bar not found"
}
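
A client usually wants to tell "table exists" apart from "table not found". The sketch below shows one way to interpret the two documented responses; `rows_or_none` is a hypothetical helper, and the sample bodies are the ones shown above rather than live output.

```python
import json
from typing import Optional


def rows_or_none(status: int, body: str) -> Optional[list]:
    """Return the rows for a 200 response, None for a 404, else raise."""
    if status == 200:
        return json.loads(body)["rows"]
    if status == 404:
        return None
    raise RuntimeError(f"Unexpected status {status}: {body}")


found = rows_or_none(
    200, '{"rows":[{"id":1,"collection":[1,2]},{"id":2,"collection":[3,4]}]}'
)
missing = rows_or_none(404, '{"message":"Table bar not found"}')
print(found)
print(missing)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 404; the exception's `code` attribute and `read()` method provide the status and body to pass to such a helper.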

Get the result of an arbitrary SQL query on Delta tables

The query may only reference Delta tables that appear in the table listing.

curl -G http://127.0.0.1:4444/tables --data-urlencode "sql=SELECT count(*) as count FROM foo CROSS JOIN foo"

Response code 200.

{
    "rows":[
        {"count":4}
    ]
}
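
The `--data-urlencode` flag combined with `-G` places the percent-encoded query in the URL's `sql` parameter. A rough Python equivalent, using only the standard library, is sketched below; `build_sql_url` is a hypothetical helper and nothing is sent to the service.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "http://127.0.0.1:4444"  # address used in the curl examples


def build_sql_url(sql: str) -> str:
    """Return the GET URL carrying `sql` as a percent-encoded query parameter."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    query = urlencode({"sql": sql}, quote_via=quote)
    return f"{BASE_URL}/tables?{query}"


sql = "SELECT count(*) as count FROM foo CROSS JOIN foo"
sql_url = build_sql_url(sql)
print(sql_url)

# Decoding the query string gives back the original statement.
decoded_sql = parse_qs(urlsplit(sql_url).query)["sql"][0]
print(decoded_sql)
```

With the two rows appended earlier, the cross join of foo with itself yields 2 × 2 = 4 row pairs, which matches the count in the response above.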