Support Pagination in QueryTable and QueryTableChanges APIs
charlenelyu-db opened this issue · 0 comments
This is a proposal to add pagination support to the QueryTable and QueryTableChanges APIs.
Motivation
Currently, there is no mechanism to limit the number of files returned per QueryTable request. When a table contains millions of files, the server may be unable to process them all in a single request, leading to timeouts or exceeded resource limits. The client may also struggle to handle such large responses efficiently. This limitation is a performance bottleneck for the Delta Sharing service.
By introducing pagination in the data access APIs, we can bound the number of files returned by each API call, which makes both the Delta Sharing server and the client more scalable.
Protocol Change
We propose the following protocol changes:
QueryTable
HTTP Request | Value |
---|---|
Method |
|
Headers |
|
URL |
|
URL Parameters |
No Change |
Body |
Add two optional fields: `maxFiles` and `pageToken` |
Example:
POST {prefix}/shares/vaccine_share/schemas/acme_vaccine_data/tables/vaccine_patients/query
{
  "maxFiles": 123,
  "pageToken": "..."
}
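As a minimal sketch of how a client might build this request, the Python snippet below constructs the POST without sending it. The helper name `build_query_request` and the `Content-Type` header are assumptions for illustration; only the `maxFiles` and `pageToken` field names and the `/query` path come from the example above.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(
    table_url: str,
    max_files: Optional[int] = None,
    page_token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a paginated QueryTable request.

    `table_url` is assumed to be the table's base URL, for example
    {prefix}/shares/<share>/schemas/<schema>/tables/<table>.
    """
    body = {}
    # Both fields are optional, so include them only when provided.
    if max_files is not None:
        body["maxFiles"] = max_files
    if page_token is not None:
        body["pageToken"] = page_token
    return urllib.request.Request(
        f"{table_url}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

Omitting both arguments yields an empty JSON body, which matches the intent that existing clients keep working without sending the new fields.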
200: The tables were successfully returned.
HTTP Response | Value |
---|---|
Headers |
|
Body (example) |
{
  "protocol": {
    "minReaderVersion": 1
  }
}
{
  "metaData": {
    "id": "string",
    "format": {
      "provider": "parquet"
    },
    "schemaString": "string",
    "partitionColumns": [
      "date"
    ]
  }
}
{
  "file": {
    "url": "string",
    "id": "string",
    "partitionValues": {
      "date": "2021-04-28"
    },
    "size": 573,
    "stats": "string"
  }
}
{
  "endStreamAction": {
    "nextPageToken": "string"
  }
}
Note: the |
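A client consuming this response could follow `nextPageToken` until the server stops returning one. The sketch below rests on two assumptions not stated in this proposal: that each JSON object arrives on its own line (the example above is pretty-printed for readability), and that a missing or empty `nextPageToken` marks the last page. `parse_page`, `iter_all_files`, and the `fetch_page` callback are hypothetical names.

```python
import json
from typing import Callable, Iterator, List, Optional, Tuple


def parse_page(ndjson_text: str) -> Tuple[List[dict], Optional[str]]:
    """Return the file actions on one page and the next page token, if any."""
    files: List[dict] = []
    next_token: Optional[str] = None
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "file" in action:
            files.append(action["file"])
        elif "endStreamAction" in action:
            next_token = action["endStreamAction"].get("nextPageToken")
        # "protocol" and "metaData" actions are ignored in this sketch.
    return files, next_token


def iter_all_files(fetch_page: Callable[[Optional[str]], str]) -> Iterator[dict]:
    """Yield files from every page; fetch_page(token) returns a response body."""
    token: Optional[str] = None
    while True:
        files, token = parse_page(fetch_page(token))
        yield from files
        if not token:  # assumed end-of-results condition
            break
```

Passing the transport in as `fetch_page` keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.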
QueryTableChanges
HTTP Request | Value |
---|---|
Method |
|
Headers |
|
URL |
|
URL Parameters |
No Change |
Query Parameters |
Add two optional fields: `maxFiles` and `pageToken` |
Example:
GET {prefix}/shares/vaccine_share/schemas/acme_vaccine_data/tables/vaccine_patients/changes?startingVersion=0&endingVersion=2&maxFiles=123&pageToken=...
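For QueryTableChanges the pagination fields travel as query parameters, so the page token must be URL-encoded. The following sketch shows one way to assemble such a URL; `build_changes_url` is a hypothetical helper, and the parameter names are taken from the example above.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_changes_url(
    prefix: str,
    share: str,
    schema: str,
    table: str,
    starting_version: int,
    ending_version: Optional[int] = None,
    max_files: Optional[int] = None,
    page_token: Optional[str] = None,
) -> str:
    """Assemble a QueryTableChanges URL with optional pagination parameters."""
    params = {"startingVersion": starting_version}
    if ending_version is not None:
        params["endingVersion"] = ending_version
    if max_files is not None:
        params["maxFiles"] = max_files
    if page_token is not None:
        params["pageToken"] = page_token
    # Percent-encode each path segment so unusual names cannot break the path.
    path = "/".join(
        [
            prefix.rstrip("/"),
            "shares", quote(share, safe=""),
            "schemas", quote(schema, safe=""),
            "tables", quote(table, safe=""),
            "changes",
        ]
    )
    # urlencode percent-encodes values, which matters for opaque page tokens.
    return f"{path}?{urlencode(params)}"
```

Because the token is opaque to the client, encoding it rather than concatenating it raw avoids corrupting tokens that contain characters such as `/` or `=`.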
200: The tables were successfully returned.
HTTP Response | Value |
---|---|
Headers |
|
Body (example) |
{
  "protocol": {
    "minReaderVersion": 1
  }
}
{
  "metaData": {
    "id": "string",
    "format": {
      "provider": "parquet"
    },
    "schemaString": "string",
    "partitionColumns": [
      "date"
    ],
    "configuration": {
      "enableChangeDataFeed": "true"
    }
  }
}
{
  "cdf": {
    "url": "string",
    "id": "string",
    "partitionValues": {
      "date": "2021-04-28"
    },
    "size": 573,
    "timestamp": 1652141000000,
    "version": 1
  }
}
{
  "endStreamAction": {
    "nextPageToken": "string"
  }
}
Note: the |