tylertreat/BigQuery-Python

get_query_rows offset option fails across pages

chayac opened this issue · 0 comments

I am trying to retrieve rows from a query whose result set is about 250,000 rows, chunking the results by passing offset and limit to get_query_rows. This works for the first 4,000 rows and then fails with the traceback below. The problem seems to be that the offset (sent as startIndex) is still included once the client starts following page tokens, and the BigQuery API rejects a startIndex combined with a pageToken.
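A minimal sketch of the call pattern (simplified from my script; the chunk size and loop structure are illustrative, only the get_query_rows call matches the traceback exactly):

```python
# Assumed: `client` is the BigQueryClient returned by bigquery.get_client()
# and `job_id` refers to a finished query job with ~250,000 result rows.
limit = 1000
start_value = 0
results = []

while True:
    # Works for the first ~4,000 rows, then raises HttpError 400:
    # "When using a page token, you cannot specify an arbitrary startIndex."
    chunk = client.get_query_rows(job_id, offset=start_value, limit=limit)
    if not chunk:
        break
    results.extend(chunk)
    start_value += limit
```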

Traceback (most recent call last):
  File "get_data.py", line 80, in <module>
    results = get_data(bq, job_id)
  File "get_data.py", line 33, in get_data
    results = client.get_query_rows(job_id, offset=start_value, limit=limit)
  File "/python3.4/site-packages/bigquery/client.py", line 438, in get_query_rows
    timeout=timeout)
  File "/python3.4/site-packages/bigquery/client.py", line 1531, in get_query_results
    timeoutMs=timeout * 1000).execute()
  File "/python3.4/site-packages/oauth2client/util.py", line 137, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/python3.4/site-packages/googleapiclient/http.py", line 840, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/bigquery/v2/projects/project-id/queries/job-id?maxResults=1000&startIndex=4000&timeoutMs=0&pageToken=token&alt=json returned "When using a page token, you cannot specify an arbitrary startIndex.">
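Given the API restriction in the error message, a workaround I'm considering is to drop offset entirely, let get_query_rows follow the page tokens on its own, and slice the rows afterwards. A sketch (process is a hypothetical placeholder for whatever handles each chunk):

```python
# Workaround sketch: no offset, so the client paginates with pageToken only.
# Note this loads the full ~250,000-row result set into memory before slicing.
all_rows = client.get_query_rows(job_id)

chunk_size = 1000
for start in range(0, len(all_rows), chunk_size):
    process(all_rows[start:start + chunk_size])  # hypothetical per-chunk handler
```

If holding the whole result set in memory is not acceptable, the chunking would have to happen at the pageToken level rather than via startIndex, since the API refuses to combine the two.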