googleapis/python-bigquery

load_table_from_dataframe in combination with .result() doesn't properly await table creation.

ckanaar opened this issue · 1 comment

It seems that calling load_table_from_dataframe() and waiting for the load job to finish with the .result() method doesn't always guarantee that the table has been created, or is ready, by the time subsequent queries use it. On rare occasions, I get a google.api_core.exceptions.NotFound error when trying to access the newly created table immediately after the load job's .result() call returns.
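One possible workaround, offered only as a hedged sketch because the root cause was never confirmed, is to retry the follow-up query a few times with exponential backoff when it fails with NotFound. The helper below is generic and stdlib-only so it runs anywhere; `retry_on`, `TransientNotFound` and `flaky_query` are hypothetical names. In real code you would pass `google.api_core.exceptions.NotFound` as the exception type.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on(
    fn: Callable[[], T],
    exc_types: Tuple[Type[BaseException], ...],
    attempts: int = 5,
    initial_delay: float = 0.5,
    multiplier: float = 2.0,
) -> T:
    """Call fn(), retrying with exponential backoff if it raises one of exc_types.

    Hypothetical helper: once attempts run out, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exc_types:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
            delay *= multiplier
    raise AssertionError("unreachable")


# Stand-in exception (hypothetical); use google.api_core.exceptions.NotFound in real code.
class TransientNotFound(Exception):
    pass


calls = {"n": 0}


def flaky_query() -> str:
    # Fails twice, then succeeds, mimicking a table that becomes visible late.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientNotFound("404 Not found: Table")
    return "rows"


result = retry_on(flaky_query, (TransientNotFound,), attempts=5, initial_delay=0.01)
print(result, calls["n"])  # rows 3
```

Applied to the snippet in this issue, the call would look roughly like `retry_on(lambda: client.query(query).result(), (NotFound,))`. The `google.api_core.retry` module also provides a `Retry` class with an `if_exception_type` predicate that can serve the same purpose without a hand-written loop.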

Environment details

  • OS: Ubuntu 22.04.2 LTS
  • Python version: 3.10.12
  • pip version: 23.3.2
  • google-cloud-bigquery version: 3.17.2

Steps to reproduce

  1. Create a pandas DataFrame.
  2. Call load_table_from_dataframe with write_disposition = WRITE_TRUNCATE and create_disposition = CREATE_IF_NEEDED in the bigquery.LoadJobConfig(), and wait for the result using the .result() method.
  3. Directly query the newly created table afterwards.

Code example

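import pandas as pd
from google.cloud import bigquery

# Placeholders (hypothetical values): replace with your own project and dataset.
project_id = "my-project"
dataset_id = "my_dataset"

dataframe = pd.DataFrame(
    {
        'name': ['John', 'Jane', 'Joe'],
        'age': [20, 25, 30],
    }
)
client = bigquery.Client(project=project_id)
table_id = f"{project_id}.{dataset_id}.test_table"

# Load the DataFrame, creating the table if needed and replacing any existing rows.
job = client.load_table_from_dataframe(
    dataframe=dataframe,
    destination=table_id,
    job_config=bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
        create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    ),
)
job.result()  # Blocks until the load job finishes.

# Query the freshly loaded table; backticks quote the fully qualified table ID.
query = f"SELECT * FROM `{table_id}`"
query_job = client.query(query)
query_job.result()  # This is where NotFound (404) occasionally surfaces.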

Stack trace

Location: EU
Job ID: <id>

google.api_core.exceptions.NotFound: 404 Not found: Table <table_id>; reason: notFound, message: Not found: <table_id>

Note that this snippet does not reliably trigger the NotFound error. I only hit it in rare cases, but the behavior does seem inconsistent.
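Because the error is intermittent, a single run says little either way. A minimal sketch of a repetition harness, assuming you can wrap the load-then-query snippet in a function: it runs the scenario many times and tallies outcomes by exception class. `run_repeatedly`, `fake_scenario` and the local `NotFound` class are hypothetical stand-ins; the fake scenario fails with a fixed, seeded probability purely for illustration.

```python
import random
from collections import Counter
from typing import Callable


def run_repeatedly(scenario: Callable[[int], None], runs: int) -> Counter:
    """Run scenario(i) `runs` times and count outcomes by exception class name.

    Successful runs are counted under "ok". Hypothetical helper for measuring
    how often an intermittent failure shows up.
    """
    outcomes: Counter = Counter()
    for i in range(runs):
        try:
            scenario(i)
        except Exception as exc:  # record instead of aborting, to get a rate
            outcomes[type(exc).__name__] += 1
        else:
            outcomes["ok"] += 1
    return outcomes


class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (hypothetical)."""


rng = random.Random(42)  # seeded so the demo is repeatable


def fake_scenario(i: int) -> None:
    # Hypothetical stand-in for load + query: fails about 5% of the time.
    if rng.random() < 0.05:
        raise NotFound(f"run {i}: 404 Not found: Table")


RUNS = 200
tally = run_repeatedly(fake_scenario, runs=RUNS)
print(dict(tally))
print(f"failure rate: {tally['NotFound'] / RUNS:.1%}")
```

Recording the failure rate over many runs (and, in a real scenario, the job IDs of failing queries) gives something concrete to attach to a bug report when a problem cannot be reproduced on demand.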

Closing this since I can't reproduce.