bigquery load_table_from_dataframe from string type will show null values
superbeer opened this issue · 3 comments
Environment details
- OS type and version: macOS
- Python version: 3.11.6 (`python --version`)
- pip version: 23.2.1 (`pip --version`)
- google-cloud-bigquery version: 3.13.0
Steps to reproduce
Code example
from google.cloud import bigquery
import pandas as pd

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "{project_id}.{dataset_id}.{table_id}"

df = pd.read_csv("data.csv")
print(df.dtypes)
df = df.astype(str)
print(df.dtypes)
print(df)

job_config = bigquery.LoadJobConfig(
    write_disposition="WRITE_TRUNCATE",
)

job = client.load_table_from_dataframe(
    df, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.
Example data (data.csv)
# example
col_a,col_b,col_c
a,,
aa,bb,cc
aaa,,ccc
pandas prints:
col_a col_b col_c
0 a nan nan
1 aa bb cc
2 aaa nan ccc
In BigQuery the table shows:
col_a col_b col_c
a nan nan
aa bb cc
aaa nan ccc
I expected the data in BigQuery to show:
col_a col_b col_c
a null null
aa bb cc
aaa null ccc
PS: if the column dtype is object and the missing values are NaN (i.e. without calling astype(str)), the load works and BigQuery stores NULL.
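To illustrate the behavior described above: this is a minimal sketch (using only pandas, no BigQuery call) showing that `astype(str)` converts a NaN missing value into the literal four-character string "nan", whereas the original object-dtype column keeps a real NaN.

```python
import numpy as np
import pandas as pd

# An object-dtype column with a missing value,
# like what pd.read_csv produces for an empty cell
s = pd.Series(["aa", np.nan])
print(s.dtype)  # object; a NaN here is treated as a missing value on load

# astype(str) converts the NaN into the literal string "nan"
s_str = s.astype(str)
print(s_str.tolist())
```

With the object-dtype series, `load_table_from_dataframe` can map NaN to NULL; after `astype(str)` there is no missing value left, only the string "nan".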
Hi @superbeer, thank you for raising the issue. Could you clarify which value you are expecting - NaN or null?
In BigQuery I am expecting null values.
I think this is a problem with pandas.read_csv(). The dataframe loaded by pandas is already using NaN for the empty strings, and the subsequent astype(str) turns those NaN values into the literal string "nan", so BigQuery is working as intended. Many people run into the same problem when loading null strings from CSV, and adding a parameter can resolve it, like this: pandas.read_csv("data.csv", keep_default_na=False) (see Stack Overflow).
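A minimal sketch of the suggested fix, reading the CSV from an in-memory string for illustration (the file path from the report would work the same way):

```python
import io
import pandas as pd

csv_data = "col_a,col_b,col_c\na,,\naa,bb,cc\naaa,,ccc\n"

# Default behavior: empty cells become NaN,
# which astype(str) would later turn into the string "nan"
df_default = pd.read_csv(io.StringIO(csv_data))
print(df_default)

# With keep_default_na=False, empty cells stay as empty strings
df_fixed = pd.read_csv(io.StringIO(csv_data), keep_default_na=False)
print(df_fixed)
```

Note that with keep_default_na=False the empty cells arrive as "" (empty strings), not NULL; if true NULLs are wanted in BigQuery, keeping the NaN values and skipping astype(str) also works, as noted above.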
I'm closing the issue, but feel free to leave a comment or open a new issue, if you have any further questions. :)