Handling Null values for bigQuery IO
1byxero opened this issue · 5 comments
I have a script which fetches data from google datastore, converts each retrieved entity into python dictionary to be inserted as TableRow into google bigquery table
My issue is about handling None/null values in the TableRow dictionary.
The retrieved entities of datastore may or may not have certain attributes required in bigquery table schema. For such attributes of entity i am creating None value for the corresponding key of dictionary. Which results in list of following errors
InsertErrorsValueListEntry
errors: [<ErrorProto
debugInfo: u'generic::not_found: no such field.'
location: u'foo_bar'
message: u'no such field.'
reason: u'invalid'>]
index: 0>
How do i go about it
Basically my intention from this dataflow pipeline is as follows
1. Fetch data from Google datastore
2. Create a one to one mapping of datastore entities to BigQuery TableRows
3. Push the rows into BigQuery
This is How i have created the pipe
(p
| 'read from datastore' >> ReadFromDatastore(project,
query, user_options.namespace)
| 'convert to table rows' >> beam.Map(convert_to_table_row)
| 'write to bq' >> beam.io.Write(
beam.io.BigQuerySink('%s:%s.%s' % (project, dataset, table),
schema=table_schema,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))
If i replace the Map function to custom ParDo function and try to return a json from ParDo transformation function it throws error for returning strings.
Please see: #17 it is likely related.
cc: @chamikaramj
#17 is about reading records involving null values,
I am interested in writing records which have null values
Could you mention what runner and SDK version you are using ?
If you are using DataflowRunner could you provide a job ID ?
'None' values should be written correctly. Can you verify that the column to which you write 'None' values is 'NALLABLE' and that 'convert_to_table_row' generates dictionaries that conform to the schema of the table ?
Found the issue,
The dictionaries should confirm to the schema, and should have None
in the absent attributes.
That was the mistake I was making, I wasn't creating entries for None
values.
Thanks for your help!