GoogleCloudPlatform/DataflowPythonSDK

Handling null values in apache beam

NithinKumaraNT opened this issue · 1 comments

I have a BigQuery table that has null values in certain columns, which I am reading with Apache Beam and writing to a Parquet file. But I get the following error:

  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/iobase.py", line 1041, in finish_bundle
    yield WindowedValue(self.writer.close(),
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 395, in close
    self.sink.close(self.temp_handle)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/parquetio.py", line 449, in close
    self._flush_buffer()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/parquetio.py", line 481, in _flush_buffer
    size = size + b.size
AttributeError: 'NoneType' object has no attribute 'size' [while running 'writetoParquet/Write/WriteImpl/WriteBundles']

Please help!
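One common workaround for this class of failure (a sketch, not a fix confirmed in this thread) is to replace null values with type-appropriate defaults in a transform placed just before the Parquet write, so the writer's internal buffers never see `None`. The `fill_nulls` helper and the per-column defaults below are hypothetical names for illustration; in a pipeline it would be applied as something like `beam.Map(fill_nulls, defaults={...})` upstream of `WriteToParquet`:

```python
def fill_nulls(record, defaults):
    """Replace None values in a BigQuery row dict with per-column defaults.

    record   -- a dict of column name -> value, as produced by a BigQuery read
    defaults -- a dict of column name -> substitute value for nulls
    Columns with no entry in `defaults` have their None left as-is.
    """
    return {
        key: (defaults[key] if value is None and key in defaults else value)
        for key, value in record.items()
    }


# Example: fill a null integer column with 0 and a null string column with "".
row = {"user_id": None, "name": None, "score": 42}
cleaned = fill_nulls(row, defaults={"user_id": 0, "name": ""})
# cleaned == {"user_id": 0, "name": "", "score": 42}
```

Alternatively, if the nulls are meaningful and should survive into the output, declaring the corresponding `pyarrow` schema fields as nullable (and using a Beam/pyarrow version where `parquetio` handles null buffers) may be the better route.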

We moved to Apache Beam!

Google Cloud Dataflow for Python is now the Apache Beam Python SDK, and code development has moved to the Apache Beam repo.

If you want to contribute to the project (please do!), use this Apache Beam contributor's guide. Closing out this issue accordingly.