GoogleCloudPlatform/DataflowPythonSDK

Unable to rename output files from gs://xxx

ericescalante opened this issue · 2 comments

Hi, using both the 0.6 and 2.0 versions of the Python SDK, we have an issue when merging two collections in order to write to BQ. The job gets cancelled with:

Unable to rename output files from gs://xxx/temp/yyy.1496310068.567790/dax-tmp-2017-06-01_02_41_20-1322936065210643157-S01-1-ce1f668822da1a2e/tmp-ce1f668822da14bb@DAX.avro to gs://xxx/temp/yyy.1496310068.567790/tmp-ce1f668822da14bb@*.avro.

If I run them separately, the job ends successfully. Any helpful pointers greatly appreciated! :)

Hi @ericescalante, what do you mean by running them separately? Could you describe your job a bit more?

cc: @chamikaramj

Thanks for replying @aaltay, I found the issue.
One of the two FlatMap functions was failing because of badly formatted data. But I could only find that out by debugging every little bit with DirectRunner first; the error message was a bit too cryptic ;)
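
For anyone hitting the same thing, here is a minimal sketch of a FlatMap function that skips bad input instead of raising. The thread does not show the actual job or data format, so everything here is an assumption for illustration: the input is taken to be one JSON object per line, and `parse_record` is a hypothetical name.

```python
import json
import logging


def parse_record(line):
    """Yield one dict for a well-formed line, nothing for a bad one.

    Assumption: each input element is a string holding one JSON object.
    The real job's format is not shown in this thread.
    """
    try:
        record = json.loads(line)
    except (ValueError, TypeError):
        # Bad input is logged and dropped rather than failing the bundle.
        logging.warning("Skipping badly formatted line: %r", line)
        return
    if isinstance(record, dict):
        yield record
    else:
        logging.warning("Skipping non-object record: %r", line)


# Quick local check with plain Python, no pipeline needed.
lines = ['{"id": 1}', 'not json', '{"id": 2}']
parsed = [record for line in lines for record in parse_record(line)]
```

Since the function is a generator that yields zero or one element per input, it should be usable as `beam.FlatMap(parse_record)` in the pipeline, and it can be exercised on sample data locally (or with DirectRunner) before submitting the job.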