GoogleCloudPlatform/DataflowJavaSDK

Conversion of PCollection<TableSchema> to TableSchema object

kpr6 opened this issue · 1 comments

kpr6 commented

I'm trying to read a schema file from a bucket into a PCollection using TextIO and build a TableSchema object from it to pass to BigQueryIO.Write.withSchema(). Here's the relevant code:
```java
final TupleTag<TableSchema> tableSchemaTag = new TupleTag<TableSchema>(){};
final TupleTag<String> schemaStartTag = new TupleTag<String>(){};

PCollectionTuple schemaFields = p
    .apply(TextIO.Read.named("Reading file schema")
        .from(options.getSchema())
        .withoutValidation())
    .apply(ParDo.named("Generating Table schema")
        .withOutputTags(tableSchemaTag, TupleTagList.of(schemaStartTag))
        .of(new DoFn<String, TableSchema>() {
          @Override
          public void processElement(ProcessContext c) {
            // Each input line is expected to look like "name:TYPE,name:TYPE,...".
            List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
            for (String schemaWord : c.element().split(",")) {
              String[] schemaKV = schemaWord.split(":");
              fields.add(new TableFieldSchema().setName(schemaKV[0]).setType(schemaKV[1]));
            }
            TableSchema schema = new TableSchema().setFields(fields);
            c.output(schema);
            // Side output: the name of the first column.
            String[] firstColumn = c.element().split(":", 2);
            c.sideOutput(schemaStartTag, firstColumn[0]);
          }
        }));

PCollection<TableSchema> tableSchema = schemaFields.get(tableSchemaTag);
```

Now, when I pass this 'tableSchema' to .withSchema(), compilation fails with the error below. I understand this is because withSchema() accepts only a TableSchema object, not a PCollection of them. Is there any way around this?
incompatible types: com.google.cloud.dataflow.sdk.values.PCollection<com.google.api.services.bigquery.model.TableSchema> cannot be converted to com.google.api.services.bigquery.model.TableSchema
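
To make the goal concrete, here is a minimal sketch of just the parsing step, written as plain Java so it can run outside a pipeline. It assumes each schema line has the form `name:TYPE,name:TYPE`. The class `SchemaLineParser` and its nested `Field` type are hypothetical names used only for illustration; `Field` stands in for TableFieldSchema and is not part of the Dataflow SDK or the BigQuery client library. The point is that a PCollection's contents only exist once the pipeline runs, so a plain TableSchema would have to be built by logic like this before the pipeline is constructed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Dataflow SDK.
final class SchemaLineParser {

    // Hypothetical stand-in for TableFieldSchema, holding only a name and a type.
    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Parses one line such as "id:INTEGER,name:STRING" into an ordered list of fields.
    // Rejects entries that are missing either the name or the type.
    static List<Field> parse(String line) {
        List<Field> fields = new ArrayList<>();
        for (String entry : line.split(",")) {
            String[] kv = entry.trim().split(":", 2);
            if (kv.length != 2 || kv[0].trim().isEmpty() || kv[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Expected name:TYPE but got '" + entry + "'");
            }
            fields.add(new Field(kv[0].trim(), kv[1].trim()));
        }
        return fields;
    }

    public static void main(String[] args) {
        for (Field f : parse("id:INTEGER,name:STRING")) {
            System.out.println(f.name + " -> " + f.type);
        }
    }
}
```

In a real program, each `Field` would presumably be mapped to `new TableFieldSchema().setName(...).setType(...)` and collected into a TableSchema, as in the DoFn above, but in the main program rather than inside the pipeline.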