Allow option to pass in iterable of dictionaries when used as a library

Question

abroglesc opened this issue 5 years ago · 3 comments

Right now, to generate the schema for a list of dictionaries, I would first need to convert each dictionary into a JSON string, just so that it could be parsed back into a dictionary and yielded by the json_reader. When using this as a library, being able to pass in an iterable of dictionaries directly would be a useful feature.
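
To make the round trip concrete, here is a rough sketch of the workaround. The json_reader below is a hypothetical stand-in written for this example (one JSON object per line), not the library's actual function, and the sample records are made up.

```python
import io
import json


def json_reader(file):
    """Hypothetical stand-in for the library's reader: one JSON object per line."""
    for line in file:
        if line.strip():
            yield json.loads(line)


# Data that is already in memory as Python dictionaries (made-up sample).
records = [{"name": "a", "count": 1}, {"name": "b", "count": 2}]

# Workaround: serialize every dict to a JSON string and wrap it in a file-like object...
buffer = io.StringIO("\n".join(json.dumps(record) for record in records))

# ...only for the reader to parse each line straight back into a dict.
round_tripped = list(json_reader(buffer))
```

The result is equal to the original list, so the serialization step is pure overhead for a caller that already has dictionaries.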

I am happy to create a PR for this if you would like, but I wanted to propose it and make sure you are on board before sending the PR, @bxparks.

Let me know if I should continue with a PR that includes some added tests for it.

Answer 1 · 2020-04-29T20:40:09.000Z

Hi,

You mean you have already read the data into an iterable of Python dicts, and you want to send that data into the SchemaGenerator class to produce the BigQuery schema? I think that would be a useful feature.

The entry point is SchemaGenerator.deduce_schema(file), which currently wraps a reader (CSV or JSON) around the file and continues. I think what you want is to wrap a null reader... or actually just pass the file onwards without wrapping it.

How about we add an input_format == 'dict' option, then add something like the following at the top of the deduce_schema() method:

if self.input_format == 'csv':
  ...
elif self.input_format == 'dict':
  reader = file
else:
  raise ...

By setting the reader equal to file, I think the rest of the code would just work, because the json_object in that code is really just a Python dict.

(At some point, we should probably rename the file variable to something else, like input_data.)
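
To illustrate the dispatch idea, here is a minimal, self-contained sketch. The class SketchSchemaGenerator, its methods make_reader and collect_types, and the error message are all hypothetical names made up for this example; this is not the library's SchemaGenerator, and the type collection is only a toy substitute for real schema deduction. The point is just that a 'dict' input format can hand the iterable through unchanged while the downstream loop stays the same.

```python
import csv
import io
import json


class SketchSchemaGenerator:
    """Hypothetical stand-in used only to illustrate the proposal."""

    def __init__(self, input_format="json"):
        self.input_format = input_format

    def make_reader(self, input_data):
        """Return an iterable of dicts for the configured input format."""
        if self.input_format == "csv":
            return csv.DictReader(input_data)
        elif self.input_format == "json":
            # Newline-delimited JSON: parse each non-empty line into a dict.
            return (json.loads(line) for line in input_data if line.strip())
        elif self.input_format == "dict":
            # The proposal: the input is already an iterable of dicts,
            # so pass it onwards without wrapping it in a reader.
            return input_data
        else:
            raise ValueError(f"Unsupported input_format: {self.input_format!r}")

    def collect_types(self, input_data):
        """Toy substitute for schema deduction: Python type names seen per key."""
        seen = {}
        for record in self.make_reader(input_data):
            for key, value in record.items():
                seen.setdefault(key, set()).add(type(value).__name__)
        return seen


# Made-up sample rows, fed once as dicts and once as newline-delimited JSON.
rows = [{"name": "a", "count": 1}, {"name": "b", "count": 2.5}]
from_dicts = SketchSchemaGenerator("dict").collect_types(rows)
json_lines = io.StringIO("\n".join(json.dumps(row) for row in rows))
from_json = SketchSchemaGenerator("json").collect_types(json_lines)
```

In this sketch both paths produce the same per-key result, since the loop after make_reader only ever sees Python dicts regardless of where they came from.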

Answer 2 · 2020-04-29T20:42:09.000Z

That is exactly what I want to do. I can implement it as such.

Answer 3 · 2020-06-06T00:08:18.000Z

I'm going to close this due to lack of activity. If you get the time to send me a PR, you can re-open this.