Allow option to pass in iterable of dictionaries when used as a library
abroglesc opened this issue · 3 comments
Right now, if I want to generate the schema for a list of dictionaries, I first need to convert each dictionary into a JSON string, just so that it can be loaded back into a dictionary and yielded by the json_reader. Being able to pass the dictionaries in directly would be a useful feature when using this as a library.
I am happy to create a PR for this if you would like, but I wanted to propose it and make sure you are on board before sending the PR, @bxparks.
Let me know if I should continue with a PR that includes some added tests for it.
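To illustrate the round trip described above, here is a minimal sketch using only the standard library. The json_reader function below is a hypothetical stand-in that assumes one JSON object per line; it is not the library's actual reader, and the sample records are made up.

```python
import io
import json


def json_reader(file):
    """Hypothetical stand-in for a line-oriented JSON reader."""
    for line in file:
        line = line.strip()
        if line:
            yield json.loads(line)


# Data that is already in memory as Python dicts (made-up sample records).
records = [{"name": "a", "count": 1}, {"name": "b", "count": 2}]

# Workaround: serialize each dict to a JSON string only so it can be parsed again.
as_file = io.StringIO("\n".join(json.dumps(record) for record in records))
round_tripped = list(json_reader(as_file))

# The reader ends up yielding the same dicts we started with.
print(round_tripped == records)
```

The serialization step adds work without changing the data, which is why accepting the dicts directly seems worthwhile.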
Hi,
You mean you have already read the data into an iterable of Python dict objects, and you want to send that data into the SchemaGenerator class to produce the BigTable schema? I think that would be a useful feature.
The entry point is SchemaGenerator.deduce_schema(file), which currently wraps a reader (CSV or JSON) around the file and continues. I think what you want is to wrap a null reader... or actually just pass the file onwards without wrapping it.
How about we add an input_format == 'dict' option, then add something like the following at the top of the deduce_schema() method:
    if self.input_format == 'csv':
        ...
    elif self.input_format == 'dict':
        reader = file
    else:
        raise ...
By setting the reader to be equal to file, I think the rest of the code would just work, because the json_object in that code is really just a Python dict.
(At some point, we should probably rename the file variable to something else, like input_data.)
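As a rough, self-contained sketch of the idea above: the class below is a hypothetical stand-in (SketchSchemaGenerator), not the real SchemaGenerator. Its CSV and JSON branches, the ValueError, and the toy "schema" (field name mapped to Python type name) are all assumptions made only so the example runs. The point it shows is that the 'dict' branch can hand the input straight to the rest of the loop.

```python
import csv
import json


class SketchSchemaGenerator:
    """Hypothetical stand-in; the real class does far more than this."""

    def __init__(self, input_format="json"):
        self.input_format = input_format

    def deduce_schema(self, file):
        # Pick a reader that yields one dict per record.
        if self.input_format == "csv":
            reader = csv.DictReader(file)
        elif self.input_format == "json":
            reader = (json.loads(line) for line in file if line.strip())
        elif self.input_format == "dict":
            # The input is already an iterable of dicts, so no wrapping is needed.
            reader = file
        else:
            raise ValueError(f"Unknown input_format: {self.input_format!r}")

        # Toy "schema": remember the Python type name first seen for each key.
        schema = {}
        for json_object in reader:
            for key, value in json_object.items():
                schema.setdefault(key, type(value).__name__)
        return schema


# Made-up sample records, passed in without any JSON round trip.
records = [{"name": "a", "count": 1}, {"name": "b", "active": True}]
schema = SketchSchemaGenerator(input_format="dict").deduce_schema(records)
print(schema)
```

Because the loop only relies on each record behaving like a dict, any iterable works here, including a generator, so the data never has to be materialized as text.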
That is exactly what I want to do. I can implement it as such.
I'm going to close this due to lack of activity. If you get the time to send me a PR, you can re-open this.