bxparks/bigquery-schema-generator

Schema generation from a list

ZiggerZZ opened this issue · 3 comments

Hi,

I think it would be useful (at least for me :)) to add a function to the library that generates a schema directly from in-memory data, something like:

```python
from bigquery_schema_generator import generate_schema
data = [{"first_column": 1, "second_column": "value"}, {"first_column": 2, "second_column": "another value"}]
schema = generate_schema(data)  # returns a list
```
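To make the proposed interface concrete, here is a minimal, self-contained sketch of what an in-process function might return for flat records. It does not use the library at all: `toy_generate_schema`, `_bq_type`, and `_merge` are hypothetical names, and the type rules are a simplified assumption (flat records only, no nested RECORD or REPEATED fields, no date/time detection), not bigquery-schema-generator's actual inference logic. The output shape, a list of dicts with `mode`, `name`, and `type`, is meant to resemble the JSON that `generate-schema` prints.

```python
from typing import Any, Dict, List


def _bq_type(value: Any) -> str:
    # Hypothetical mapping from Python value types to BigQuery column types.
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def _merge(old: str, new: str) -> str:
    # Widen INTEGER + FLOAT to FLOAT; any other conflict is an error in this sketch.
    if old == new:
        return old
    if {old, new} == {"INTEGER", "FLOAT"}:
        return "FLOAT"
    raise ValueError(f"conflicting types: {old} vs {new}")


def toy_generate_schema(data: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Infer a flat, NULLABLE-only schema from a list of dicts (toy sketch)."""
    types: Dict[str, str] = {}
    for record in data:
        for name, value in record.items():
            if value is None:
                continue  # nulls carry no type information, so they are skipped
            t = _bq_type(value)
            types[name] = _merge(types[name], t) if name in types else t
    # Emit one NULLABLE column per field, sorted by name.
    return [
        {"mode": "NULLABLE", "name": name, "type": types[name]}
        for name in sorted(types)
    ]
```

With the example `data` above, this returns `[{"mode": "NULLABLE", "name": "first_column", "type": "INTEGER"}, {"mode": "NULLABLE", "name": "second_column", "type": "STRING"}]`. A column that only ever holds `None` is omitted, because there is nothing to infer its type from.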

For now I came up with this function, which wraps the `generate-schema` command-line tool:

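```python
import json
from subprocess import check_output

def generate_schema(data):
    """Generate a BigQuery schema by piping newline-delimited JSON to generate-schema."""
    # Serialize each non-empty record as one line of newline-delimited JSON.
    data_string = "".join(json.dumps(d) + "\n" for d in data if d)
    data_bytes = data_string.encode("utf-8")
    # Run the CLI tool: it reads records from stdin and prints the schema as JSON.
    s = check_output(["generate-schema"], input=data_bytes)
    schema = json.loads(s)
    return schema
```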

but I'm sure there's a more efficient way than spawning a subprocess.

This is a duplicate of #47, which I closed due to lack of activity. I left some pointers there on how this could be implemented. If you want to take a crack at it, I recommend waiting until #57 is merged, because that's a substantial refactoring.
[Edit: Replaced #40 with more recent #57]

Oh, I hadn't seen that issue.
I'll wait until the PR is merged.

#40 and #57 were replaced by #61, which has now been merged. Maybe you want to rebase and take another crack at this?