This is a command line utility to generate CREATE TABLE
statements compatible
with Trino from a table schema provided as JSON
Schema.
-
Grab the script.
$ # Or download just the Python script :) $ git clone https://github.com/hashhar/jsonschema2sql.git
-
Run it against a schema (test.json is a sample schema present in the root of this repo).
$ python3 jsonschema2sql.py --jsonschema test.json \ --table my_test_table \ --schema my_test_db \ --location 's3://my/test/bucket' \ --table-format 'JSON' \ --partition-columns year month date
This generates a
CREATE TABLE
DDL as belowCREATE TABLE "my_test_db"."my_test_table" ( "string_col" varchar, "datetime_col" timestamp, "datetime_string_col" varchar, "date_col" date, "date_string_col" varchar, "time_col" time, "time_string_col" varchar, "decimal_string_col" decimal(10, 2), "double_col" double, "double_double_col" double, "float_col" float, "decimal_col" decimal(5, 3), "action_date" bigint, "boolean_col" boolean, "array_col" array(varchar), "array_object_col" array(ROW("string_col" varchar, "datetime_col" timestamp)), "object_col" ROW("string_col" varchar, "integer_col" bigint) ) WITH ( external_location = 's3://my/test/bucket', partitioned_by = ARRAY['year', 'month', 'date'], format = 'JSON' )
For all the possible command line arguments, pass the -h
or --help
command
line argument.
The JSON Schema to SQL types mapping is defined in code via a mapping of the
JSON Schema type
and format
pair against the corresponding SQL type.
For the complete mapping refer to the JSON_TYPE_TO_SQL_TYPE
dictionary in the
script.
I welcome you to use this tool (without any implied support contract :stuck_out_tongue:). In case you find any issues feel free to create one on GitHub.
For feature requests keep the following expectations in mind (I am open to being persuaded otherwise):
- The project aims to only generate DDL for
CREATE TABLE
statements. - The project doesn't aim to cover entirety of JSON Schema.
- The
precision
,scale
and some values for theformat
are extensions to the JSON Schema and not part of the standard JSON Schema and as such they might not work with other tools in the JSON Schema landscape. - The target system for the DDL is expected to be Trino and while the generated SQL might work on other systems they are explicitly unsupported (help welcome to implement dialects support).
For contributing changes I recommend to open an issue before implementing anything. That makes it more likely to avoid repeated effort and allows other people to see if somebody else is already working on an issue or not.
- Fork the repo.
- Create a new branch from
master
in your fork. - Do your changes.
- Run
tools/reformat.sh
andtools/build.sh
(this might change the files for reformatting). - Commit your changes and open a pull-request.
Join us on Trino slack for more about
Trino. I can be found as hashhar
over there.
Create an issue in this repo or drop me an email (not that hard to find).