datacontract/datacontract-cli

Import: No support of AWS Athena (Trino) DDLs

roykoand opened this issue · 4 comments

This is not an issue in this project itself but in the underlying dependency, simple_ddl_parser (https://github.com/xnuinside/simple-ddl-parser).

It does not support DDLs generated by AWS Athena (SHOW CREATE TABLE).

Using this DDL as an example:

CREATE EXTERNAL TABLE `database`.`table` (
    column1 string,
    column2 string
)
PARTITIONED BY
(
    column3 integer
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://somewhere-in-s3/prefix1'
TBLPROPERTIES (
  'parquet.compression'='GZIP'
)

$ datacontract import --format sql --source aws_athena_ddl.sql
...
DDLParserError: Unknown symbol "'"

If you delete everything except the column definitions, the output is still invalid (the partition column column3 is missing, and the model name keeps its backticks):

CREATE EXTERNAL TABLE `database`.`table` (
    column1 string,
    column2 string
)
PARTITIONED BY
(
    column3 integer
)

$ datacontract import --format sql --source aws_athena_ddl.sql
dataContractSpecification: 0.9.3
id: my-data-contract-id
info:
  title: My Data Contract
  version: 0.0.1
models:
  '`table`':
    type: table
    fields:
      column1:
        type: string
      column2:
        type: string
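
A possible stopgap while the parser does not handle this DDL, sketched here as an assumption and not something proposed in this thread: pre-process the Athena DDL into a plain CREATE TABLE statement before importing it. The helper names below (`simplify_athena_ddl`, `_paren_group`) are hypothetical, and the result has not been checked against `datacontract import`. The table is renamed to `my_database`.`my_table` for the example, since the placeholder names in the DDL above would collide with SQL keywords once the backticks are removed.

```python
import re

# Sample DDL modelled on the one in this issue; table names are hypothetical.
ATHENA_DDL = r"""CREATE EXTERNAL TABLE `my_database`.`my_table` (
    column1 string,
    column2 string
)
PARTITIONED BY
(
    column3 integer
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://somewhere-in-s3/prefix1'
TBLPROPERTIES (
  'parquet.compression'='GZIP'
)
"""


def _paren_group(text: str, start: int) -> tuple[str, int]:
    """Return the text inside the parentheses opening at text[start],
    plus the index just past the matching closing parenthesis."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start + 1 : i], i + 1
    raise ValueError("unbalanced parentheses in DDL")


def simplify_athena_ddl(ddl: str) -> str:
    """Merge partition columns into the column list and drop the
    storage clauses (ROW FORMAT, LOCATION, TBLPROPERTIES, ...).

    Limitation: parentheses or backticks inside string literals
    (for example in COMMENT clauses) are not handled.
    """
    head = re.search(
        r"CREATE\s+EXTERNAL\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([^\s(]+)\s*\(",
        ddl,
        re.IGNORECASE,
    )
    if head is None:
        raise ValueError("no CREATE EXTERNAL TABLE statement found")

    table = head.group(1).replace("`", "")
    columns, pos = _paren_group(ddl, head.end() - 1)
    bodies = [columns.strip()]

    # Partition columns live in their own parenthesised group after the column list.
    part = re.compile(r"PARTITIONED\s+BY\s*\(", re.IGNORECASE).search(ddl, pos)
    if part is not None:
        partition_columns, _ = _paren_group(ddl, part.end() - 1)
        bodies.append(partition_columns.strip())

    body = ",\n    ".join(bodies).replace("`", "")
    return f"CREATE TABLE {table} (\n    {body}\n);"


simplified = simplify_athena_ddl(ATHENA_DDL)
print(simplified)
```

The simplified statement could be written to a file and passed to the import command in place of the original. Storage details such as the location and the table properties are lost in the process, so this only helps with recovering the column list.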

Thanks for reporting.
I think the best way is to open an issue (and maybe even a PR) at simple_ddl_parser.

@roykoand, could you do so?

@jochenchrist Sure! Just created a feature request in their repo: xnuinside/simple-ddl-parser#272

FYI: this was fixed in simple-ddl-parser version 1.6.0.

Merged #372

@roykoand
Could you test with the current main version to see whether this solves your issue?