pacman82/odbc2parquet

setup types for particular column

andreypanchenko opened this issue · 2 comments

how to say to odbctoparquet with SQL or with some other flag which particular type of column in parquet do I want to set?

example this is my query for replicating the data

SELECT
    CAST(offer_id AS CHAR) 				 AS offer_id,
    CAST(clinic_id AS SIGNED) 			 AS clinic_id,
    CAST(operation_id AS SIGNED) 		 AS operation_id,
    CAST(illness_id AS SIGNED) 			 AS illness_id,
    CAST(diagnostic_id AS SIGNED) 	         AS diagnostic_id,
    CAST(score AS CHAR) 				 AS score,
    CAST(created_at AS DATETIME) 		 AS created_at,
    CAST(type AS CHAR) 				 AS type,
    CAST(clinic AS CHAR) 				AS clinic,
    CAST(operation AS CHAR) 			AS operation,
    CAST(illness AS CHAR) 				AS illness,
    CAST(diagnostic AS CHAR) 			AS diagnostic,
    CAST(tag_new AS SIGNED) 			AS tag_new,
    CAST(tags AS CHAR) 				AS tags,
    CAST(views AS SIGNED) 				AS views,
    CAST(clicks AS SIGNED) 				AS clicks,
    CAST(leads AS SIGNED) 				AS leads,
    CAST(conversion_rate AS DECIMAL)        AS conversion_rate,
    CAST(ctr AS DECIMAL) 				AS ctr,
    CAST(ctr_weight AS DECIMAL) 		AS ctr_weight
FROM offers_clinics

then in the parquet viewer in Pycharm, I see the next (using parquet/avro viewer)

  "type" : "record",
  "name" : "schema",
  "fields" : [ {
    "name" : "offer_id",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "clinic_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "operation_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "illness_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "diagnostic_id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "score",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "created_at",
    "type" : [ "null", {
      "type" : "long",
      "logicalType" : "timestamp-millis"
    } ],
    "default" : null
  }, {
    "name" : "type",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "clinic",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "operation",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "illness",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "diagnostic",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "tag_new",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "tags",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "views",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "clicks",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "leads",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "conversion_rate",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "ctr",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "ctr_weight",
    "type" : [ "null", "long" ],
    "default" : null
  } ]
}

I'm expecting these three columns in DOUBLE or NUMERIC instead of INTEGER or LONG
maybe you can point me to the place in doc or maybe you have some work arounds.
Thanks in advance

version of binary odbc2parquet 0.9.5
Database MySql

Hello @andreypanchenko ,

in general odbc2parquet decides the best fitting parquet types itself based on the column types of the query it gets reported by (the ODBC driver of ) your data source. The mapping from relational type (as reported by ODBC) to logical parquet type is documented in the Readme.

Thanks for the information you provided. Sadly I do use neither avro nor pycharm and I do not know how to interpret the output. I am especially unsure wether types reported are physical or logical.
Luckily there should be an easy way to help me understand your problem better: Please run odbc2parquet with -vv to enable more verbose logging and share the output with me.

Please update to the newest version of odbc2parquet. I can not provide support for old versions, nor do I remember their specific capabilities.

Best, Markus

Closing this as it is not actionable at the moment for me. Without verbose output of odbc2parquet I can not tell what type ODBC reported.

Feel free to reopen the issue if the information can be provided.