bug: schema detection on CSV, empty entries are assumed to be strings

Question

Opened this issue 7 years ago · 1 comments

For example, this CSV:

Name,Home_Runs,Rank
Barry Bonds,762,1
Hank Aaron,755,2
Babe Ruth,714,
Alex Rodriguez,696,4
Willie Mays,660,5

produces this detected schema:

"schema": {
  "items": {
    "items": [
      {
        "title": "name",
        "type": "string"
      },
      {
        "title": "home_runs",
        "type": "integer"
      },
      {
        "title": "rank",
        "type": "string"
      }
    ],
    "type": "array"
  },
  "type": "array"
}

instead of detecting the rank column as an integer:

{
  "title": "rank",
  "type": "integer"
}

Answer 1 · 2018-06-06T20:38:08.000Z

Proposal: ParseType checks for an empty byte slice. If the value is empty, we report its type as TypeEmpty.

If a column contains multiple types and one of them is TypeEmpty, we disregard the TypeEmpty entries when we determine the schema.
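
A minimal sketch of that proposal, to make the idea concrete. Only the names ParseType and TypeEmpty come from this thread; the Type enum, the other constants, ColumnType, and the widening rules are assumptions made for illustration, not the project's actual code.

```go
package main

import (
	"fmt"
	"strconv"
)

// Type is a hypothetical enum of detected field types.
type Type int

const (
	TypeEmpty Type = iota
	TypeInteger
	TypeFloat
	TypeString
)

// String makes the detected type readable when printed.
func (t Type) String() string {
	switch t {
	case TypeEmpty:
		return "empty"
	case TypeInteger:
		return "integer"
	case TypeFloat:
		return "number"
	default:
		return "string"
	}
}

// ParseType guesses the type of a single raw CSV cell.
// An empty (or nil) byte slice is reported as TypeEmpty rather than TypeString.
func ParseType(value []byte) Type {
	if len(value) == 0 {
		return TypeEmpty
	}
	s := string(value)
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return TypeInteger
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return TypeFloat
	}
	return TypeString
}

// ColumnType (hypothetical) resolves one type for a column,
// skipping cells that ParseType reports as TypeEmpty.
func ColumnType(cells [][]byte) Type {
	result := TypeEmpty
	for _, c := range cells {
		t := ParseType(c)
		switch {
		case t == TypeEmpty:
			continue // empty cells do not vote on the schema
		case result == TypeEmpty:
			result = t // first non-empty cell sets the candidate
		case result == t:
			// consistent with what we have seen so far
		case (result == TypeInteger && t == TypeFloat) ||
			(result == TypeFloat && t == TypeInteger):
			result = TypeFloat // assumed rule: integers widen to floats
		default:
			result = TypeString // any other mix falls back to string
		}
	}
	return result
}

func main() {
	// The Rank column from the CSV in the issue.
	rank := [][]byte{[]byte("1"), []byte("2"), []byte(""), []byte("4"), []byte("5")}
	fmt.Println(ColumnType(rank)) // prints "integer"
}
```

In this sketch a column whose cells are all empty resolves to TypeEmpty, so the caller would still need to pick a fallback (for example string) in that case.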

Open question, though: what value do we give an empty field when we marshal into Go?
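
One possible answer, sketched here as an assumption and not something this thread settles: decode an empty cell in an integer column to nil. DecodeInt is a hypothetical helper name. A nil interface value is encoded as null by encoding/json, so the gap stays visible instead of turning into 0 or "".

```go
package main

import (
	"fmt"
	"strconv"
)

// DecodeInt is a hypothetical helper: it converts a raw CSV cell from an
// integer column into a Go value, mapping an empty cell to nil.
func DecodeInt(cell []byte) (interface{}, error) {
	if len(cell) == 0 {
		return nil, nil // assumption: empty means "no value", not zero
	}
	n, err := strconv.ParseInt(string(cell), 10, 64)
	if err != nil {
		return nil, err
	}
	return n, nil
}

func main() {
	// Three cells from the Rank column, including the empty one.
	for _, cell := range []string{"2", "", "4"} {
		v, err := DecodeInt([]byte(cell))
		fmt.Println(v, err)
	}
}
```

The trade-off is that callers must handle nil explicitly; a typed alternative would be a pointer such as *int64, at the cost of a less uniform row representation.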