qri-io/dataset

bug: schema detection on CSV, empty entries are assumed to be strings


For example, this CSV:

Name,Home_Runs,Rank
Barry Bonds,762,1
Hank Aaron,755,2
Babe Ruth,714,
Alex Rodriguez,696,4
Willie Mays,660,5

has schema:

"schema": {
  "items": {
    "items": [
      {
        "title": "name",
        "type": "string"
      },
      {
        "title": "home_runs",
        "type": "integer"
      },
      {
        "title": "rank",
        "type": "string"
      }
    ],
    "type": "array"
  },
  "type": "array"
}

Instead of the expected definition for the rank column:

{
  "title": "rank",
  "type": "integer"
}

Proposal: have ParseType check for an empty byte slice. If the slice is empty, we say the type is TypeEmpty.

If a column has multiple types across its rows, but one of those is TypeEmpty, we disregard the TypeEmpty entries when we determine the schema.
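A rough sketch of what this could look like, not the library's actual code. The Type constants, parseType, and detectColumnType are hypothetical names made up for illustration. The fallback to string for conflicting types and for all-empty columns is also an assumption.

```go
package main

import "strconv"

// Type is a hypothetical stand-in for the detected field type.
type Type int

const (
	TypeEmpty Type = iota // the cell had no content
	TypeInteger
	TypeNumber
	TypeString
)

// parseType is a simplified, hypothetical version of ParseType that
// reports TypeEmpty for an empty byte slice instead of TypeString.
func parseType(value []byte) Type {
	if len(value) == 0 {
		return TypeEmpty
	}
	s := string(value)
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return TypeInteger
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return TypeNumber
	}
	return TypeString
}

// detectColumnType picks a type for one column from its data cells
// (header excluded). Cells that parse as TypeEmpty are disregarded.
func detectColumnType(cells [][]byte) Type {
	result := TypeEmpty
	for _, c := range cells {
		t := parseType(c)
		switch {
		case t == TypeEmpty:
			continue // empty cells do not influence the schema
		case result == TypeEmpty:
			result = t // first non-empty cell sets the candidate type
		case result != t:
			return TypeString // assumption: conflicting types fall back to string
		}
	}
	if result == TypeEmpty {
		return TypeString // assumption: an all-empty column stays a string
	}
	return result
}
```

For the Rank column in the example above (1, 2, empty, 4, 5), this sketch returns TypeInteger because the empty cell is skipped.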

Open question: what value do we then give this field when we marshal into Go?
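One possible answer, sketched here as an assumption rather than a decision: represent an empty cell as a nil pointer, which encoding/json writes out as null. The helper names parseOptionalInt and rankJSON are hypothetical.

```go
package main

import (
	"encoding/json"
	"strconv"
)

// parseOptionalInt converts a CSV cell into *int64.
// An empty cell becomes nil rather than 0, so "missing" stays
// distinguishable from a real zero.
func parseOptionalInt(cell []byte) (*int64, error) {
	if len(cell) == 0 {
		return nil, nil
	}
	n, err := strconv.ParseInt(string(cell), 10, 64)
	if err != nil {
		return nil, err
	}
	return &n, nil
}

// rankJSON shows how the optional value serializes: a nil pointer
// is encoded as JSON null, a non-nil pointer as the number itself.
func rankJSON(cell []byte) (string, error) {
	rank, err := parseOptionalInt(cell)
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(rank)
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```

With this approach the Babe Ruth row would carry null for rank instead of an empty string. The tradeoff is that every consumer of the field has to handle nil.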