jeroen/jsonlite

The default handling of missingness in dataframes cannot be reached through parameters

I was surprised to find that, by default, calling toJSON on a data frame containing NAs drops the keys (column names) of the missing values from each record. After reviewing related issues (#223 and others) and the paper (section 2.4.3), it's clear this is intentional. What surprises me is that I cannot reproduce this behavior through toJSON's na parameter.

Consider the following:

library(jsonlite)

# Creating missing data
iris = iris[1:3, ]
iris[1, 1:2] = NA
iris[2, 3:4] = NA

By default, keys whose values are NA are dropped from each record.

toJSON(iris, pretty = TRUE)
#> [
#>   {
#>     "Petal.Length": 1.4,
#>     "Petal.Width": 0.2,
#>     "Species": "setosa"
#>   },
#>   {
#>     "Sepal.Length": 4.9,
#>     "Sepal.Width": 3,
#>     "Species": "setosa"
#>   },
#>   {
#>     "Sepal.Length": 4.7,
#>     "Sepal.Width": 3.2,
#>     "Petal.Length": 1.3,
#>     "Petal.Width": 0.2,
#>     "Species": "setosa"
#>   }
#> ]

na = "string" keeps the keys and records missingness as the string "NA".

toJSON(iris, na = "string", pretty = TRUE)
#> [
#>   {
#>     "Sepal.Length": "NA",
#>     "Sepal.Width": "NA",
#>     "Petal.Length": 1.4,
#>     "Petal.Width": 0.2,
#>     "Species": "setosa"
#>   },
#>   {
#>     "Sepal.Length": 4.9,
#>     "Sepal.Width": 3,
#>     "Petal.Length": "NA",
#>     "Petal.Width": "NA",
#>     "Species": "setosa"
#>   },
#>   {
#>     "Sepal.Length": 4.7,
#>     "Sepal.Width": 3.2,
#>     "Petal.Length": 1.3,
#>     "Petal.Width": 0.2,
#>     "Species": "setosa"
#>   }
#> ]

na = "null" keeps the keys and records missingness as the JSON value null.

toJSON(iris, na = "null", pretty = TRUE)
#> [
#>   {
#>     "Sepal.Length": null,
#>     "Sepal.Width": null,
#>     "Petal.Length": 1.4,
#>     "Petal.Width": 0.2,
#>     "Species": "setosa"
#>   },
#>   {
#>     "Sepal.Length": 4.9,
#>     "Sepal.Width": 3,
#>     "Petal.Length": null,
#>     "Petal.Width": null,
#>     "Species": "setosa"
#>   },
#>   {
#>     "Sepal.Length": 4.7,
#>     "Sepal.Width": 3.2,
#>     "Petal.Length": 1.3,
#>     "Petal.Width": 0.2,
#>     "Species": "setosa"
#>   }
#> ]

The above is equivalent to toJSON(iris, na = NULL, pretty = TRUE).

These are the three possible values for na, and none of them reproduces the first result, where na was not specified.
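
One way to check this without reading the output by eye is to compare each result against the default programmatically. This is only a minimal sketch: df is a local copy of the example data (so the built-in iris is left untouched), and the list names are purely illustrative.

```r
library(jsonlite)

# Local copy of the example data (the name df is illustrative)
df <- iris[1:3, ]
df[1, 1:2] <- NA
df[2, 3:4] <- NA

# Output when na is not specified at all
default_out <- toJSON(df)

# Output for each value that can be passed explicitly to na
explicit <- list(
  string   = toJSON(df, na = "string"),
  null     = toJSON(df, na = "null"),
  null_arg = toJSON(df, na = NULL)
)

# TRUE for any explicit value that reproduces the default output
matches_default <- vapply(
  explicit,
  function(out) identical(out, default_out),
  logical(1)
)
print(matches_default)
```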

I would expect the default handling of missingness in toJSON to be reachable through its na parameter.
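
As a possible workaround, a thin wrapper can leave the na argument out of the call whenever the dropping behavior is wanted. The sketch below assumes a hypothetical helper to_json_na and an invented option label "drop"; neither is part of jsonlite, and the small data frame is made up for illustration.

```r
library(jsonlite)

# Hypothetical helper, NOT part of jsonlite.
# "drop" is an invented label meaning "do not pass na at all",
# which is the only way to get the default key-dropping behavior.
to_json_na <- function(x, na = c("drop", "null", "string"), ...) {
  na <- match.arg(na)
  if (na == "drop") {
    toJSON(x, ...)           # na omitted: NA fields are dropped from records
  } else {
    toJSON(x, na = na, ...)  # na passed through unchanged
  }
}

# Small illustrative data frame with one missing numeric value
df <- data.frame(a = c(NA, 1), b = c("x", "y"))

dropped <- to_json_na(df, "drop")
nulled  <- to_json_na(df, "null")
print(dropped)
print(nulled)
```

This only hides the asymmetry from the caller. The underlying limitation is unchanged: the default can still only be selected by not supplying na.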

Any updates / progress on this front? NA-handling is incredibly important.