jeroen/jsonlite

fromJSON loads a list of heterogeneous data types incorrectly

fzhcary opened this issue · 3 comments

Here is an example. The nested list contains data of different types:

rm(list = ls())
library(jsonlite)
input_str='{
  "columns": [
    "a",
    "b",
    "c"
  ],
  "data": [
    [
      0,
      "NO ACTION/LOGIN",
      -0.511300240726737
    ],
    [
      0,
      "Not_Preferred",
      1.2
    ]
  ]
}'


json <- jsonlite::fromJSON(input_str, simplifyVector = TRUE)
df <- data.frame(json$data, row.names = json$index)
str(df)

This produces the following result:

'data.frame':	2 obs. of  3 variables:
 $ X1: chr  "0" "0"
 $ X2: chr  "NO ACTION/LOGIN" "Not_Preferred"
 $ X3: chr  "-0.511300240726737" "1.2"

It coerced all the data to character.
You may argue that this is the expected result of simplifyVector = TRUE.
However, if you specify simplifyVector = FALSE,
the result is even worse: the two lists are flattened into a single row!

'data.frame':	1 obs. of  6 variables:
 $ X0L                : int 0
 $ X.NO.ACTION.LOGIN. : chr "NO ACTION/LOGIN"
 $ X.0.511300240726737: num -0.511
 $ X0L.1              : int 0
 $ X.Not_Preferred.   : chr "Not_Preferred"
 $ X1.2               : num 1.2
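Both results above can be reproduced in base R without jsonlite, which suggests they follow from R's own coercion rules rather than from the parser. The following is only a sketch: `mixed` and `rows` are hand-written, hypothetical stand-ins for the parsed JSON, not output captured from jsonlite.

```r
# An atomic vector holds a single type, so mixing numbers and strings
# coerces every element to character.
mixed <- c(0, "NO ACTION/LOGIN", 1.2)
class(mixed)

# A list keeps each element's own type. `rows` is a hypothetical stand-in
# for the unsimplified "data" field: one inner list per JSON row.
rows <- list(
  list(0L, "NO ACTION/LOGIN", -0.511300240726737),
  list(0L, "Not_Preferred", 1.2)
)
sapply(rows[[1]], class)

# data.frame() treats every element of every inner list as its own column,
# so two rows of three values become one row of six columns.
flat <- data.frame(rows)
dim(flat)
```

The types survive inside the list; they are lost only when the values are forced into a single vector or matrix, and the rows are flattened only when the nested list is handed to data.frame() directly.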

What I am looking for is a way to preserve the data types. For example,
using reticulate and pandas, I can at least get this:

install.packages("reticulate")
library(reticulate)
pd <- import("pandas")
json <- pd$read_json(input_str, orient="split")
df <- data.frame(json)
str(df)

Result:

'data.frame':	2 obs. of  3 variables:
 $ a: num  0 0
 $ b: chr  "NO ACTION/LOGIN" "Not_Preferred"
 $ c: num  -0.511 1.2

It merged int and num into num, but it doesn't mix num with char.

Can this be fixed?

Thanks!

I don't quite understand your question. The problem in your example code is not in jsonlite, but in how you create the data frame. What you're trying to do does not work in R.

Maybe this code does what you want?

input <- jsonlite::fromJSON(input_str)
df <- as.data.frame(input$data)
names(df) <- input$columns
df$a <- as.integer(df$a)
df$c <- as.numeric(df$c)
df
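If the column types are not known in advance, base R's `type.convert()` can guess them from the strings instead of hard-coding `as.integer()` and `as.numeric()`. A minimal sketch, assuming a hand-written character matrix `m` that stands in for `input$data`:

```r
# Hypothetical stand-in for input$data: a character matrix, one row per record.
m <- matrix(
  c("0", "NO ACTION/LOGIN", "-0.511300240726737",
    "0", "Not_Preferred", "1.2"),
  nrow = 2, byrow = TRUE
)

df <- as.data.frame(m, stringsAsFactors = FALSE)
names(df) <- c("a", "b", "c")

# Re-infer each column's type from its text.
# as.is = TRUE keeps non-numeric columns as character instead of factor.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Note that the types are inferred from the text, not from the JSON: a column of JSON strings such as "1" and "2" would also come back as integer.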

If you want jsonlite to automatically create a data frame, you need to encode your JSON like so:

str <- '[
  {
    "a": 0,
    "b": "NO ACTION/LOGIN",
    "c": -0.5113
  },
  {
    "a": 0,
    "b": "Not_Preferred",
    "c": 1.2
  }
]'

jsonlite::fromJSON(str)

I don't know how, but I'd like to request that this ticket be reopened.

Let me explain the issue again.
The issue is in parsing the JSON string into R data; the data.frame(...) step only illustrates the problem, so you can ignore it.

From the example above, the parsed JSON structure is this:

     [,1] [,2]              [,3]                
[1,] "0"  "NO ACTION/LOGIN" "-0.511300240726737"
[2,] "0"  "Not_Preferred"   "1.2"  

jsonlite's fromJSON automatically encloses 0 in quotes. This is not what the user wants.

The suggestion you made at the end also won't work, since you won't know the data types beforehand. They can be a mixture of str, int, double, etc.

Please read my example using reticulate/pandas and compare the jsonlite result.

Vectors in R must be homogeneous. Because your data is stored as mixed-type arrays, the values get coerced to strings. You can disable the automatic simplification and create the data frame manually, like so:

input <- jsonlite::fromJSON(input_str, simplifyVector =  FALSE)
columns <- unlist(input$columns)
out <- lapply(input$data, function(x){structure(x, names = columns)})
df <- jsonlite:::simplify(out)
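The `:::` operator reaches an unexported function, which may change between package versions. As an alternative, the same row-to-column conversion can be sketched with base R only. Here `rows` and `columns` are hand-written stand-ins for `input$data` and `unlist(input$columns)` as they might look after parsing with simplifyVector = FALSE; that shape is an assumption.

```r
# Hypothetical stand-ins for the unsimplified parse result.
rows <- list(
  list(0L, "NO ACTION/LOGIN", -0.511300240726737),
  list(0L, "Not_Preferred", 1.2)
)
columns <- c("a", "b", "c")

# Collect the j-th value of every row into one vector per column,
# so each column is simplified on its own and keeps its own type.
cols <- lapply(seq_along(columns), function(j) {
  unlist(lapply(rows, function(row) row[[j]]))
})
names(cols) <- columns

df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df)
```

Because each column is built separately, numbers are never combined with strings. This sketch does not handle missing values: a NULL entry would be dropped by `unlist()` and leave that column too short.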