wanglingsong/JsonSurfer

GsonParser converting longs to doubles


The GsonParser is converting longs to doubles inside its numberHolder implementation. Instead of calling return jsonProvider.primitive(jsonReader.nextDouble()); perhaps something like this would work better:

final String value = jsonReader.nextString();
try {
    // Integral values are kept as longs.
    return jsonProvider.primitive(Long.parseLong(value));
}
catch (final NumberFormatException e) {
    // Anything with a fraction or exponent (e.g. 1.5 or 1e3) falls back to a double.
    return jsonProvider.primitive(Double.parseDouble(value));
}

Did you test it? Are you sure no exception would be thrown when calling nextString() on a NUMBER token?

Yes, using the attached snippet, it worked as expected. Per the JsonReader.nextString javadoc: "If the next token is a number, this method will return its string form."
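For reference, a minimal sketch of that kind of check (class name is mine; 2^53 + 1 is used because it cannot round-trip through a double):

import com.google.gson.stream.JsonReader;
import java.io.StringReader;

public class NextStringCheck {
    public static void main(String[] args) throws Exception {
        JsonReader reader = new JsonReader(new StringReader("[9007199254740993]"));
        reader.beginArray();
        // Per the javadoc, nextString() coerces a NUMBER token to its string form.
        System.out.println(reader.nextString()); // prints 9007199254740993
        reader.endArray();
        reader.close();
    }
}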

I think it would introduce too much overhead, given the potential for two extra parse attempts per number. If you really need the long type, I think you can implement a custom JsonProvider.
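Something along these lines could be applied at the provider level, though only as a heuristic (the helper below is illustrative, not JsonSurfer's actual interface; as noted in the next comment, it can't recover precision that is already lost):

// Illustrative helper a custom JsonProvider could apply to numeric values:
// map integral doubles back onto Long before wrapping them.
static Number narrowIfIntegral(double value) {
    // Only safe up to 2^53; larger longs have already lost precision
    // by the time the parser hands the provider a double.
    if (!Double.isInfinite(value)
            && value == Math.rint(value)
            && Math.abs(value) <= 9007199254740992d) {
        return (long) value;
    }
    return value;
}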

Doing this at the provider level could work, however I believe the core issue is in the parser, as that's where the long is forced into a double via the call to jsonReader.nextDouble(). By the time it gets to the provider, it's already been turned into a double. The JsonReader class does internally keep track of whether it's a long or a double, but unfortunately it doesn't make that information available to public consumers. I will check whether it's possible to extend JsonReader to gain access to the peeked member, which if set to 15 indicates a long rather than a double.

Unfortunately it looks like JsonReader::peeked is package-private rather than protected.
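One could still get at it reflectively, though that's fragile: it depends on Gson internals (15 being the internal PEEKED_LONG value mentioned above) and on reflective access being permitted. A sketch only:

import com.google.gson.stream.JsonReader;
import java.lang.reflect.Field;

// Fragile workaround sketch: peek() first so the reader classifies the next
// token, then read the package-private 'peeked' field via reflection.
static boolean nextTokenIsLong(JsonReader jsonReader) throws Exception {
    jsonReader.peek(); // forces the reader to classify the upcoming token
    Field peeked = JsonReader.class.getDeclaredField("peeked");
    peeked.setAccessible(true);
    return peeked.getInt(jsonReader) == 15; // 15 == PEEKED_LONG in Gson internals
}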

Actually, I'm curious about your use case. What kind of benefit do you gain from such a conversion?

The use case is that the JSON we parse and filter needs to retain its original formatting, so that when we do schema inference it doesn't change types from long to double.

So, given that limitation of Gson, maybe you can try another JsonSurfer implementation, e.g. JacksonSurfer.
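For what it's worth, Jackson's streaming parser exposes the numeric type of the current token directly, which is presumably why the Jackson-based implementation avoids the problem. A small sketch:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class JacksonNumberTypes {
    public static void main(String[] args) throws Exception {
        JsonParser parser = new JsonFactory().createParser("[42, 4.2]");
        parser.nextToken(); // START_ARRAY
        while (parser.nextToken() != JsonToken.END_ARRAY) {
            // getNumberType() distinguishes INT/LONG/BIG_INTEGER
            // from FLOAT/DOUBLE/BIG_DECIMAL for the current token.
            System.out.println(parser.getNumberValue() + " -> " + parser.getNumberType());
        }
        parser.close();
    }
}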

Will give it a look. Ideally I want an implementation I can use in a streaming read-and-provide scenario: as the data is read and filtered with a JSONPath, the output is fed to a provider that simply streams it out the other side. That way, if I hit a massive JSON document with a path like $.*, it wouldn't blow up trying to assemble the entire document in memory.
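If the listener-binding API from the README is still what I think it is (sketching from memory, so exact class and method names may differ by version), something like this would fit that shape:

import org.jsfr.json.JsonPathListener;
import org.jsfr.json.JsonSurfer;
import org.jsfr.json.JsonSurferJackson;
import org.jsfr.json.ParsingContext;

public class StreamingFilter {
    public static void main(String[] args) {
        JsonSurfer surfer = JsonSurferJackson.INSTANCE;
        surfer.configBuilder()
              .bind("$.*", new JsonPathListener() {
                  @Override
                  public void onValue(Object value, ParsingContext context) {
                      // Stream each matched value out immediately instead of
                      // assembling the whole document in memory.
                      System.out.println(value);
                  }
              })
              .buildAndSurf("{\"a\":1,\"b\":2}");
    }
}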