interagent/pliny

Gem "oj" 3.0 breaks "encode"

fdr opened this issue · 12 comments

fdr commented

I've noticed a fairly large behavioral change in oj that breaks a fair amount of previously working code. It seems to have been introduced in the 3.0 changes.

Among the casualties is encode, even when passed a hash in the normal way from the serializers. I am using the newest released pliny.

For example, it used to be that Sequel models would be automatically serialized as their values. Now the to_s representation gets serialized instead (so passing sequel_model_object.values is the fix).
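
To illustrate the workaround, a minimal sketch (Artist is a hypothetical Sequel model here, and MultiJson.encode stands in for whatever pliny's encode ultimately routes through):

require 'multi_json'

artist = Artist[1]               # a hypothetical Sequel::Model instance
MultiJson.encode(artist)         # under oj 3 (per the report above): serializes artist.to_s
MultiJson.encode(artist.values)  # workaround: serialize the values hash explicitly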

I suspect there is a similar assumption somewhere, e.g. a reliance on to_h or to_hash, that no longer holds.

Would it be correct to say that the root of the problem probably has something to do with previously implicit serialization of Sequel models?

fdr commented

I don't think so, because my broken encode expressions involve hashes.

fdr commented

Although, maybe.

fdr commented

No, I think you are right. I thought I caught one where it was just a hash, but nope. It's probably all root-caused by serialization of Sequel models.

Ah nice. Yeah, I only ask because it would be surprising if oj broke serialization of hashes containing only arrays/hashes and primitive types — that would have to be considered a big bug I think. Breaking undefined behavior around encoding more complex objects would be totally plausible though.

fdr commented

Took a while, but here it is. ohler55/oj#446

Nice!

I'll leave it up to the maintainers over there to do what they will, but as a general comment: it's probably not a good idea to rely upon undefined behavior in a library like oj — serializing a complex object, for example. You and I both know that serializing a JSONBHash as if it were just a normal hash is a pretty sensible thing to do, but it would be a boundary violation for oj to understand that.

A better answer than waiting on a fix in Oj might be to rework your code so that either (1) any non-standard objects are cast to something standard before calling out to Oj, or (2) serialization checks that there are no non-standard objects in the data and raises an exception otherwise — this would force the implementer of any particular endpoint to do the casting explicitly themselves.
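
Something along these lines, as a rough sketch (sanitize_for_json and assert_json_native! are hypothetical helper names, not pliny APIs):

# Option 1: cast non-standard objects into JSON-native structures up front.
def sanitize_for_json(value)
  case value
  when Hash          then value.each_with_object({}) { |(k, v), h| h[k] = sanitize_for_json(v) }
  when Array         then value.map { |v| sanitize_for_json(v) }
  when Sequel::Model then sanitize_for_json(value.values)
  else value
  end
end

# Option 2: refuse anything non-standard, so each endpoint casts explicitly.
def assert_json_native!(value)
  case value
  when Hash  then value.each { |k, v| assert_json_native!(k); assert_json_native!(v) }
  when Array then value.each { |v| assert_json_native!(v) }
  when String, Symbol, Numeric, TrueClass, FalseClass, NilClass then value
  else raise TypeError, "refusing to serialize #{value.class}; cast it first"
  end
end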

fdr commented

It's not really undefined: with use_to_json set, it should call that method:

puts Sequel::Postgres::JSONBHash.new({hello: 'world'}).to_json
# prints: {"hello":"world"}

Ah, cool. Well, you are still somewhat reliant in that case on JSONBHash properly implementing #to_json, or on a monkey patch to provide your own implementation. As long as you know what you are doing though :)

fdr commented

Well, it delegates to #to_hash. As it turns out, it was neither of those things, but an entirely different meaning of compat that MultiJson doesn't account for. ohler55/oj#446 intridea/multi_json#181
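
For reference, a quick sketch of that delegation (JSONBHash wraps a plain Hash, so to_hash yields the underlying hash):

jsonb = Sequel::Postgres::JSONBHash.new({hello: 'world'})
jsonb.to_hash  # the underlying plain Hash: {:hello=>"world"}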

Just a note for future readers:

My team has been seeing similar issues when upgrading to oj version 3 or greater. oj's :compat mode (which is the mode multi_json uses for oj) changed substantially in version 3: it now prefers #to_json rather than #to_hash when serializing objects. We ended up solving it by configuring multi_json to use oj's :custom mode, which can be told to prefer #to_hash, like this:

MultiJson.dump_options = {
  mode: :custom,
  use_to_hash: true,
  use_to_json: false
}
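
If you call Oj directly rather than through multi_json, the equivalent configuration (using Oj's documented default_options) should be:

require 'oj'

Oj.default_options = {
  mode: :custom,
  use_to_hash: true,
  use_to_json: false
}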