proposal: encoding/json: reject unknown fields in Decoder
Closed this issue ยท 36 comments
Currently, json.Unmarshal (and json.Decoder.Decode) ignore fields in the incoming JSON which are absent in a target struct.
For example, the JSON { "A": true, "B": 123 } will successfully unmarshal into struct { A bool }.
This makes it difficult to do strict JSON parsing. One place this is an issue is in API creation, since you want to reject unknown fields on incoming JSON requests to allow safe introduction of new fields in the future (I've had this use case previously when working on a JSON REST API). It also can lead to typos leading to bugs in production: when a field is almost always the zero value in incoming JSON, you might not realise you're not even reading a misspelled field.
I propose that a new method is added to Decoder to turn on this stricter parsing, in much the same way UseNumber is used today. When in strict parsing mode, a key in the incoming JSON which cannot be applied to a struct will result in an MissingFieldError error. Like UnmarshalTypeError, decoding would continue for the remaining incoming data.
d := json.NewDecoder(r.Body)
d.UseStrictFields()
err := d.Decode(myStruct)
Could we add a mechanism to the json package that allows a struct to declare that it should have strict or non-strict handling, independent of how the decoder is configured? This could be an interface, or a pair of zero-size types (or a bool type) that could be embedded -- in much the same way that the xml package provides configurability -- or perhaps a json struct tag that could be recognized on any padding field.
There are cases where it's desirable for some json objects in a "document" to be treated strictly, whereas others in the same document should be non-strict (yet still warrant use of a struct instead of a map). Allowing an "overflow" map field would make it easier to express this (if you include such a map field, and turn on strict handling, the containing struct would be non-strict).
This is a duplicate of #14750. Continuing @extemporalgenome's thought, I wonder if the strictness should always be in the tag instead of a new method. Per-struct is difficult but per-field is fine. You could imagine json:"Foo,exactname".
Ah, thanks very much. OK, then UseStrictFields is definitely a bad name. :-)
We could certainly solve Dave's problem but I wonder if we should do something a bit more generic. In the XML decoder there is a ",any" tag that you can use to say "unexpected things go here". I wonder if JSON should allow a map[K]V (K=string, V=interface{} would be common but not strictly required) to have a tag json:",any" to collect otherwise ignored key-value pairs. Then the decoder can do whatever it needs to do with them. Similarly on encoding those would have to be reinserted into the marshalled object.
It might be too late for Go 1.7 to work all this out.
The "overflow" map discussed by @extemporalgenome and @rsc has been requested a few times. In fact, #6213 already exists for such a proposal. (The confusing name "inline" for the feature makes it hard to discover.) Perhaps this bug should supersede #6213.
I like the overflow map as a solution to the extra keys problem.
I think the dual of this issue is also important: sometimes you want to know that a key was missing from the JSON object. (In the case of a misspelling as in @okdave's motivation, you could detect either the missing key or the extra key.)
Adding a way to error on missing JSON keys is tracked by #6901. I believe these should be considered at the same time.
An overflow map would work for this purpose. One downside to adding that as a struct field would be that for deeply nested structures, you'd need to add the field to multiple structs and do a lot of manual bookkeeping after unmarshalling. One big advantage is no (terribly named) new method on Decoder and that it works with Unmarshal too.
Would that same map be used to add additional keys during Marshal? Or ignored there?
@okdave users would likely expect any Marshal(Unmarshal()) to produce output equivalent to the input.
Yup, I agree. @rsc I wasn't aiming for 1.7, but I'm happy to jump on this if there's consensus on this approach.
I need this as well. When parsing configs any unknown field is either a typo, or an old version, or something equally bad.
unknown map[string]interface{} // `json:",any"`
would work for me.
On second thought, any sink will work poorly with nested structs. One would need to add the sink to all structs and then manually check all of them. Obviously error prone.
I'm +1 on providing a way to do this.
Perhaps something like this:
// DisallowUnknownFields causes the Decoder to return an error
// if it tries to decode an object member into a struct that doesn't have
// a field with that member's name.
func (dec *Decoder) DisallowUnknownFields()
The implementation is trivial - I can propose it if this proposal is agreed on.
I see this as orthogonal to the UseStrictNames suggestion from #14750.
+1 for DisallowUnknownFields(). The interface is similar to the existing UseNumber() method.
I have a patch that implements this feature. It is passing nearly all of the existing encoding/json unit tests except one test is failing. Wasn't sure how to get feedback on my patch? Do I submit it to gerrit even though it needs additional work? I've created a pull request from my fork against my fork so that the changes can be browsed: https://github.com/mspiegel/go/pull/1/files.
The failing test is related to a key that is being "annihilated", that's the terminology in the source code when two fields in a struct have the same JSON name.
If you have a change that's ready for people to provide feedback on, gerrit's the right place to have the discussion. If there's one thing you can't figure out, then just check the rest of the change is sane and send it out.
You should add a comment to the change once you've mailed it to indicate what help you still need, and add "DO NOT SUBMIT" on the second line of the commit message to indicate that the CL is currently unworkable. Per @rsc do not submit means "something you do want people to see and review but that you know is not ready".
Let me know if you need more info.
CL https://golang.org/cl/27231 mentions this issue.
Thanks for the initial feedback on gerrit it has been helpful. Can I convince a consensus to change the name of the new option from DisallowUnknownFields to DisallowUnusedKeys? The motivation to change 'Fields' to 'Keys' is that the new option generates an error based on ignored JSON keys not based on ignored struct fields. The motivation to change 'Unknown' to 'Unused' is more subjective: Unused is a simple definition of the JSON keys that generate an error. Unknown JSON keys are defined as those keys which are unused.
I prefer DisallowUnknownFields. For starters Unused means something subtly different to unknown: we don't really know whether someone is using a key or not. I think field is fine too: the target is a struct, and the field is unknown/missing on that struct.
OK works for me. I'll move forward on the patch.
What happens next? The patch is complete and submitted to gerrit https://golang.org/cl/27231. It's waiting further review.
@mspiegel, this is still in the proposal phase. We're making progress on the backlog of proposals and will get to this one soon. If approved, we'll continue more on the code review, where it looks like some naming & tests might need tweaking.
Any progress on this proposal? It'd be great if this feature were included in the next release.
I'm going to look at all the json and xml stuff in the next couple weeks.
Any news on the status of the patch?
Not sure why this was still in proposal mode. I think everyone agrees this can be added. Will move to Go1.9Early milestone. We missed Go1.8, sorry.
I like DisallowUnknownFields when you want to have strict validation. It keeps the structs clean. However, I would also like to see a json:",any" option for cases where you want to collect fields not defined in the struct. In my case I have an object that has some well known fields, but I also want to keep user-defined data that is not well known.
IMO, the two cases are closely related and I would like to see them both addressed at the same time.
Given that map[string]interface{} is used if you were to Unmarshal an object into interface{}, wouldn't requiring an map[string]interface{} type for ,any be sufficient? The purpose is to save the data instead of dropping it.
Even if you could support other types such as map[string]string, I'm not sure handling "what do you do with a non-string value" is a necessary pursuit.
What if ,any required map[string]interface{} and other types are revisited at a later time?
There could be a special type json.Meta (which is a map[string]interface) that would get embedded and json would treat it properly:
type Data struct {
Known int `json:"known"`
Parts []string `json:"parts"`
Go string `json:"go"`
Here string `json:"here"`
json.Meta // rest would get marshal/unmarshal to this
}
It would be nice if this was possible for json.RawMessage too.
Change https://golang.org/cl/74830 mentions this issue: encoding/json: disallow unknown fields in Decoder