macek/jquery-serialize-object

Input name validation RegEx makes some pretty rigid assumptions

Biont opened this issue · 8 comments

Biont commented

I was trying to figure out why serializeObject() always returned an empty object when trying to serialize my form data.

It turned out to be an issue with the name validation which works with the following regular expression :

/^[a-z_][a-z0-9_]*(?:\[(?:\d*|[a-z0-9_]+)\])*$/i

My input naming scheme is the following: _foo[bar\\baz]. This causes the validation to fail, resulting in an empty object. The reason for that is the use of backslashes.

Now, I know using backslashes could be considered nonstandard and I might think about replacing them, but I can't help but wonder why this RegEx is written the way it is to begin with. I tried to find some HTML specs on valid input names and they don't seem to correspond with that expression.

According to HTML4 spec, ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

You're allowing an underscore as the first character, which would be invalid in HTML4. At the same time, the underscore is the only non-letter character allowed, which is too rigid

HTML5 seems to be free-for-all: Any non-empty value for name is allowed, but the names "charset" and "isindex" are special, so even my backslashes should be fine.

My conclusion is that you are using the regular expression to ensure that the JS object syntax can still be used (-> obj.attr as opposed to obj['attr'], which would allow symbols other than underscores), which -to me- should have a far lower priority than ensuring the form data is serialized entirely. In fact, I don't think peculiarities of JavaScript syntax should have any say in what and how form data is handled as long as the form data itself is valid HTML

I'd like to hear your thoughts on this. Do you think dropping support for JS object syntax is justifyable in favor of proper input name support? Should there be an argument that permits symbols when serializing data?

As an ugly temporary workaround, I have added backslashes to the RegEx patterns so that i can continue developing for now. Thank you for creating this library.

macek commented

@Biont,

Thanks for your excellent post.

My conclusion is that you are using the regular expression to ensure that the JS object syntax can still be used (-> obj.attr as opposed to obj['attr'], which would allow symbols other than underscores), which -to me- should have a far lower priority than ensuring the form data is serialized entirely.

Well put, and you reverse-engineered the original intent perfectly. When I originally wrote this, I didn't see the broader use case of the utility and got sucked into the mindset of writing a "sensible" default behavior.

Since, I've learned about the initiatives of the HTML JSON form submission spec. It also suggests that the priority should be ensuring that the form data is serialized entirely.

I haven't had a lot of time to continue work on this plugin and that spec is sadly no longer maintained, though I believe that spec to be a very good goal for this plugin still.

You seem like you have a good head on your shoulders and I'd appreciate any help you could offer on getting this plugin some of the critical updates it's been needing for a while.

Biont commented

Hey @macek

Glad to hear we're on the same page on this.

Since, I've learned about the initiatives of the HTML JSON form submission spec. It also suggests that the priority should be ensuring that the form data is serialized entirely.

That was an interesting read. I was not aware of that project and I agree that it would be nice to have a JS lib that would allow you to do just that. Dealing with nested data structures in HTML forms is far more tedious than it needs to be.
But it's another step away from giving you the same data you'd receive from a regular form POST submission, which I would imagine is what most people are looking for. Implementing the naming syntax proposed in your link would mean you could only ever handle forms with your plugin, because most data would be lost in a traditional POST submit.
It would certainly be pretty darn amazing if your plugin could do JSON form submission, but I would personally like to see full support for regular HTML form data first.

You seem like you have a good head on your shoulders and I'd appreciate any help you could offer on getting this plugin some of the critical updates it's been needing for a while.

Thanks. I am going to add a fork of your repo to my project, so I can shoot you a PR if I come up with something I'd consider useful. Incidentally, I am writing a form builder plugin, so your plugin will probably be pretty handy.

So what do you think should happen now? Is there any harm in not enforcing the JS object syntax in future versions? I am tempted to just strip out most of the constraints in the regex, since I cannot see a good reason to keep it in. I have noticed some oddities with nesting (and nested array keys) when using symbols, so the regex patterns for those will probably need some testing as well

Biont commented

Oh well, I haven't noticed that you're allowing custom regex patterns through your API, so I could probably do anything I need without even touching your code. I should have scrolled down your readme a bit longer, I guess. Are you even interested in changing the default behaviour with that in mind?

macek commented

Oh well, I haven't noticed that you're allowing custom regex patterns through your API, so I could probably do anything I need without even touching your code.

Correct. However, this API was added after some people were unhappy with the default behavior. It should've been an indication that there was a more serious issue to deal with. Hacking complex regexp patterns is not very friendly and shouldn't be a burden of the common user.

Are you even interested in changing the default behaviour with that in mind?

With 3.x I did intend on changing the default behavior of the plugin. And on that note...

Is there any harm in not enforcing the JS object syntax in future versions? I am tempted to just strip out most of the constraints in the regex, since I cannot see a good reason to keep it in.

I still intended (my mind is open at this point) for this plugin to support building a nested structure, but I would prefer that data the 2.x branch finds "invalid" to somehow be represented in the 3.x output. No data should be lost.

I have noticed some oddities with nesting (and nested array keys) when using symbols, so the regex patterns for those will probably need some testing as well

I was tinkering with a set of regexp last night but the testing suite showed a lot of failures that proved this issue is not easily solved by lone manipulation of the internal regexps. Some other internals will have to be reworked and some tests might get thrown away as they were written to enforce old behavior.

So what do you think should happen now?

To get us on the same page, I think you should show me an example HTML form and the JS(ON) representation of it. What features do you think the plugin should support?

  • can the user build a nested data object ?
  • can the user build arrays with specified indexes ?
  • can the user submit multiple values for the same key ?
  • can the user specify a value's data type ?
    (instead of only { (string) <key> : (string) <value> })
  • other ideas ?

Lateral idea: Do we separate concerns within this project ? I can see two distinct concerns.

  1. Get all Input-type elements (<input>, <select>, <textarea>, etc) and reduce their name:value pairs into a flat JS object where each Input's data is represented in the output.

    This behaviour alone could be useful to someone that just wants all the name:value pairs generated from their inputs. Tree structure may not be required by this user.

    (This is pretty close to what $.searializeArray is doing.)

  2. Parse/Reduce keys in the flat object into a Tree structure (per our new spec).

    This behavior would be useful to someone that needs the data in tree/nested form.

macek commented

Removing the "compliance" tag to keep this open to any ideas. I don't want you to feel like our discussion is already cornered with this label. I'm totally open to discussing any possible future for this plugin.

And as it stands, there really is nothing for us to comply to as there is no official standard.

Biont commented

And as it stands, there really is nothing for us to comply to as there is no official standard.

Yes, as long as there is no specific standard for JSON serialization of HTML forms, the HTML spec for forms is the closest we're going to get:

HTML4 & HTML5

Don't get me wrong, though. I'm not saying those should be followed to a tee. But I think it is not too far-fetched to assume that your plugin is expected to be usable for AJAX form submission, especially in cases when jQuery's builtin tools don't provide enough flexibility.

I'm certainly not against adding sugar on top, but the base requirement should be ensuring that the generated data is identical to what you'd get from submitting the selected inputs traditionally.

So to comment on the feature ideas:

  • can the user build a nested data object ?

Yep

  • can the user build arrays with specified indexes ?

Yep

  • can the user submit multiple values for the same key ?

No, at least not at the most basic level of functionality. HTML does not do this, so neither should we. But there might be a "JSON mode" that does this

  • can the user specify a value's data type?

No, unless the mentioned enhanced mode is used

  • other ideas?

So what about using data attributes to trigger enhanced options of the plugin.

<form data-json-enabled>
    <input type="number" name="foo" value="123" data-type="number">
    <input type="checkbox" name="bar" value="1" data-type="bool">
</form>

To be perfectly honest, though, I think there's limited use from the data type stuff. Number and Boolean are the only really interesting types supported by JSON and those might even be deduced by the input elements and their values - without any further configuration.

A form attribute triggering the use of extended functionality like automatic Array creation for duplicate keys sounds pretty interesting though. It would also draw a direct connection between the HTML and the JS, meaning that we can be quite sure that the user _knows_ that magic stuff happens when he uses your plugin with that specific form.

For reference, here's the custom regex I used to get my nested input names full of symbols working. Nothing special, just putting it here for safe-keeping:

    $.extend(FormSerializer.patterns, {
        validate: /^.*(?:\[(?:\d*|.+)\])*$/i,
        key: /[^\[\]]+|(?=\[\])/gi,
        named: /^.+$/i
    });

Since you asked, I extracted an example of my inputs:

<input
        name="_objects[children][1][children][0][components][0][Foo\\Bar\\Baz][name]" type="text"
        value="Name" placeholder="dummy_name">
<input type="hidden"
       name="_objects[children][1][children][0][components][0][Foo\\Bar\\Baz][__id]"
       value="56be3f6c26130">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][options][]"
        type="text" placeholder="Default value" value="Foo">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][options][]"
        type="text" placeholder="Default value" value="Bar">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][options][]"
        type="text" placeholder="Default value" value="Baz">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][name]"
        type="text">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][label]"
        type="text" value="Dropdown:">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][placeholder]"
        type="text" value="My Input" placeholder="My Input">
<input type="hidden"
       name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][active]"
       value="false">
<input
        name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][active]"
        type="checkbox" value="true">
<input type="hidden"
       name="_objects[children][1][children][0][components][1][Foo\\Bar\\Baz\\Inputs\\Dropdown][__id]"
       value="56be3f6c26e20">
<select>
    <option>Foo</option>
    <option>Bar</option>
    <option>Baz</option>
</select>

As you can see, it makes heavy use of nesting and uses namespaces in array keys (nothing of this will ever see the frontend, no worries :D ).
Your current plugin together with the replaces regular expressions seems to handle this decently

macek commented

@Biont sorry to leave you hanging. Some stuff came up and I've barely had time at my desk. I'll follow up here soon.

macek commented

@Biont I'm happy with your write-up and agree with the decisions you're making. I like that the defaults would be very sensible/basic and any extended features would be enabled by running the serializer with specific options.

As for the data types, I also agree it's not remarkably interesting, but I'd still like it. I'd like to avoid type inference where

 <input name="username" value="1234">

would be serialized as {username: 1234} instead of {username: "1234"}. In other words, I think everything should be a string unless the input has a specific type or data-type attribute set on the field.

If this isn't part of the default behavior, it could be passed as a typeEncoder option

// rough idea for typeEncoder option
$('#myForm').serializeObject({
  typeEncoder: (input, value) => {
    switch (input.data('type') || input.attr('type') || 'text') {
      case 'number': return parseInt(value, 10);
      case 'boolean': return value === '1' or value === 'true';
      default: return value;
    }
  });

As for duplicate keys, a similar technique could be used with a mergeDuplicateKey option

// rough idea for mergeDuplicateKey option
$('#myForm').serializeObject({
  mergeDuplicateKey: (existingValue, newValue) => (
    Array.isArray(existingValue)
      ? existingValue.concat(newValue)
      : [existingValue, newValue]
  )
});

I'm open to implementing these however you want, but at first glance, I like that it lets the end user decide how these things are handled instead of us forcing a convention on them.

Thoughts ?