Error when parsing data with content after serialized output

Question

Error when parsing data with content after serialized output

ojizero opened this issue 4 years ago · 6 comments

The PHP un-serialization behaviour seems to accept any amount of (potentially invalid data) that may be present after the actual serialized string, for example it treats both i:10; and i:10;something totally irrelevant exactly the same.

I've been looking up to find out if this behaviour is a bug or something unexpected but I couldn't find any information regarding it!

You can check here online, or in the PHP REPL.

But the behaviour implemented here returns an error tuple if any non empty string remains after the parsing is done,, mismatching what PHP does under the hood!

I personally think the most proper way of handling this would be to return a tuple of parsed data with remaining string if any, similar to how Integer.parse/1 behaviour is implemented for example, and letting the consumer decide how to work with the results, but I wanted to take any feedback, suggestions, or ideas!

Optionally it could be kept as an external option to be given, that skips return an error tuple if rest day exists, as to avoid any breaking changes here!

Honestly I'm unsure if this project is still active or not, and I'll be willing to implement a fix of the behaviour (as I will be doing anyway since we need it in our current situation).

Answer 1 · 2020-11-04T20:32:05.000Z

Yes, you are right.
I like your offer with an additional option. I would be grateful for your Pull Request.

I think that the check for new option should be put here
https://github.com/zloyrusskiy/php_serializer/blob/master/lib/php_serializer.ex#L91

Answer 2 · 2020-11-05T08:17:35.000Z

@zloyrusskiy thanks for your response dude ^^ just opened #4 with a fix

Answer 3 · 2020-11-05T08:19:59.000Z

@zloyrusskiy just a thought though, do you think it would be clearer instead of adding an option, to expose a function parse/2 similar to unserialize/2 but that behaves more like Integer.parse/1 builtin ? i.e. it always ether returns a tuple of strings or an error tuple?

eg

{10, ""} = PhpSerializer.parse("i:10;")
{10, "excess"} = PhpSerializer.parse("i:10;excess")

instead of returning a tuple starting with :ok? 🤔 just a thought here about the better API to expose by the library ^^

Answer 4 · 2020-11-05T17:11:07.000Z

The Integer.parse option is not needed :ok because

the error has no details
the type of the first element of the tuple is only integer

Our deserialized result can be of different types + it will be useful to know where the serialization broke. To make it convenient to use pattern matching, we should keep :ok and : error.

I suggest to bump the version of the library => 2.0 and add new breaking changes:

by default, we will not unserialize strictly (like PHP)
and add 2 additional options :strict + :with_excess

examples:

{ :ok, 0 } = PhpSerializer.unserialize("i:0;i:34;")

{ :error, "got excess data" } = PhpSerializer.unserialize("i:0;i:34;", strict: true)

{ :ok, 0, "i:34;" } = PhpSerializer.unserialize("i:0;i:34;", with_excess: true)
{ :error, "got excess data", "i:34;" } = PhpSerializer.unserialize("i:0;i:34;", strict: true, with_excess: true)

What do you think about this?

P.S. we'll keep previous options the same

Answer 5 · 2020-11-08T09:44:50.000Z

hey dude thanks for your response, sorry for the very delayed response I decide to be a potato this weekend 😅

yea I do think doing it that way would be the best way in handling it! will work on it tonight and open a PR either later tonight or tomorrow morning ^_^

Answer 6 · 2020-11-09T08:33:14.000Z

hey @zloyrusskiy just opened #5 with the breaking changes you suggested ^_^