lukad/pears

JSON parsing example fails with string escape chars and numbers


The JSON parser example has two issues that I noticed while building my own Gleam library that uses pears for JSON parsing. The first is in the JSON string parser, which currently starts with:

let str =
    none_of(["\""])
    |> alt(escape)

This never runs the escape parser: none_of(["\""]) also matches a backslash, so the first alternative consumes it before alt ever tries escape, which only matches input starting with a backslash. The correct code should be:

let str =
    none_of(["\"", "\\"])
    |> alt(escape)
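
To see why the original ordering never reaches escape, here is a minimal stdlib-only sketch. `accepted_by_none_of` is a hypothetical stand-in that models only the acceptance rule of a none_of-style parser (accept any character not in the list); it is not the pears implementation.

```gleam
import gleam/io
import gleam/list
import gleam/string

/// Hypothetical model of a none_of-style acceptance rule:
/// accept any single character that is not in `excluded`.
pub fn accepted_by_none_of(char: String, excluded: List(String)) -> Bool {
  !list.contains(excluded, char)
}

pub fn main() {
  // Original exclusion list: the backslash is accepted, so the first
  // alternative consumes it and escape is never tried. Prints True.
  io.println(string.inspect(accepted_by_none_of("\\", ["\""])))
  // Fixed exclusion list: the backslash is rejected, so alt can fall
  // through to escape. Prints False.
  io.println(string.inspect(accepted_by_none_of("\\", ["\"", "\\"])))
}
```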

The second issue is with the num parser. It doesn't parse all valid JSON numbers, and it will crash under some circumstances because of its use of let assert. The incorrect assumption was that int.parse will work on anything that float.parse rejected, which isn't the case.

JSON numbers have a whole-number component, an optional decimal component, and an optional exponent. The exponent can be present even when the decimal component is not, so 7e4 is a valid JSON number. Neither float.parse nor int.parse will parse it, though: float.parse requires a decimal component (optionally followed by an exponent), while int.parse accepts neither a decimal component nor an exponent.

Another issue is that JSON numbers can be arbitrarily large (or arbitrarily negative), but float.parse fails for values beyond the largest or smallest representable double, so that case needs handling too (in my project I clamp the result to the max/min double when it is exceeded). A fixed version of this parser that I used in my project was the following:

let num =
    maybe(just("-"))
    |> pair(
      alt(
        to(just("0"), ["0"]),
        recognize(pair(
          one_of(["1", "2", "3", "4", "5", "6", "7", "8", "9"]),
          many0(digit()),
        )),
      )
      |> map(string.concat),
    )
    |> pair(maybe(
      just(".")
      |> right(many1(digit()))
      |> map(string.concat),
    ))
    |> pair(
      recognize(maybe(
        alt(just("e"), just("E"))
        |> pair(maybe(one_of(["+", "-"])))
        |> pair(many1(digit())),
      ))
      |> map(string.concat),
    )
    |> map(fn(p) {
      case p {
        #(#(#(neg, ns), ds), ex) -> {
          {
            option.unwrap(neg, "") <> ns <> "." <> option.unwrap(ds, "0") <> ex
          }
          |> float.parse
          |> result.unwrap(case neg {
            Some(_) -> -1.7976931348623158e308
            None -> 1.7976931348623158e308
          })
          |> Num
        }
      }
    })

This version works by inserting .0 as the decimal component whenever a number has none, so that float.parse always succeeds (except for numbers that are too large or too small, which get the fallback values).
For a full fixed version, feel free to check out the one I used in my Gleam library.
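
The normalisation step can also be tried on its own, without pears. The following is a minimal sketch using only the Gleam standard library; the function name `to_float` and the way its arguments are split (sign, whole part, optional fraction digits, exponent text) are hypothetical and chosen only to mirror the tuple handled in the map above.

```gleam
import gleam/float
import gleam/int
import gleam/io
import gleam/option.{type Option, None, Some}
import gleam/result
import gleam/string

/// Hypothetical helper: rebuilds the number text with an explicit
/// decimal component so that float.parse can accept it.
pub fn to_float(
  neg: Option(String),
  whole: String,
  frac: Option(String),
  exp: String,
) -> Float {
  let text =
    option.unwrap(neg, "") <> whole <> "." <> option.unwrap(frac, "0") <> exp
  // Fallback used when float.parse rejects the text, for example
  // because the value is out of range for a double.
  let fallback = case neg {
    Some(_) -> -1.7976931348623158e308
    None -> 1.7976931348623158e308
  }
  text
  |> float.parse
  |> result.unwrap(fallback)
}

pub fn main() {
  // "7e4" is valid JSON, but as described above neither stdlib parser
  // is expected to accept it as-is.
  io.println(string.inspect(float.parse("7e4")))
  io.println(string.inspect(int.parse("7e4")))
  // After normalisation the text is "7.0e4", which float.parse accepts.
  io.println(float.to_string(to_float(None, "7", None, "e4")))
}
```

Keeping this step as a plain function makes it easy to unit-test the edge cases (missing fraction, signed exponent, out-of-range values) separately from the combinator grammar.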

Thanks for opening the issue!
I kind of forgot that my example JSON parser was not complete. I'll definitely fix it and have a look at your code.