purescript-contrib/purescript-uri

Remove fromJust usage in URI.Extra.QueryPairs

Opened this issue · 2 comments

Is your change request related to a problem? Please describe.
We ran into a crashing bug in our production front-end where using print from URI.Extra.QueryPairs would result in an error which we traced down to fromJust against encodeURIComponent's result.

Examples:

Both keyToString and valueToString in URI.Extra.QueryPairs will error if their usage of encodeURIComponent results in a Nothing. This can happen if unprintable characters. For example a Dooli\U001001a9leinside of a string given toencodeURIComponentresults inNothing`.

I'm not sure exactly what in the string is causing encodeURIComponent to return a Nothing there.

Describe the solution you'd like
Broadly it'd be nice if usages of fromJust here were eliminated entirely. The Maybeness of the entire API could be represented in its external interface.

Alternatively the API could be split into safe and unsafe versions. This is a little tricky since there's already some "unsafe" functions here which are labeled and documented as such for different kinds of safety reasons.

I'm happy to work on a PR but would wanted to get some insight into the appetite for the various solutions to this.

Additional context
In the meantime we're going to mitigate this issue on our end locally by testing our input to the problematic functions against encodeURIComponent first - and possibly by additional input filtering elsewhere in our app.

I think there is appetite for this -- this library is meant to be safe. There are a few different things we can do, some of which are in this repository and some of which are upstream in js-uri, which provides the encode / decode functions.

Upstream, we can start by adding failing tests that demonstrate cases where encodeURIComponent and decodeURIComponent return Nothing:

https://github.com/purescript-contrib/purescript-js-uri/blob/b809ef6af53b880e0682bba8a9291f855607d2f4/test/Main.purs#L10-L18

For example, we could pull tests from the examples on MDN of when a URIError gets thrown. It's easier to think of strings that will fail decoding rather than encoding, but it's possible for example when encoding an unmatched surrogate pair, or in your example of an unprintable character.

Second, it may be possible to write some sanitizing code so that you can safely use encodeURIComponent and decodeURIComponent. We would need property tests via quickcheck to help prove that we're effectively sanitizing the input string before encoding. These safe versions could be written and used in this library, or they could possibly be added upstream to js-uri.

Finally, if that's not possible, then I'm happy to consider ways to adjust this package so that it never crashes in the case where encoding or decoding results in Nothing.

Here is the error I think we hit:

module Main where                                                                   
                                                                                      
import URI.Extra.QueryPairs (valueFromString)                                       
x = valueFromString "\xdc00"

Which crashes with

Error: Failed pattern match at Data.Maybe (line 281, column 1 - line 281, column 46): Nothing
    at /tmp/psctest/output/Data.Maybe/index.js:151:15
    at /tmp/psctest/output/URI.Common/index.js:60:24
    at /tmp/psctest/output/Data.Either/index.js:85:34
    at /tmp/psctest/output/Control.Monad.State.Trans/index.js:72:53
    at /tmp/psctest/output/Data.Identity/index.js:46:20
    at /tmp/psctest/output/Control.Monad.State.Trans/index.js:73:23
    at /tmp/psctest/output/Text.Parsing.Parser/index.js:216:137
    at /tmp/psctest/output/Data.Identity/index.js:112:20
    at /tmp/psctest/output/Text.Parsing.Parser/index.js:214:210
    at /tmp/psctest/output/Control.Monad.State.Trans/index.js:73:24