thoth-org/Thoth.Json

Better guide and support for inconsistent structures

Closed this issue · 14 comments

Disclaimer

First of all I must say: this library is great!

Background

A few days ago I had to write 7 decoders for 7 different currency rate providers, and I ran into some trouble:

  • the documentation for decoders covers only very simple cases; one needs to dig through the source code to discover all the power Thoth.Json.Net offers
  • some common patterns are not supported out of the box, so I had to implement my own.

Solution

With an extended documentation page and a few more decoder operations (see DecodeX at the end), decoders for all the fancy, strange and inconsistent JSON become quite simple and very concise.

The question is:

Would you consider extending the base Decode module a little? I could prepare a pull request. If not, then maybe I could just add a little to the documentation, showing some examples and solutions? That could help and attract new users.

Let me show you a few examples

A common theme to notice: collect only the available currency rates and do not bother with the missing ones.

Currency Rate Data Provider A

Response can look like a list:

[ { "code": "GBPUSD", "rate": 1.3159, "timestamp": 1597732500 },
  { "code": "GBPZAR", "rate": "NA", "timestamp": 1597732500 },
  { "code": "USDHRK", "rate": 6.3305, "timestamp": 1597732500 } ]

or it can be a single valid rate:

{ "code": "GBPUSD", "rate": 1.3159, "timestamp": 1597732500 }

or it can be a single missing rate:

{ "code": "GBPZAR", "rate": "NA", "timestamp": 1597732500 }

Decoder:

    let ratesDecoder: Decoder<RateObject list> =
        let rateDecoder: Decoder<RateObject option> =
            Decode.map3
                (fun (code: string) rate timestamp ->
                    { Pair = pairOfString code
                      Rate = Rate rate
                      ValidFrom = Instant.FromUnixTimeSeconds timestamp })
                (Decode.field "code" Decode.string)
                (Decode.field "rate" Decode.decimal)
                (Decode.field "timestamp" Decode.int64)
            |> DecodeX.noFail

        // received only one rate (not a list)
        let caseA = rateDecoder |> Decode.map Option.toList

        // received list of rates
        let caseB = rateDecoder |> DecodeX.listFromOptions

        // caseA is built on noFail and therefore never fails,
        // so the list case has to be tried first
        Decode.oneOf [ caseB; caseA ]
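
A minimal, self-contained sketch of the Provider A idea, runnable as an F# script. It is not the decoder above: SimpleRate is a hypothetical stand-in for RateObject (no NodaTime, no pairOfString), and noFail is a local copy of DecodeX.noFail. It also illustrates why the order inside Decode.oneOf matters here: a decoder wrapped in noFail never fails, so the list case is tried first.

```fsharp
#r "nuget: Thoth.Json.Net"
open Thoth.Json.Net

// Hypothetical, simplified stand-in for RateObject (illustration only)
type SimpleRate = { Code: string; Rate: decimal; Timestamp: int64 }

// Local copy of DecodeX.noFail: a decoding failure becomes Ok None
let noFail (decoder: Decoder<'a>) : Decoder<'a option> =
    fun path token ->
        match decoder path token with
        | Ok x -> Ok(Some x)
        | Error _ -> Ok None

let rateDecoder: Decoder<SimpleRate option> =
    Decode.map3
        (fun code rate ts -> { Code = code; Rate = rate; Timestamp = ts })
        (Decode.field "code" Decode.string)
        (Decode.field "rate" Decode.decimal)
        (Decode.field "timestamp" Decode.int64)
    |> noFail

let ratesDecoder: Decoder<SimpleRate list> =
    Decode.oneOf
        [ // list of rates: Decode.list fails on a non-array,
          // which gives the next case a chance
          rateDecoder |> Decode.list |> Decode.map (List.choose id)
          // single rate: never fails (noFail), so it comes last
          rateDecoder |> Decode.map Option.toList ]

let single = """{ "code": "GBPUSD", "rate": 1.5, "timestamp": 1597732500 }"""
let missing = """{ "code": "GBPZAR", "rate": "NA", "timestamp": 1597732500 }"""
let many = sprintf "[ %s, %s ]" single missing

let decodedSingle = Decode.fromString ratesDecoder single
let decodedMissing = Decode.fromString ratesDecoder missing
let decodedMany = Decode.fromString ratesDecoder many
```

With these inputs, decodedSingle and decodedMany should each hold one GBPUSD rate, and decodedMissing should be Ok [].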

Currency Rate Data Provider B

{
    "ts": "2021-08-13T09:54:25Z",
    "EUR_PLN": { "bid": "4.5700", "ask": "4.5709" },
    "GBP_PLN": { "error": "some description" },
    "USD_PLN": { "bid": "3.8934", "ask": "3.8943" }
}

Decoder:

    let ratesDecoder: Decoder<RateObject list> =
        let bidAskDecoder: Decoder<decimal option> =
            Decode.map2
                (fun bid ask -> (bid + ask) / 2M)
                (Decode.field "bid" Decode.decimal)
                (Decode.field "ask" Decode.decimal)
            |> DecodeX.noFail

        let tsDecoder: Decoder<Instant> =
            Decode.string
            |> DecodeX.nodaParse InstantPattern.ExtendedIso.Parse

        let rateObject ts (key, value) =
            { Pair = pairOfString key
              Rate = Rate value
              ValidFrom = ts }

        Decode.map2
            (rateObject >> List.map)
            (Decode.field "ts" tsDecoder)
            (DecodeX.keyValueOptions bidAskDecoder)

Currency Rate Data Provider C

Can you guess the JSON by just reading the decoder below? Hint: start from bottom.

Decoder:

    let ratesDecoder: Decoder<RateObject list> =
        let tsPattern =
            LocalDateTimePattern.CreateWithInvariantCulture("dd-MM-yyyy HH:mm:ss.FFF")

        let tz =
            DateTimeZoneProviders.Tzdb.["America/New_York"]

        let tsDecoder: Decoder<Instant> =
            Decode.map2
                (fun date time -> date + " " + time)
                (Decode.field "D791" Decode.string)
                (Decode.field "D768" Decode.string)
            |> DecodeX.nodaParse tsPattern.Parse
            |> Decode.map (fun it -> (it.InZoneLeniently tz).ToInstant())

        Decode.map5
            (fun c1 c2 bid ask ts ->
                { Pair = Currency c1, Currency c2
                  Rate = Rate((bid + ask) / 2M)
                  ValidFrom = ts })
            (Decode.field "S4577" Decode.string)
            (Decode.field "S4578" Decode.string)
            (Decode.field "D4" Decode.decimal)
            (Decode.field "D6" Decode.decimal)
            tsDecoder
        |> Decode.list

Currency Rate Data Provider D

Again, can you guess the JSON by just reading the decoder below?

Decoder:

    let ratesDecoder: Decoder<RateObject list> =
        let rateObject ts (pair, rate) =
            { Pair = pair
              Rate = rate
              ValidFrom = Instant.FromUnixTimeSeconds ts }

        let quotesDecoder: Decoder<(Pair * Rate) list> =
            Decode.map3
                (fun c1 c2 mid -> (Currency c1, Currency c2), Rate mid)
                (Decode.field "base_currency" Decode.string)
                (Decode.field "quote_currency" Decode.string)
                (Decode.field "mid" Decode.decimal)
            |> DecodeX.noFail
            |> DecodeX.listFromOptions

        Decode.map2
            (rateObject >> List.map)
            (Decode.field "timestamp" Decode.int64)
            (Decode.field "quotes" quotesDecoder)

DecodeX

[<RequireQualifiedAccessAttribute>]
module DecodeX =

    /// successful decoding yields Ok (Some x),
    /// unsuccessful decoding yields Ok None instead of an error
    let inline noFail (decoder: Decoder<'a>) : Decoder<'a option> =
        fun path token ->
            match decoder path token with
            | Ok x -> Ok(Some x)
            | Error _ -> Ok None

    /// collects only successful items in lists
    let inline collectOptions (decoder: Decoder<'a option list>) : Decoder<'a list> =
        decoder |> Decode.map (List.choose id)

    let inline listFromOptions (decoder: Decoder<'a option>) : Decoder<'a list> =
        decoder |> Decode.list |> collectOptions

    let inline keyValueOptions (decoder: Decoder<'a option>) : Decoder<(string * 'a) list> =
        decoder
        |> Decode.keyValuePairs
        |> Decode.map (
            List.collect
                (fun (key, aOpt) ->
                    match aOpt with
                    | Some a -> [ key, a ]
                    | None -> [])
        )

    let inline flattenResult (decoder: Decoder<Result<'a, string>>) : Decoder<'a> =
        decoder
        |> Decode.andThen
            (function
            | Ok value -> Decode.succeed value
            | Error str -> Decode.fail str)

    // this one is specific to NodaTime, so it certainly does not belong in Thoth.Json.Net
    let inline nodaParse (nodaTimeParser: string -> ParseResult<'a>) (decoder: Decoder<string>) : Decoder<'a> =
        decoder
        |> Decode.map (nodaTimeParser >> NodaX.parserResultToResult)
        |> flattenResult
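
To show flattenResult without the NodaTime dependency, here is a small sketch in which any string -> Result<'a, string> parser is lifted into a decoder. The parsePair function and the "BASE_QUOTE" format are assumptions made up for this illustration; flattenResult itself is copied from above, minus inline.

```fsharp
#r "nuget: Thoth.Json.Net"
open Thoth.Json.Net

// Copied from DecodeX above (without inline)
let flattenResult (decoder: Decoder<Result<'a, string>>) : Decoder<'a> =
    decoder
    |> Decode.andThen (function
        | Ok value -> Decode.succeed value
        | Error msg -> Decode.fail msg)

// Hypothetical parser: splits "EUR_PLN" into ("EUR", "PLN")
let parsePair (s: string) : Result<string * string, string> =
    match s.Split '_' with
    | [| baseCcy; quoteCcy |] -> Ok(baseCcy, quoteCcy)
    | _ -> Error(sprintf "'%s' is not a BASE_QUOTE pair" s)

// A parse error becomes a regular decoding error
let pairDecoder: Decoder<string * string> =
    Decode.string |> Decode.map parsePair |> flattenResult

let good = Decode.fromString pairDecoder "\"EUR_PLN\""
let bad = Decode.fromString pairDecoder "\"EURPLN\""
```

Here good should be Ok ("EUR", "PLN"), while bad should be an Error carrying the parser's message.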

Hello @witoldsz,

Thank you for taking the time to write this issue.

I have been postponing the rework of the Thoth.Json documentation for a long time.

I am happy to announce that I am now working on it, both because you pinged me and because Nacara v1 has been reached, which gives me a good base for building documentation.

As for extending the core library, I am not against it. I just need to think over the different propositions, both in terms of naming and of whether they are general purpose enough.

So what I propose is:

  1. Work on the documentation rework.
  2. Review your different propositions to see whether we can add them as they are or need to improve the naming, etc.

The work on the documentation has already started. When it is ready, I can ping you to review it.

That way you can provide feedback and make additions where you think things are missing.


On another subject, you should avoid marking your general-purpose functions as inline, because inlining increases the bundle size each time you use them.

OK, so you will cover the case while working on the new documentation, is that right?

I would say yes, but I am not sure which case you are referring to.

If you mean:

  • the fact that numbers are encoded as strings, then yes
  • adding examples for more complex encoders/decoders, like DUs and composition using andThen, map, etc., then yes too

If there are more cases that you think should be covered, please mention them so we can discuss them and include them if needed.

OK, I will just wait to see what you come up with.

From my perspective, it would be great if the new docs covered cases like the ones I have described: { ts: date, EUR_USD: {rate}, EUR_CHF: {rate}, …} (so you do not know what the keys are), or when you want to define a decoder for the successful case and then collect over the list or map, you know… what I have described as the A, B and C data providers.

The documentation rework has been completed. Please have a look and see if it covers the subjects you had in mind.

The new documentation is nice and all, but it does not cover the cases I have described above:

  • how to decode object with unknown fields, e.g.
    { "EURUSD": 1.134323, "USDCZK": 21.6945, ...possibly many others }
  • how to decode object with some fields known and some unknown (and I do not mean optional), e.g.
    { "time": ..., "EURUSD": ..., and possibly many others }
  • how to decode object with known or unknown fields but with different shape of values, e.g.
    { "EURUSD": 1.134323, "USDCZK": "n/a", etc… }
  • how to decode some bizarre situations like with one provider which can either return data like this:
    { "base_currency": "EUR", "quote_currency": "USD", "rate": 1.134323 }
    or, if it has more than one result, then inside an array:
    [ { "base_currency": "EUR", "quote_currency": "USD", "rate": 1.134323 }, … ]
    and you never know which shape you get.
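
For the first and third bullets (unknown field names, values of different shapes), one possible sketch uses only decoders that already exist in Thoth.Json.Net: Decode.keyValuePairs, Decode.oneOf and Decode.succeed. The names rateOrNone and pairsDecoder and the sample payload are made up for this illustration. Note that Decode.decimal also accepts numbers encoded as strings.

```fsharp
#r "nuget: Thoth.Json.Net"
open Thoth.Json.Net

// A value is either a decimal or "something else" that we ignore
let rateOrNone: Decoder<decimal option> =
    Decode.oneOf
        [ Decode.decimal |> Decode.map Some
          Decode.succeed None ]

// Decode every field of the object, whatever its name,
// and keep only the fields whose value was a valid rate
let pairsDecoder: Decoder<(string * decimal) list> =
    Decode.keyValuePairs rateOrNone
    |> Decode.map (List.choose (fun (key, rate) -> rate |> Option.map (fun r -> key, r)))

let json = """{ "EURUSD": 1.5, "USDCZK": "n/a", "GBPUSD": "1.25" }"""

let decoded = Decode.fromString pairsDecoder json
```

With this input, decoded should contain the EURUSD and GBPUSD entries only; the "n/a" entry is dropped instead of failing the whole decoder.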

To sum it up, basically all the cases I had when I was implementing decoders for several currency rates data providers, described in this ticket.

I was able to write all the decoders successfully because I have some background in FP and Elm decoders, and because I studied the source code of Thoth.Json.Net to figure it all out. Someone with less background will end up using some other library or language, with ugly imperative code, to decode it (my project was to replace old JS code with F#).

@witoldsz Thank you for taking the time to look at the documentation.

I kind of knew that it would not cover all the things you listed in this issue, but I had a hard time processing it (I am on my 10th re-read now, haha 😅), because the cases you are describing are not simple.

From what I understand now:

  1. Add documentation about Decode.oneOf, because I indeed forgot about this helper, which is commonly used.

    This will help cover cases where your type can have different shapes. For example:

    {
        "$type": "int",
        "value": 42
    }
    
    42

    I am using simpler JSON than you, but the idea should be the same.

  2. Add documentation about the case where you have an object with some fields whose names you don't know but whose values you need to retrieve. In that case, you used Decode.keyValuePairs to decode it and retrieve a list of fields.

  3. Review the list of decoders from DecodeX and decide whether they should go into Thoth.Json, into a dedicated library, or just be placed in a guide.
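
For point 1, a minimal sketch of what such a Decode.oneOf example could look like, using the two JSON shapes shown above. The name intDecoder is only illustrative, and the "$type" field is simply ignored here.

```fsharp
#r "nuget: Thoth.Json.Net"
open Thoth.Json.Net

// Accept either a bare number or an object wrapping it in "value"
let intDecoder: Decoder<int> =
    Decode.oneOf
        [ Decode.int
          Decode.field "value" Decode.int ]

let fromBare = Decode.fromString intDecoder "42"
let fromObject = Decode.fromString intDecoder """{ "$type": "int", "value": 42 }"""
```

Both fromBare and fromObject should be Ok 42.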

Am I right?

Well, it is not just the 3 points you've mentioned, but also the clever ways to combine them to get an outstanding result. It's like having LEGO blocks for the first time, or seeing the "manual" of chess: it does not mean you know how to build what you need or how to play the game. One needs some hints and maybe some extra tools (like the ones I had to build to cover all the data providers with ease).

For example, Decode.keyValuePairs itself is cool, but see what DecodeX.keyValueOptions can do with it if you know how to connect a few dots on the decoders diagram.

P.S.
I have just realized the new documentation does not include the reference manual, where one could see all the available operators with short description and (hopefully) links to Thoth.Json and Thoth.Json.Net code. That would be awesome!

but also the clever ways to combine them to get an outstanding result.

I get your point. Originally I had a "Guide" section in the new documentation, but I removed it. I think I will re-add it and use it as an example of how to use Thoth.Json for complex types. I "just" need to find a good idea for a clear and concise example to use.

I have just realized the new documentation does not include the reference manual, where one could see all the available operators with short description and (hopefully) links to Thoth.Json and Thoth.Json.Net code. That would be awesome!

Yes, it doesn't have it yet. The reason is that I think some of the F# used in Thoth.Json is not rendered well by Nacara yet. The current version included in Nacara is really a POC that I put together to test the concept and to have something available in Nacara, since it simplified the documentation a lot: I was able to put the documentation directly in the code :)

I am in the process of rewriting it from scratch to make it robust and tested, so I can fix bugs without introducing regressions. It should come in the coming weeks.

@witoldsz

I added documentation about Decode.oneOf to the released documentation.

I also worked on an advanced section which tries to work with one of the JSON payloads you provided and goes over the code step by step.

I kind of had to invent the example because I wasn't sure what your code looked like, but I think this example covers a lot of things. Could you please have a look at it?

It is in this PR: #126

And to have a better experience you should probably run the documentation locally:

  1. Pull the branch
  2. npm i
  3. ./build.js docs --watch

I am looking at the advanced/unknown-fields section of #126, and to me it makes the problem look more complicated than it actually is.
I am sure that with two or three more functions in Decode (especially keyValueOptions and noFail, maybe with better names?), the whole solution would not be that advanced and could just be an addition to the "inconsistent structure" chapter.

The more I work on the documentation/example, the more I am stuck...

Right now, I feel that to fix this issue, the discussion should be: should the proposed decoders be added to the core library?

And if yes, add a doc comment explaining their usage, so the documentation is more focused on a single problem at a time ==> the one that the decoder solves.

Using doc comments will also make the documentation available from IDEs and from the API reference generated by Nacara. Example of API reference

Example of doc comment:

    /// <summary>
    /// Map the result of the given decoder into an `Option` type.
    /// </summary>
    ///
    /// <example>
    ///
    /// Successful decoder:
    ///
    /// <code lang="fsharp">
    /// let json = "42"
    ///
    /// Decode.fromString (Decode.ignoreFail Decode.int) json
    ///
    /// // Returns: Ok (Some 42)
    /// </code>
    ///
    /// Failed decoder:
    /// <code lang="fsharp">
    /// let json = "\"abc\""
    ///
    /// Decode.fromString (Decode.ignoreFail Decode.int) json
    ///
    /// // Returns: Ok None
    /// </code>
    /// </example>
    /// <param name="decoder">Decoder to transform</param>
    /// <returns>
    /// Returns `Some 'T` if the decoder succeeds, `None` otherwise.
    /// </returns>
    let ignoreFail (decoder : Decoder<'T>) : Decoder<'T option> =
        fun path token ->
            match decoder path token with
            | Ok x -> Ok(Some x)
            | Error _ -> Ok None

I am not against expanding the core library but here are my current thoughts:


I think one problem I have with the proposed decoders is that they feel like a mix of decoders and logic. But there are already decoders like that:

  • Decode.all
  • Decode.oneOf
  • Decode.keyValuePairs

So, I think this is just me not having used them enough to get comfortable with them.


Now, the other problem I have is the naming. For me the names don't make it explicit enough what they do.

Decode.noFail

Opinion

let inline noFail (decoder: Decoder<'a>) : Decoder<'a option> =
    fun path token ->
        match decoder path token with
        | Ok x -> Ok(Some x)
        | Error _ -> Ok None

Proposition

  • Decode.mapToOption, because it maps the decoder into an Option
  • Decode.toOption, because it transforms the decoder into an Option

Decode.collectOptions

Opinion

/// collects only successful items in lists
let inline collectOptions (decoder: Decoder<'a option list>) : Decoder<'a list> =
    decoder |> Decode.map (List.choose id)

This decoder is specific to lists, but the name doesn't show that. Also, shouldn't we offer the same decoder for array and seq?

Proposition

Should we offer a more high-level decoder not restricted to the 'a option list?

Decode.List.choose <-- Introduces a new concept of sub-modules which provide specialised decoders for a specific type. Kind of what users do when creating manual decoders for their own types.

Decode.listChoose

let listChoose 
    (decoder : Decoder<'A list>) 
    (chooser : ('A -> option<'B>)) : Decoder<'B list> =

    decoder
    |> Decode.map (List.choose chooser)
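
As a usage sketch (the name positiveInts and the filtering rule are made up for the example), the proposed listChoose could be applied like this:

```fsharp
#r "nuget: Thoth.Json.Net"
open Thoth.Json.Net

// Same definition as proposed above
let listChoose (decoder: Decoder<'A list>) (chooser: 'A -> 'B option) : Decoder<'B list> =
    decoder |> Decode.map (List.choose chooser)

// Keep only the positive numbers of a JSON array
let positiveInts: Decoder<int list> =
    listChoose (Decode.list Decode.int) (fun n -> if n > 0 then Some n else None)

let decoded = Decode.fromString positiveInts "[1, -2, 3]"
```

Here decoded should be Ok [1; 3]. With chooser set to id, this gives the same behaviour as collectOptions.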

Decode.listFromOptions

Opinion

    let inline listFromOptions (decoder: Decoder<'a option>) : Decoder<'a list> =
        decoder |> Decode.list |> collectOptions

Should we have a version for array and seq too?

Does this decoder add value compared to the user manually writing the chain?

Decode.keyValueOptions

Opinion

let inline keyValueOptions (decoder: Decoder<'a option>) : Decoder<(string * 'a) list> =
    decoder
    |> Decode.keyValuePairs
    |> Decode.map (
        List.collect
            (fun (key, aOpt) ->
                match aOpt with
                | Some a -> [ key, a ]
                | None -> [])
    )

The name is not really explicit and doesn't match the existing decoder: Decode.keyValuePairs

Proposition

Decode.keyValuePairsFromOption <-- I don't really like that name because it feels like it is building from an object while in reality it is working on an object with an optional/option decoder.

Decode.optionalKeyValuePairs <-- Feels "ok", but it is not consistent with the fact that the other optionalXXX decoders return 'A option and not 'A.

Decode.KeyValuePairs.choose <-- Follows the Decode.List.choose proposition, which creates sub-modules to provide decoders specialised for a specific type.

Decode.flattenResult

Opinion

let inline flattenResult (decoder: Decoder<Result<'a, string>>) : Decoder<'a> =
    decoder
    |> Decode.andThen
        (function
        | Ok value -> Decode.succeed value
        | Error str -> Decode.fail str)

Seems good to me

Your proposition and all the thoughts provided above are great. I am looking forward to reading it all through once I find some extra time. I would argue against rushing :)

The next release will update the documentation to include the section we worked on together in the past (sorry for the delay).