thoth-org/Thoth.Json

[Question] How to decode fields whose names we don't know?

Closed this issue · 6 comments

I have to decode a json like this one:

{
  "ids": [],
  "authors": [],
  "kinds": [],
  "#e": [],
  "#p": [],
  "since": 0,
  "until": 0,
  "limit": 0
}

All the fields are optional and I decode it as follows:

let filter : Decoder<Filter> =
    Decode.object (fun get -> {
        Ids = get.Optional.Field "ids" (Decode.list Decode.string) |> Option.defaultValue [] 
        Kinds = get.Optional.Field "kinds" (Decode.list Decode.Enum.int) |> Option.defaultValue []
        Authors = get.Optional.Field "authors" (Decode.list Decode.string) |> Option.defaultValue []
        Limit = get.Optional.Field "limit" Decode.int
        Since = get.Optional.Field "since" Decode.unixDateTime
        Until = get.Optional.Field "until" Decode.unixDateTime
        Events = get.Optional.Field "#e" (Decode.list Decode.string) |> Option.defaultValue []
        PubKeys = get.Optional.Field "#p" (Decode.list Decode.string) |> Option.defaultValue []
    })

This works perfectly well. However, #e and #p are just two instances of something known as a "tag" (a field whose name starts with #), and there can be any number of them: #r, #g, #whatever. In my code you can see that I only handle the two most common ones, #e and #p, through the fields Events and PubKeys.

Instead of Events and PubKeys I need a single field called Tags (a (string * string list) list) containing all the tags and their values.

For example:

{ "#e": ["hello", "wold"], "#p" : ["aabbbcc...."], "#g": ["london", "madrid"] }

should be decoded as:

[
 "#e", ["hello"; "wold"]
 "#p", ["aabbbcc...."]
 "#g", ["london"; "madrid"] 
]

I think what I have to do is extract the tags in a decode continuation as below, but I can't find a way to decode fields of unknown name.

let filter : Decoder<Filter> =
    Decode.object (fun get -> {
        Ids = get.Optional.Field "ids" (Decode.list Decode.string) |> Option.defaultValue [] 
        Kinds = get.Optional.Field "kinds" (Decode.list Decode.Enum.int) |> Option.defaultValue []
        Authors = get.Optional.Field "authors" (Decode.list Decode.string) |> Option.defaultValue []
        Limit = get.Optional.Field "limit" Decode.int
        Since = get.Optional.Field "since" Decode.unixDateTime
        Until = get.Optional.Field "until" Decode.unixDateTime
        Tags = []
    })
    |> Decode.andThen (fun filter path value ->
         // decode tags here
         Decode.succeed { filter with Tags = tags } path value
    )

Any hint?

Hello @lontivero,

I feel like your situation is similar to the one described in the "Unknown fields" advanced example.

Can you check whether the documentation I linked helps you?

njlr commented

This is quite tricky to get right because Decode.keyValuePairs will attempt to parse all values in the object. We want string list, but some of the values have other types (e.g. int). We need a way to take only keys that start with #.
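A minimal sketch of the failure described above, assuming a .NET script that references Thoth.Json.Net (with Fable you would reference Thoth.Json and `open Thoth.Json` instead). The names `json` and `allAsStringLists` and the sample input are made up for illustration:

```fsharp
#r "nuget: Thoth.Json.Net" // assumption: running as an .fsx script on .NET
open Thoth.Json.Net

// Hypothetical input: one tag field plus one non-tag field
let json = """{ "#e": ["hello"], "limit": 10 }"""

// keyValuePairs applies the same decoder to every value in the object
let allAsStringLists : Decoder<(string * string list) list> =
    Decode.keyValuePairs (Decode.list Decode.string)

let result = Decode.fromString allAsStringLists json
// `result` is an Error here, because "limit" holds an int rather than a list
```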

I think there could be an argument for a new function Decode.keyValuePairsFiltered that would first filter the keys and then apply the decoder.

Usage like so:

module Decode =

  let keyValuePairsFiltered (keyFilter : string -> bool) (decoder : Decoder<'a>) : Decoder<(string * 'a) list> =
    failwith "TODO"

let tagsDecoder : Decoder<(string * string list) list> =
  Decode.keyValuePairsFiltered
    (fun k -> k.StartsWith "#")
    (Decode.list Decode.string)

let knownDecoder : Decoder<Known> =
  Decode.object
    (fun get ->
      {
        Ids = get.Optional.Field "ids" (Decode.list Decode.string) |> Option.defaultValue []
        // etc...
        Tags = [] // Filled in later
      })

let decoder =
  Decode.map2
    (fun known tags ->
      {
        known with
          Tags = tags
      })
    knownDecoder
    tagsDecoder

You could build Decode.keyValuePairsFiltered by combining Decode.keyValuePairs and Decode.oneOf, but it would be more efficient if implemented in the library.
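One way this combination might look, as a sketch rather than a definitive implementation. It assumes a .NET script referencing Thoth.Json.Net, and it is more lenient than a strict version: a key that passes the filter but whose value does not match the decoder is silently dropped instead of producing an error. The sample JSON is made up for illustration:

```fsharp
#r "nuget: Thoth.Json.Net" // assumption: running as an .fsx script on .NET
open Thoth.Json.Net

let keyValuePairsFiltered (keyFilter : string -> bool) (decoder : Decoder<'a>) : Decoder<(string * 'a) list> =
    // Decode each value as `Some x` when it matches `decoder`, or `None` otherwise,
    // so that values of other types no longer make the whole decoder fail
    Decode.keyValuePairs (Decode.oneOf [ Decode.map Some decoder; Decode.succeed None ])
    |> Decode.map (List.choose (fun (key, value) ->
        // Keep only the keys accepted by the filter that also decoded successfully
        if keyFilter key then value |> Option.map (fun v -> key, v) else None))

let tagsDecoder : Decoder<(string * string list) list> =
    keyValuePairsFiltered (fun k -> k.StartsWith "#") (Decode.list Decode.string)

// Hypothetical input mixing tag and non-tag fields
let tags =
    Decode.fromString tagsDecoder """{ "#e": ["hello"], "limit": 10, "#p": ["aabb"] }"""
```

Every value is still visited and decoded once before the filter runs, which is the inefficiency mentioned above.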

@MangelMaxime yes, that's exactly what I needed. I've just made it work with a much less elegant solution following @njlr's idea (thanks): I reimplemented keyValuePairs so that it only keeps the keys starting with #, and hardcoded the value decoder.

I can only work on this pet project for a few hours on weekends, and I will try to find a better solution then, but I wanted to give you feedback and say thank you.

let filter : Decoder<Filter> =
    let knownDecoder = Decode.object (fun get -> {
        Ids = get.Optional.Field "ids" (Decode.list Decode.string) |> Option.defaultValue [] 
        // etc...
        Tags = [] // Filled in later
    })
    
    let tagsDecoder : Decoder<(string * string list) list> =
        fun path value ->
            match Decode.keys path value with
            | Ok objectKeys ->
                let tagKeys = objectKeys |> Seq.filter (fun t -> t.StartsWith "#")  // filter keys
                (Ok [], tagKeys ) ||> Seq.fold (fun acc prop ->
                    match acc with
                    | Error _ -> acc
                    | Ok acc ->
                        match Decode.Helpers.getField prop value |> (Decode.list Decode.string) path with  // hardcoded decoder
                        | Error er -> Error er
                        | Ok value -> (prop, value)::acc |> Ok)
                |> Result.map List.rev
            | Error e -> Error e
              
    Decode.map2 (fun known tags -> { known with Tags = tags })
        knownDecoder
        tagsDecoder

njlr commented

May help future visitors:

module Decode =

  let filteredKeyValuePairs (keyFilter : string -> bool) (decoder : Decoder<'a>) : Decoder<(string * 'a) list> =
    fun path value ->
      match Decode.keys path value with
      | Ok objectKeys ->
        (Ok [], objectKeys) ||> List.fold (fun acc prop ->
          if keyFilter prop then
            match acc with
            | Error _ -> acc
            | Ok acc ->
              match (Decode.field prop decoder) path value with
              | Error er -> Error er
              | Ok value -> (prop, value)::acc |> Ok
          else
            acc)
        |> Result.map List.rev
      | Error e -> Error e

Thank you @njlr

It is always difficult to know what should or should not be part of Thoth.Json core.

But the fact that users can write custom decoders for their specific needs is a good middle ground for covering situations like this one.

I faced this problem again. This time I decided to use a different approach. I'm leaving it here for future visitors:

let profile : Decoder<Profile> =
    let commonFieldNames = ["name"; "about"; "picture"; "banner"]
    let commonFieldsDecoder = Decode.object (fun get ->
        { Name = get.Required.Field "name" Decode.string
          About = get.Required.Field "about" Decode.string
          Picture = get.Required.Field "picture" Decode.string
          Banner = get.Optional.Field "banner" Decode.string
          Additional = [] })

    let additionalFieldsDecoder : Decoder<(string * string) list> =
        Decode.keyValuePairs Decode.anyAsString
        |> Decode.map (List.filter (fun (name, _) -> not (List.contains name commonFieldNames)))

    Decode.map2 (fun common additional -> { common with Additional =  additional })
        commonFieldsDecoder
        additionalFieldsDecoder

This solution is good enough for the scenario that I am handling, but it could be horrible for others, like the one I presented in the description of this issue.

In summary, this is a pretty common need and the right solution depends on the specific case. Thoth is flexible enough to support all of these.