thoughtbot/Argo

Parsing a json array with strings and objects intermixed

lingolab2 opened this issue · 2 comments

I don't know how to parse the JSON below, because it contains an array of heterogeneous elements.
Some elements are plain strings, while others are JSON objects themselves.

Sample JSON snippet

   "sense": {
    "pos": "&n;",
    "glo2s": [ **<--- ARRAY OF STRING and OBJECT**
     "value1",     **<--- a STRING value**
     {                  **<--- a JSON object**
       "g_type": "expl",
       "body": "savoury pancake containing meat or seafood and vegetables"
     }
    ]
   }
   ...
   ...
   ...
   "sense": {
    "p2s": [
     "&exp;",
     "&n;"
    ],
    "misc": "&id;",
    "glo2s": [    **<-- ARRAY OF STRINGS ONLY**
     "the harmonizing, mentally and physically, of two parties engaged in an activity",
     "singing from the same hymn-sheet",
     "dancing to the same beat"
    ]
   }

Models

struct Gloss
{
   let g_type: String
   let body: String

}

extension Gloss: Argo.Decodable {
   static func decode(_ json: JSON) -> Decoded<Gloss> {
      return curry(Gloss.init)
         <^> json <| "g_type"
         <*> json <| "body"
   }
}

struct Sense {
   let pos: String?
   let p2s: [String]?
   let xref: String?
   let xre2f: [String]?
   let gloss: Gloss?
   let gl1ss: String?
   let glo2s: [String]?
}

extension Sense: Argo.Decodable {
   static func decode(_ json: JSON) -> Decoded<Sense> {
      return curry(Sense.init)
         <^> json <|? "pos"
         <*> json <||? "p2s"
         <*> json <|? "xref"
         <*> json <||? "xre2f"
         <*> json <|? "gloss"
         <*> json <|? "gl1ss"
         <*> json <||? "glo2s"
   }
}

Argo Version

example: Argo 4.1.2

What a great question! If I'm understanding correctly, you have a heterogeneous array of String or Object and you want to parse them into a homogeneous array of String, right? Assuming so, with a touch of functional wizardry and a custom decoder, Argo can get you there:

given:

{
  "things": [
    "Thing1",
    {
      "type": "obj",
      "name": "Thing2"
    }
  ]
}

You can parse this structure like so:

struct Container {
  let things: [String]
}

// This is our custom decoder that can handle moving from arbitrary JSON to String values
func thingDecoder(_ json: JSON) -> Decoded<String> {
  switch json {

  // Handle the case where we already have a String
  case let .string(str): return pure(str)

  // Handle the case where we have an object by parsing out the value we want
  case .object: return json <| "name"

  // If we've gotten this far, we don't have a value that we expect and can fail
  default: return .typeMismatch(expected: "String or Object", actual: json)
  }
}

extension Container: Argo.Decodable {
  static func decode(_ json: JSON) -> Decoded<Container> {
    // We need this intermediate variable to help the compiler out
    let jsons: Decoded<[JSON]> = json <|| "things"

    // Wizardry here!
    let things: Decoded<[String]> = jsons.flatMap { sequence($0.map(thingDecoder)) }

    // We have everything we need and so can return it
    return curry(self.init)
      <^> things
  }
}

That wizardry is actually super interesting. The gist of the problem is that we need to do this transformation:

JSON -> Decoded<[JSON]> -> Decoded<[String]>

BUT, since we need a custom decoder, we have an intermediate step of JSON -> Decoded<String>. This means we need to do that wacky sequence($0.map(thingDecoder)) dance in order to get the types to line up properly. $0.map(thingDecoder) goes from [JSON] -> [Decoded<String>], and then sequence gets us from there back to Decoded<[String]>.
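
To make those two steps concrete without pulling in Argo, here is a minimal sketch using stand-in types. MiniJSON, MiniDecoded, miniSequence and miniThingDecoder are hypothetical names invented for this illustration; they only mimic the shape of Argo's JSON, Decoded and sequence and are not Argo's actual definitions.

```swift
// Stand-in types for illustration only; these are NOT Argo's definitions.
enum MiniJSON {
  case string(String)
  case object([String: MiniJSON])
  case number(Double)
}

enum MiniDecoded<T> {
  case success(T)
  case failure(String)
}

// Collapse [MiniDecoded<T>] into MiniDecoded<[T]>, stopping at the first failure.
func miniSequence<T>(_ xs: [MiniDecoded<T>]) -> MiniDecoded<[T]> {
  var values: [T] = []
  for x in xs {
    switch x {
    case let .success(value): values.append(value)
    case let .failure(message): return .failure(message)
    }
  }
  return .success(values)
}

// Same idea as thingDecoder: accept a string, or an object with a "name" key.
func miniThingDecoder(_ json: MiniJSON) -> MiniDecoded<String> {
  switch json {
  case let .string(str):
    return .success(str)
  case let .object(fields):
    if case let .string(name)? = fields["name"] { return .success(name) }
    return .failure("missing or non-string \"name\"")
  default:
    return .failure("expected String or Object")
  }
}

let input: [MiniJSON] = [
  .string("Thing1"),
  .object(["type": .string("obj"), "name": .string("Thing2")]),
]

// Step 1: [MiniJSON] -> [MiniDecoded<String>]
let each: [MiniDecoded<String>] = input.map(miniThingDecoder)
// Step 2: [MiniDecoded<String>] -> MiniDecoded<[String]>
let all: MiniDecoded<[String]> = miniSequence(each)

if case let .success(names) = all {
  print(names) // prints ["Thing1", "Thing2"]
}
```

If any element fails to decode (for example a bare number), miniSequence returns that first failure instead of an array, which is the behaviour the types above suggest for the real sequence as well.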

Another interesting thing: that map-then-sequence combination has a specific shape to it:

((T) -> Decoded<U>) -> [T] -> Decoded<[U]>

This function actually has a name! In Haskell this function is named mapM (map Monadically) and so we can define that like so:

func mapM<T, U>(_ transform: (T) -> Decoded<U>, _ xs: [T]) -> Decoded<[U]> {
  return sequence(xs.map(transform))
}

Now, our wizardry line can be simplified a bit:

let things: Decoded<[String]> = jsons.flatMap { mapM(thingDecoder, $0) }

If we were feeling particularly clever today we could define mapM as a curried function and write it in a point-free style:

func mapM<T, U>(_ transform: @escaping (T) -> Decoded<U>) -> ([T]) -> Decoded<[U]> {
  return { sequence($0.map(transform)) }
}

// snip

let things: Decoded<[String]> = jsons.flatMap(mapM(thingDecoder))

Also interestingly, this is as far as you can simplify things with prior art as a guide. As far as I can tell, there's nothing with the shape of flatMap . mapM (the composition of flatMap and mapM), so you're in interesting territory here!
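
Applied back to the original question, the same pattern might look roughly like the sketch below. It is untested and makes a few assumptions: SenseSketch is a hypothetical, trimmed-down version of Sense with only two fields, glossDecoder is a hypothetical name, and it assumes that the "body" value is the string you want to keep from each object entry. Because "glo2s" is optional in the original model, the array is read with <||? and the nil case is passed through untouched.

```swift
import Argo
import Curry
import Runes

// Hypothetical trimmed-down model: only two of the original Sense fields.
struct SenseSketch {
  let pos: String?
  let glo2s: [String]?
}

// Assumption: for object entries, the "body" value is the string to keep.
func glossDecoder(_ json: JSON) -> Decoded<String> {
  switch json {
  case let .string(str): return pure(str)
  case .object: return json <| "body"
  default: return .typeMismatch(expected: "String or Object", actual: json)
  }
}

extension SenseSketch: Argo.Decodable {
  static func decode(_ json: JSON) -> Decoded<SenseSketch> {
    // Read the raw elements first; the key may be absent, hence the optional array
    let jsons: Decoded<[JSON]?> = json <||? "glo2s"

    // Decode each element only when the array is present; keep nil as nil
    let glosses: Decoded<[String]?> = jsons.flatMap { (maybe: [JSON]?) -> Decoded<[String]?> in
      guard let elements = maybe else { return pure(nil) }
      return sequence(elements.map(glossDecoder)).map { Optional($0) }
    }

    return curry(SenseSketch.init)
      <^> json <|? "pos"
      <*> glosses
  }
}
```

The remaining Sense fields would slot into the curry chain in the same way as in the original model; only the "glo2s" line needs the custom treatment.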

I'm going to go ahead and close this due to inactivity but please feel free to reopen if this is still an issue.