Parsing a json array with strings and objects intermixed

Question

lingolab2 opened this issue 6 years ago · 2 comments

I don't know how to parse the JSON below, because it contains an array of heterogeneous elements.
Some elements are plain strings, while others are JSON objects.

Sample JSON snippet

   "sense": {
    "pos": "&n;",
    "glo2s": [          <--- ARRAY OF STRING and OBJECT
     "value1",          <--- a STRING value
     {                  <--- a JSON object
       "g_type": "expl",
       "body": "savoury pancake containing meat or seafood and vegetables"
     }
    ]
   }
   ...
   ...
   ...
   "sense": {
    "p2s": [
     "&exp;",
     "&n;"
    ],
    "misc": "&id;",
    "glo2s": [          <--- ARRAY OF STRINGS ONLY
     "the harmonizing, mentally and physically, of two parties engaged in an activity",
     "singing from the same hymn-sheet",
     "dancing to the same beat"
    ]
   }

Models

struct Gloss
{
   let g_type: String
   let body: String

}

extension Gloss: Argo.Decodable {
   static func decode(_ json: JSON) -> Decoded<Gloss> {
      return curry(Gloss.init)
         <^> json <| "g_type"
         <*> json <| "body"
   }
}

struct Sense {
   let pos: String?
   let p2s: [String]?
   let xref: String?
   let xre2f: [String]?
   let gloss: Gloss?
   let gl1ss: String?
   let glo2s: [String]?
}

extension Sense: Argo.Decodable {
   static func decode(_ json: JSON) -> Decoded<Sense> {
      return curry(Sense.init)
         <^> json <|? "pos"
         <*> json <||? "p2s"
         <*> json <|? "xref"
         <*> json <||? "xre2f"
         <*> json <|? "gloss"
         <*> json <|? "gl1ss"
         <*> json <||? "glo2s"
   }
}

Argo Version

example: Argo 4.1.2

Answer 1 · 2018-12-04T14:40:40.000Z

What a great question! If I'm understanding correctly, you have a heterogeneous array of String or Object and you want to parse them into a homogeneous array of String, right? Assuming so, with a touch of functional wizardry and a custom decoder, Argo can get you there:

Given:

{
  "things": [
    "Thing1",
    {
      "type": "obj",
      "name": "Thing2"
    }
  ]
}

You can parse this structure like so:

struct Container {
  let things: [String]
}

// This is our custom decoder that can handle moving from arbitrary JSON to String values
func thingDecoder(_ json: JSON) -> Decoded<String> {
  switch json {

  // Handle the case where we already have a String
  case let .string(str): return pure(str)

  // Handle the case where we have an object by parsing out the value we want
  case .object: return json <| "name"

  // If we've gotten this far, we don't have a value that we expect and can fail
  default: return .typeMismatch(expected: "String or Object", actual: json)
  }
}

extension Container: Argo.Decodable {
  static func decode(_ json: JSON) -> Decoded<Container> {
    // We need this intermediate variable to help the compiler out
    let jsons: Decoded<[JSON]> = json <|| "things"

    // Wizardry here!
    let things: Decoded<[String]> = jsons.flatMap { sequence($0.map(thingDecoder)) }

    // We have everything we need and so can return it
    return curry(self.init)
      <^> things
  }
}

That wizardry is actually super interesting. The gist of the problem is that we need to do this transformation:

JSON -> Decoded<[JSON]> -> Decoded<[String]>

But since we need a custom decoder, there is an intermediate step of JSON -> Decoded<String>. That is why we need the wacky sequence($0.map(thingDecoder)) dance to get the types to line up: $0.map(thingDecoder) goes from [JSON] -> [Decoded<String>], and then sequence gets us from there back to Decoded<[String]>.
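To make the map-then-sequence step concrete without pulling in Argo, here is a minimal sketch that uses stand-in types. MiniJSON, MiniDecoded, MiniDecodeError, sequenceResults and miniThingDecoder are hypothetical names invented for this illustration, not Argo's API, and the sketch assumes that sequencing stops at the first failure.

```swift
// Stand-in types for illustration only; these are NOT Argo's JSON or Decoded.
enum MiniJSON {
  case string(String)
  case number(Double)
  case object([String: MiniJSON])
}

struct MiniDecodeError: Error, Equatable {
  let message: String
}

typealias MiniDecoded<T> = Result<T, MiniDecodeError>

// Turns a list of results into a result of a list.
// Assumption: the first failure encountered is returned as the overall failure.
func sequenceResults<T>(_ xs: [MiniDecoded<T>]) -> MiniDecoded<[T]> {
  var values: [T] = []
  values.reserveCapacity(xs.count)
  for x in xs {
    switch x {
    case let .success(value): values.append(value)
    case let .failure(error): return .failure(error)
    }
  }
  return .success(values)
}

// Mirrors the shape of thingDecoder: a string passes through unchanged,
// and an object yields the string stored under its "name" key.
func miniThingDecoder(_ json: MiniJSON) -> MiniDecoded<String> {
  switch json {
  case let .string(str):
    return .success(str)
  case let .object(fields):
    if case let .string(name)? = fields["name"] {
      return .success(name)
    }
    return .failure(MiniDecodeError(message: "missing string for key \"name\""))
  default:
    return .failure(MiniDecodeError(message: "expected String or Object"))
  }
}

let mixed: [MiniJSON] = [
  .string("Thing1"),
  .object(["type": .string("obj"), "name": .string("Thing2")]),
]

// [MiniJSON] -> [MiniDecoded<String>] -> MiniDecoded<[String]>
let decodedThings = sequenceResults(mixed.map(miniThingDecoder))

// A single unexpected element turns the whole result into a failure.
let decodedBad = sequenceResults((mixed + [.number(3)]).map(miniThingDecoder))
```

Here decodedThings holds both strings in order, while decodedBad carries the error produced for the number element.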

Another interesting thing: that sequence call has a specific shape to it:

((T) -> Decoded<U>) -> [T] -> Decoded<[U]>

This function actually has a name! In Haskell this function is named mapM (map Monadically) and so we can define that like so:

func mapM<T, U>(_ transform: (T) -> Decoded<U>, _ xs: [T]) -> Decoded<[U]> {
  return sequence(xs.map(transform))
}

Now, our wizardry line can be simplified a bit:

let things: Decoded<[String]> = jsons.flatMap { mapM(thingDecoder, $0) }

If we were feeling particularly clever today we could define mapM as a curried function and write it in a point-free style:

func mapM<T, U>(_ transform: @escaping (T) -> Decoded<U>) -> ([T]) -> Decoded<[U]> {
  return { sequence($0.map(transform)) }
}

// snip

let things: Decoded<[String]> = jsons.flatMap(mapM(thingDecoder))

Also interestingly, this is as far as you can simplify things with prior art as a guide. As far as I can tell, there's nothing with the shape of flatMap . mapM (the composition of flatMap and mapM), so you're in interesting territory here!
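For the models in the original question, the same idea might be adapted roughly as follows. This is an untested sketch: glossTextDecoder and decodeGlo2s are hypothetical helper names, and it assumes that "glo2s" may be absent and that the "body" field is the string you want from each object.

```swift
import Argo

// Hypothetical helper: collapses one "glo2s" element into a String.
func glossTextDecoder(_ json: JSON) -> Decoded<String> {
  switch json {
  // Plain strings pass through unchanged
  case let .string(str): return pure(str)

  // Assumption: for objects, "body" is the text we want to keep
  case .object: return json <| "body"

  default: return .typeMismatch(expected: "String or Object", actual: json)
  }
}

// Hypothetical helper: decodes the optional "glo2s" key into [String]?.
func decodeGlo2s(_ json: JSON) -> Decoded<[String]?> {
  // Pull the raw elements out first; a missing key yields nil
  let jsons: Decoded<[JSON]?> = json <||? "glo2s"

  return jsons.flatMap { (xs: [JSON]?) -> Decoded<[String]?> in
    // No "glo2s" key at all: succeed with nil
    guard let xs = xs else { return pure(nil) }

    // [JSON] -> [Decoded<String>] -> Decoded<[String]> -> Decoded<[String]?>
    return sequence(xs.map(glossTextDecoder)).map { Optional.some($0) }
  }
}
```

In Sense.decode, the last line would then become <*> decodeGlo2s(json) in place of <*> json <||? "glo2s". Note that this discards g_type; if you need it, decoding into an enum with a string case and a Gloss case would be the alternative.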

Answer 2 · 2020-05-22T01:26:50.000Z

I'm going to go ahead and close this due to inactivity, but please feel free to reopen it if this is still an issue.