martinblech/xmltodict

The positional information of the XML tag as part of a text is not maintained in the converted JSON.

agarwal-nitesh opened this issue · 5 comments

Example:

<text>The author of the book <author index="1" year="2022">[XYZ</author>], is a well-known journalist.</text>

The output is something like:

"text": {
  "author": {
    "@index": "1",
    "@year": "2022",
    "#text": "[XYZ"
  },
  "#text": "The author of the book ], is a well-known journalist."
}

The position of the author tag in the above example cannot be deterministically derived from the converted JSON.
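
For comparison, the standard library's `xml.etree.ElementTree` keeps this position: the text before the first child is stored in `text`, and the text following each child is stored in that child's `tail`. A minimal sketch of reading the segments in document order (the helper name `mixed_segments` is hypothetical and not part of xmltodict):

```python
import xml.etree.ElementTree as ET

XML = (
    '<text>The author of the book <author index="1" year="2022">[XYZ</author>'
    "], is a well-known journalist.</text>"
)


def mixed_segments(element):
    """Return the text runs and child elements of `element` in document order."""
    segments = []
    if element.text:
        # Text that appears before the first child element.
        segments.append(("text", element.text))
    for child in element:
        segments.append(("element", child.tag, dict(child.attrib), child.text))
        if child.tail:
            # Text that appears right after this child's closing tag.
            segments.append(("text", child.tail))
    return segments


for segment in mixed_segments(ET.fromstring(XML)):
    print(segment)
```

The `author` element lands between the two text runs, which is exactly the information that is lost when both runs are merged into a single `#text` value.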

It could be converted to this JSON:

{
  "text": {
    "#text": "The author of the book ",
    "author": {
      "-index": "1",
      "-year": "2022",
      "#text": "[XYZ"
    },
    "#text1": "], is a well-known journalist."
  },
  "#omit-xml-declaration": "yes"
}

How does one convert it to that JSON? I'm getting:

>>> json.dumps(xmltodict.parse("""<text>The author of the book <author index="1" year="2022">[XYZ</author>], is a well-known journalist.</text>"""))
'{"text": {"author": {"@index": "1", "@year": "2022", "#text": "[XYZ"}, "#text": "The author of the book ], is a well-known journalist."}}'
>>> print(xmltodict.__version__)
0.13.0

It was a suggestion of how it could be converted.

How does one make xmltodict convert the XML to that JSON? Or, what tool do you suggest using to produce your suggested JSON from the XML?

I used the online service https://xmltojson.github.io.
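
For anyone who wants something similar locally, here is a rough sketch using only the standard library. It is not an xmltodict feature, and the function names `element_to_dict` and `xml_to_ordered_dict` are made up for illustration. It assumes the conventions of the suggested JSON above (`-` prefix for attributes, `#text`, `#text1`, ... for successive text runs) and relies on Python dicts keeping insertion order. It does not emit the `#omit-xml-declaration` key, and repeated child tags would overwrite each other.

```python
import json
import xml.etree.ElementTree as ET

XML = (
    '<text>The author of the book <author index="1" year="2022">[XYZ</author>'
    "], is a well-known journalist.</text>"
)


def element_to_dict(element):
    """Convert an element to a dict whose key order follows document order."""
    result = {}
    for name, value in element.attrib.items():
        result["-" + name] = value

    text_count = 0

    def add_text(text):
        # Store each non-blank text run under "#text", "#text1", "#text2", ...
        nonlocal text_count
        if text and text.strip():
            key = "#text" if text_count == 0 else f"#text{text_count}"
            result[key] = text
            text_count += 1

    add_text(element.text)
    for child in element:
        # Limitation: a second child with the same tag replaces the first.
        result[child.tag] = element_to_dict(child)
        add_text(child.tail)
    return result


def xml_to_ordered_dict(xml_string):
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_dict(root)}


print(json.dumps(xml_to_ordered_dict(XML), indent=2))
```

Note that the position is carried only by the key order of the JSON object. That order survives `json.dumps` in Python, but other JSON consumers are not required to preserve it.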