Is there a bug in version 0.13.0?

Question

solaim opened this issue 2 years ago · 1 comments

solaim commented 2 years ago

The XML file is about 20 MB or larger.

With version 0.12.0: parsing succeeds and the data is complete.
With version 0.13.0: parsing succeeds, but some data is dropped.

I know something is wrong, but I do not know what happened.

So I downgraded to version 0.12.0, and everything works.

Answer 1 · 2023-07-07T16:28:39.000Z

It’s impossible to reliably reproduce an issue if you don’t provide a minimal example. Have you tried reducing the XML to see if it’s correctly parsed?
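One way to narrow this down without sharing the whole file might be to count elements per tag with an independent parser and compare that against the parsed result. The sketch below is only an illustration: the helper names `count_tags_stdlib`, `count_tags_parsed`, and `diff_counts` are hypothetical, and it assumes the parsed data has the usual xmltodict shape (repeated elements as lists, attributes under `@` keys, text under `#text`). Namespaced tags may be named differently by the two parsers, so this is most useful on documents without namespaces.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags_stdlib(source):
    """Count elements per tag name using the standard-library parser."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's text and children once counted
    return counts


def count_tags_parsed(node, counts=None):
    """Count elements per tag name in a dict shaped like xmltodict output."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@") or key == "#text":
                continue  # attributes and text are not child elements
            children = value if isinstance(value, list) else [value]
            counts[key] += len(children)
            for child in children:
                count_tags_parsed(child, counts)
    return counts


def diff_counts(expected, actual):
    """Return {tag: (expected, actual)} for every tag whose counts differ."""
    tags = sorted(set(expected) | set(actual))
    return {t: (expected[t], actual[t]) for t in tags if expected[t] != actual[t]}


# Small self-contained demo; `parsed` is a hand-written stand-in for
# what xmltodict.parse(sample) would return (an assumption for this sketch).
sample = b"<a><b>1</b><b>2</b><c x='1'><d/></c></a>"
parsed = {"a": {"b": ["1", "2"], "c": {"@x": "1", "d": None}}}
print(diff_counts(count_tags_stdlib(io.BytesIO(sample)), count_tags_parsed(parsed)))
```

On a real file, `count_tags_stdlib("big.xml")` would be compared with `count_tags_parsed(xmltodict.parse(f))` under each xmltodict version (`big.xml` being a placeholder name). Any tag that shows up in the diff points at the part of the document worth reducing to a minimal example.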

Edit: I just created an XML file of 121 MB and had no issue parsing it and unparsing it:

t=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
echo '<?xml version="1.0" encoding="utf-8"?>' > a.xml
echo '<a>' >> a.xml
for _ in {1..1000000}; do
  echo "<$t>$t</$t>" >> a.xml
done
echo -n '</a>' >> a.xml

import xmltodict

with open("a.xml", "rb") as f:
    x = xmltodict.parse(f)

print(len(x["a"]["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"])) # 1000000

with open("b.xml", "wb") as f:
    xmltodict.unparse(x, output=f, pretty=True, indent="")

$ shasum -a 256 a.xml b.xml
cb7028e5d0bbb62b296e8b53d543eb53248208365c0e7de41090d6911e0aa9dd  a.xml
cb7028e5d0bbb62b296e8b53d543eb53248208365c0e7de41090d6911e0aa9dd  b.xml

Edit 2: no issue either with an 18 GB file containing 120,000,000 elements (in streaming mode).
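The code used for the streaming test is not shown above. As a rough sketch of what streaming mode looks like, xmltodict's `parse` accepts `item_depth` and `item_callback`, calling the callback for each element at that depth instead of building the whole dict in memory. The `count_items` helper and the tiny inline sample are assumptions made for illustration only.

```python
import io

import xmltodict


def count_items(source, depth=2):
    """Stream `source` and count the elements found at `depth`."""
    seen = 0

    def on_item(path, item):
        # `path` lists the (name, attrs) pairs leading to the element;
        # `item` is the parsed content of that element.
        nonlocal seen
        seen += 1
        return True  # a falsy return value makes xmltodict stop parsing

    xmltodict.parse(source, item_depth=depth, item_callback=on_item)
    return seen


# Tiny in-memory stand-in for a large file opened in binary mode.
sample = io.BytesIO(b"<a><b>1</b><b>2</b><b>3</b></a>")
print(count_items(sample))
```

Comparing such a count between 0.12.0 and 0.13.0 on the problematic file could show whether elements are already missing at parse time.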