Is there a bug in version 0.13.0?
solaim opened this issue · 1 comments
solaim commented
The XML file is about 20 MB or larger.
With version 0.12.0: parsing succeeds and the data is complete.
With version 0.13.0: parsing succeeds, but some data is dropped.
I know something is wrong, but I do not know what happened.
So I downgraded to version 0.12.0, and everything works.
bfontaine commented
It’s impossible to reliably reproduce an issue if you don’t provide a minimal example. Have you tried reducing the XML to see if it’s correctly parsed?
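One way to start narrowing this down, as a rough sketch only: count the elements per tag in the source file with the standard library, then compare those numbers with what the parsed dict contains. The helper name `count_tags` is hypothetical and is not part of xmltodict; namespaced tags will show up as `{uri}tag`.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(path):
    """Count how many times each element tag occurs in an XML file."""
    counts = Counter()
    # iterparse yields each element once its end tag has been read.
    for _event, elem in ET.iterparse(path, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's text and children once counted
    return counts
```

If a tag's count here differs from the length of the matching list in the parsed result, the elements with that tag are a good place to start cutting the file down to a minimal example.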
Edit: I just created an XML file of 121MB and had no issue parsing it and unparsing it:
t=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
echo '<?xml version="1.0" encoding="utf-8"?>' > a.xml
echo '<a>' >> a.xml
for _ in {1..1000000}; do
  echo "<$t>$t</$t>" >> a.xml
done
echo -n '</a>' >> a.xml
import xmltodict

# Parse the generated file, then write it back out unchanged.
with open("a.xml", "rb") as f:
    x = xmltodict.parse(f)
print(len(x["a"]["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]))  # 1000000
with open("b.xml", "wb") as f:
    xmltodict.unparse(x, output=f, pretty=True, indent="")
$ shasum -a 256 a.xml b.xml
cb7028e5d0bbb62b296e8b53d543eb53248208365c0e7de41090d6911e0aa9dd a.xml
cb7028e5d0bbb62b296e8b53d543eb53248208365c0e7de41090d6911e0aa9dd b.xml
Edit 2: no issue either with an 18GB file containing 120,000,000 elements (in streaming mode).
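For reference, streaming mode looks roughly like the sketch below. It is not the exact script used for the 18GB test; the helper name `count_items` and the sample input are made up for illustration. It relies on the `item_depth` and `item_callback` arguments of `xmltodict.parse`, which call the callback once per element at the given depth instead of building the whole dict in memory.

```python
import io

import xmltodict


def count_items(xml_input, depth=2):
    """Stream-parse xml_input and return how many items were seen at `depth`."""
    seen = 0

    def handle_item(path, item):
        nonlocal seen
        seen += 1
        return True  # a falsy return value would stop the parse

    xmltodict.parse(xml_input, item_depth=depth, item_callback=handle_item)
    return seen


# Small in-memory stand-in for a large file opened in binary mode.
sample = io.BytesIO(b"<a><b>1</b><b>2</b><b>3</b></a>")
print(count_items(sample))
```

Comparing this count between 0.12.0 and 0.13.0 on the real file would show whether items are dropped during parsing itself.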