sandrods/odf-report

Word found unreadable content in "report.odt" (and workaround)

Closed this issue · 8 comments

  1. Given a template.odt that MS Word opens without error,
  2. Use odf-report 0.7.0 to generate report.odt
  3. Opening report.odt in MS Word (Office 365, v. 16.36) produces the following error:

Word found unreadable content in "report.odt". Do you want to recover the contents of this document?

Workaround

Interestingly, after unzipping and re-zipping the report, MS Word does not complain!

unzip report.odt -d odt_contents
cd odt_contents
zip -r ../report_rezipped.odt .
# `report_rezipped.odt` can be opened in MS Word without complaint.

For reference, the zip version used for re-zipping:

zip -h
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008).
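The unzip/re-zip workaround above can be sketched in Python with the standard zipfile module. This is a minimal illustration, not the Rezipper class mentioned later in the thread (its code isn't shown here), and the function name rezip is hypothetical. As an assumption beyond what was tested in this thread, it writes the mimetype entry first and uncompressed, which is what the ODF packaging specification asks for. The zip -r command above does not guarantee that ordering.

```python
import zipfile


def rezip(src_path: str, dst_path: str) -> None:
    """Copy every entry of src_path into a freshly written archive at dst_path.

    Hypothetical helper. The "mimetype" entry (if present) is written first and
    stored uncompressed, as the ODF packaging spec expects; all other entries
    are deflated and keep their original relative order.
    """
    with zipfile.ZipFile(src_path) as src:
        names = src.namelist()
        # Put "mimetype" at the front, keep everything else in archive order.
        ordered = (["mimetype"] if "mimetype" in names else []) + [
            n for n in names if n != "mimetype"
        ]
        with zipfile.ZipFile(dst_path, "w") as dst:
            for name in ordered:
                method = (
                    zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
                )
                dst.writestr(name, src.read(name), compress_type=method)
```

Usage: rezip("report.odt", "report_rezipped.odt"). Whether entry order or compression of mimetype is what Word actually objects to is not established in this thread; the sketch only mirrors the manual workaround.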

Maybe rubyzip is doing something that MS Word doesn't like, and re-zipping with the "official" zip fixes it?

Now that I have both report.odt and report_rezipped.odt and can compare them, what should I be looking for? Would you like to see an unzip -l of each?

bundle | egrep -e '(rubyzip|odf-report|mime-types|nokogiri)'
Using nokogiri 1.10.9
Using mime-types-data 3.2019.1009
Using mime-types 3.3.1
Using rubyzip 2.3.0
Using odf-report 0.7.0

I think this is a problem similar to #104. I haven't had time to look into it yet, but this week is OSS week :-) so I'll have a look.

Hi Sandro.

> I think this is a problem similar to #104.

My gut says similar, maybe, but not exactly the same. #104 describes a problem in META-INF/manifest.xml. However, when I compare the contents of my report.odt (which Word dislikes) with my report_rezipped.odt (which Word likes), there is no difference in manifest.xml.

unzip -vl report.odt 
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
...
     962  Defl:N      266  72% 04-27-2020 20:30 7f41a5b1  META-INF/manifest.xml
...
unzip -vl report_rezipped.odt 
...
     962  Defl:N      266  72% 04-27-2020 20:30 7f41a5b1  META-INF/manifest.xml
...

Note the identical CRC-32 checksum.
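To extend that comparison from manifest.xml to the whole archive, here is a minimal Python sketch using the standard zipfile module; the helper names describe and compare are hypothetical and not part of odf-report or rubyzip. It reports differences in entry order, compression method, uncompressed size and CRC-32 only. Timestamps, extra fields and general-purpose flags are not compared, though any of them could also differ between rubyzip and Info-ZIP output.

```python
from __future__ import annotations

import zipfile


def describe(path: str) -> list[tuple[str, int, int, int]]:
    """Return (name, compress_type, file_size, CRC-32) per entry, in archive order."""
    with zipfile.ZipFile(path) as z:
        return [(i.filename, i.compress_type, i.file_size, i.CRC) for i in z.infolist()]


def compare(path_a: str, path_b: str) -> list[str]:
    """List differences in entry order, compression method, size and CRC-32."""
    a, b = describe(path_a), describe(path_b)
    diffs: list[str] = []
    names_a = [e[0] for e in a]
    names_b = [e[0] for e in b]
    if names_a != names_b:
        diffs.append(f"entry order differs: {names_a} vs {names_b}")
    # Map each name to its (method, size, crc) tuple for per-entry checks.
    fields_a = {e[0]: e[1:] for e in a}
    fields_b = {e[0]: e[1:] for e in b}
    for name in names_a:
        if name not in fields_b:
            diffs.append(f"{name}: only in {path_a}")
        elif fields_a[name] != fields_b[name]:
            diffs.append(
                f"{name}: (method, size, CRC-32) {fields_a[name]} vs {fields_b[name]}"
            )
    for name in names_b:
        if name not in fields_a:
            diffs.append(f"{name}: only in {path_b}")
    return diffs
```

For example, printing each line of compare("report.odt", "report_rezipped.odt") would show whether the two files differ in packaging even when, as with manifest.xml here, the CRC-32 of the content is identical.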

I suspect we're using rubyzip incorrectly, but don't have more details yet.

> I haven't had time to look into it yet, but this week is OSS week :-) so I'll have a look.

Thanks, and please let me know if I can help in any way.

Well, I couldn't find any obvious culprits. I'm doing the same rubyzip handling I always have, except for editing META-INF/manifest.xml for repeated images.

I still think #104 is related. Altough the OP find a diference in MANIFEST.XML, he removed the "\n" at the end and had to rezip the file. I suspect the reziping solved the problem, not the removal.

I'm gonna try to find a Windows machine to run some tests (I'm a Mac user).

Could #67 be related also?

> Although the OP found a difference in manifest.xml, he removed the "\n" at the end and then had to rezip the file. I suspect the rezipping, not the removal, solved the problem.

Yeah, it's possible the newline is a red herring.

> I'm gonna try to find a Windows machine to run some tests (I'm a Mac user).

Actually, in my steps to reproduce above I was using Word for Mac, so you can reproduce this on a Mac.

> Could #67 be related also?

I'll try the binread patch and see if I can still reproduce the issue.

> Could #67 be related also?

> I'll try the binread patch and see if I can still reproduce the issue.

After applying this patch, I can still reproduce the issue. The patch probably didn't help because my template file has no images. 🤦

I think I nailed it. I'll be releasing 0.7.2 shortly.

0.7.2 seems to work for me. I can remove my Rezipper class 🎉 Thanks, Sandro.