tidyverse/readxl

Reading CDFV2 MS Excel

msgoussi opened this issue · 1 comments

When I visit http://rtais.wto.org/UI/PublicMaintainRTAHome.aspx and click on “Export all RTAs”, I get a file called “AllRTAs.xls”.
However, this file cannot be read with the readxl package; I get the error message (libxls error: Unable to open file) when I run:

readxl::read_excel("AllRTAs.xls")

I checked the file type online (https://www.checkfiletype.com/) and got the following:
File Type: CDFV2 Microsoft Excel
MIME Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Suggested file extension(s): xlsx
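
The same kind of check can be done locally by looking at the file's first bytes. The sketch below is only an illustration: sniff_excel_container() is a hypothetical helper name, not a readxl function. It relies on two well-known signatures, the Compound File Binary (OLE2/CDFV2) header used by legacy .xls files and the ZIP local-file header used by .xlsx files.

```r
# sniff_excel_container() is a hypothetical helper, not part of readxl.
# It classifies a file by its leading "magic" bytes only; it does not
# validate that the rest of the file is a well-formed workbook.
sniff_excel_container <- function(path) {
  magic <- readBin(path, what = "raw", n = 8L)

  # Compound File Binary (OLE2 / "CDFV2") signature, used by legacy .xls
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  # ZIP local file header signature, used by .xlsx
  zip <- as.raw(c(0x50, 0x4B, 0x03, 0x04))

  if (length(magic) >= 8L && identical(magic[1:8], ole2)) {
    "ole2"
  } else if (length(magic) >= 4L && identical(magic[1:4], zip)) {
    "zip"
  } else {
    "unknown"
  }
}
```

Calling sniff_excel_container("AllRTAs.xls") would then show which container the download really uses, independent of its extension or of the MIME type a web service reports.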

I found that another R package, gdata, reads the file without problems: gdata::read.xls("AllRTAs.xls") works fine.
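
As a possible workaround, the two readers can be combined so that gdata is only used when readxl fails. This is a minimal sketch, not an official readxl feature: read_xls_with_fallback() and its primary and fallback arguments are hypothetical names introduced here, and it assumes both packages are installed (gdata::read.xls additionally needs a working Perl installation).

```r
# read_xls_with_fallback() is a hypothetical helper for illustration.
# It tries the primary reader first and, if that raises an error
# (e.g. "libxls error: Unable to open file"), retries with the fallback.
read_xls_with_fallback <- function(path,
                                   primary = function(p) readxl::read_excel(p),
                                   fallback = function(p) gdata::read.xls(p)) {
  tryCatch(
    primary(path),
    error = function(e) {
      # Report why the primary reader failed before falling back
      message("Primary reader failed: ", conditionMessage(e))
      fallback(path)
    }
  )
}
```

With the defaults, read_xls_with_fallback("AllRTAs.xls") would attempt readxl first and fall back to gdata. Note that the two readers return different objects (a tibble versus a data.frame), so downstream code may need to account for that.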

This appears to be a very old Excel file format:

Specifies the Excel Binary File Format (.xls) Structure, which is the binary file format used by Microsoft Excel 97, Microsoft Excel 2000, Microsoft Excel 2002, and Microsoft Office Excel 2003.

https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/cd03cb5f-ca02-4934-a391-bb674cb8aa06

Apparently so old that libxls, which we use internally, does not support it.

I even tried to read it with xls2csv, a standalone tool for using libxls, and that does not work either. So that puts the file beyond the reach of readxl.