PayneLab/cptac

Problem with xlrd >= 2.0.0

Closed this issue · 2 comments

Hello,

I made a fresh install of my OS, and after installing the package I got the following error trying to load one of the datasets:

>>> e = cptac.Endometrial()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarrido/.local/lib/python3.8/site-packages/cptac/endometrial.py", line 113, in __init__
    df = pd.read_excel(file_path)
  File "/home/mgarrido/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/home/mgarrido/.local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/home/mgarrido/.local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 1085, in __init__
    raise ValueError(
ValueError: Your version of xlrd is 2.0.1. In xlrd >= 2.0, only the xls format is supported. Install openpyxl instead.

The problem is caused by the newest version of xlrd, released in December 2020, which no longer reads .xlsx files. Downgrading this package to version 1.2.0 did the trick. Pinning the required version of this dependency would probably be enough.
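For anyone who wants to check whether their environment is affected before loading a dataset, here is a minimal sketch. The helper names `xlrd_reads_xlsx` and `installed_xlrd_reads_xlsx` are made up for illustration and are not part of cptac or xlrd. It only assumes what the error message states, namely that xlrd 2.0 and later read .xls files only.

```python
from importlib import metadata  # standard library in Python 3.8+


def xlrd_reads_xlsx(version):
    """Return True if this xlrd version string predates 2.0.

    Hypothetical helper: the ValueError above says xlrd >= 2.0 supports
    only the xls format, so any earlier release is treated as xlsx-capable.
    """
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_reads_xlsx():
    """Apply the same check to whatever xlrd is installed, if any."""
    try:
        return xlrd_reads_xlsx(metadata.version("xlrd"))
    except metadata.PackageNotFoundError:
        # xlrd is not installed at all, so it cannot read .xlsx either.
        return False
```

With the versions mentioned in this issue, `xlrd_reads_xlsx("1.2.0")` is true and `xlrd_reads_xlsx("2.0.1")` is false.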

Thanks again for this interface to the cptac data!

Thanks for letting us know. I'm working on finding the best way to resolve this, and it will be taken care of in the next release. It looks like pandas wants us to start using openpyxl for all .xlsx files, and to continue using xlrd for .xls files. But the switch to openpyxl is requiring a little troubleshooting :)
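The split described above (openpyxl for .xlsx, xlrd for .xls) could look roughly like the sketch below. This is not the code that went into cptac. `ENGINE_BY_SUFFIX`, `pick_excel_engine` and `read_excel_any` are hypothetical names, and the only library call relied on is the `engine` argument of `pd.read_excel`, which also appears in the traceback.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pandas engine name.
ENGINE_BY_SUFFIX = {".xlsx": "openpyxl", ".xls": "xlrd"}


def pick_excel_engine(file_path):
    """Choose the read_excel engine from the file extension."""
    suffix = Path(file_path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}") from None


def read_excel_any(file_path, **kwargs):
    """Read an .xls or .xlsx file with the matching engine."""
    # Imported here so pick_excel_engine works without pandas installed.
    import pandas as pd

    return pd.read_excel(file_path, engine=pick_excel_engine(file_path), **kwargs)
```

Passing the engine explicitly means the result no longer depends on which xlrd version happens to be installed, although openpyxl must then be available for .xlsx files.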

Okay, with the newest release this issue should be resolved. Just make sure you have the newest version of pandas (1.2.0).