ofajardo/pyreadr

bzip2-compressed RData object results in "unsupported compression scheme" error

alexhbnr opened this issue · 5 comments

Hi,

I have a large RData file containing several data frames and matrices, so I compressed it with bzip2 via save(..., compress="bzip2"). R's load() reads the file without problems. However, pyreadr raises a LibrdataError: "The file is compressed using an unsupported compression scheme".

To Reproduce:
In R:

library(tibble)
x <- tibble(id = seq(1, 100), cat = sample(c("A", "B", "C", "D"), 100, replace = TRUE))
save(x, file = "tmp.RData", compress = "bzip2")

In Python:

import pyreadr
pyreadr.read_r("tmp.RData")

Expected behavior:
From what I read in pyreadr/libs/librdata/src/rdata_read.c, reading bzip2-compressed RData objects should be possible, or am I mistaken? If pyreadr cannot read bzip2-compressed RData objects because the underlying librdata module doesn't allow it, this should be stated in the known limitations section.
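To confirm which compression scheme a given file actually uses before reporting or debugging, the leading magic bytes can be inspected. This is a minimal sketch that is independent of pyreadr and librdata; the helper names detect_compression and detect_file_compression are hypothetical, and only the gzip, bzip2 and xz signatures are checked.

```python
# Leading magic bytes of the compressed containers that R's save() can write.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def detect_compression(header: bytes) -> str:
    """Return the compression scheme implied by the first bytes of a file."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "none"


def detect_file_compression(path: str) -> str:
    """Read the first six bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return detect_compression(fh.read(6))
```

For the tmp.RData file written above with compress = "bzip2", detect_file_compression("tmp.RData") would be expected to report "bzip2".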

Setup Information:
How did you install pyreadr? pip
Platform linux, 64 bit
Python Version: 3.7.3
Python Distribution Miniconda

Thanks,
Alex

Thanks for the report and the way to reproduce it.

Probably a flag is missing in setup.py to compile with bzip2 support. My guess is that

extra_compile_args = ['-DHAVE_ZLIB']

should be

extra_compile_args = ['-DHAVE_ZLIB', '-DHAVE_BZIP2']

or something like that.
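A sketch of how such a change might be organised, under the assumption that the macro name -DHAVE_BZIP2 guessed above is the right one. The helper build_options is hypothetical and is not taken from pyreadr's setup.py. extra_compile_args and libraries are real keyword arguments of setuptools.Extension; the library names "z" and "bz2" are the usual ones on Linux and macOS and may differ on Windows.

```python
# Hypothetical helper, not pyreadr's actual setup.py.
def build_options(with_bzip2: bool = True) -> dict:
    """Collect compiler flags and link libraries for the optional codecs."""
    compile_args = ["-DHAVE_ZLIB"]
    libraries = ["z"]
    if with_bzip2:
        # Assumed macro name (the guess from this thread). Defining it is
        # presumably not enough on its own: the extension also has to be
        # linked against libbz2.
        compile_args.append("-DHAVE_BZIP2")
        libraries.append("bz2")
    return {"extra_compile_args": compile_args, "libraries": libraries}
```

The returned dictionary could then be unpacked into the extension definition, for example Extension(name, sources, **build_options()).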

I'll check when I get some time and add your code to the tests. It will take a while, as I also have to fix it for Windows.

In the meantime, if you are in a hurry, you can try compiling with that flag (with the bzip2 library installed) to see if that works.
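Another stopgap that avoids recompiling is to rewrite the file with gzip compression, which is the default for R's save(). This is a minimal sketch using only the Python standard library; the helper name bzip2_to_gzip and the output file name are hypothetical.

```python
import bz2
import gzip
import shutil


def bzip2_to_gzip(src: str, dst: str) -> str:
    """Rewrite a bzip2-compressed file as a gzip-compressed one."""
    # Stream the decompressed bytes straight into the gzip writer so that
    # large files do not have to fit in memory.
    with bz2.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Assuming the installed build reads gzip-compressed files, pyreadr.read_r(bzip2_to_gzip("tmp.RData", "tmp_gzip.RData")) should then load the data.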

Working on Linux in branch dev_bzip2. It will probably work on Mac as well. It can already be used if compiled manually on those systems.
Now starts the long journey of getting the wheels built correctly in Travis CI and getting it to work on Windows ...

Wheels for Linux and Mac are already available here: https://anaconda.org/ofajardo/pyreadr/files

Thank you @ofajardo for the fast reply and fix. After installing the latest version from the wheel files, I can now load the bzipped RData files without any problems.

Changes are now available on PyPI and conda, version 0.2.8. Windows is working as well.