ofajardo/pyreadr

Jupyter notebook kernel dies every time I use pyreadr.read_r

YubinXie opened this issue · 6 comments

Describe the issue

To Reproduce
pyreadr.read_r('file')

Setup Information:
How did you install pyreadr?

pip

Platform (Linux)

Python Version
3.8.2
Python Distribution (Anaconda)

It may be related to the data not being loaded as a sparse matrix when reading sparse matrix data from R.

I am not able to reproduce this. Please provide a file or clear instructions on how to reproduce the issue; otherwise I cannot help.

Closed due to inactivity. Please provide a way to reproduce the issue before reopening.

Hi, sorry for the late reply. I can't share the private data yet, but there is actually no need for the data to reproduce the problem.

In R, if the data is sparse, it is saved in an efficient way (from 100 GB down to a few MB). However, when we open this file in Python, it is not converted to a sparse matrix automatically. This causes memory issues. I hope this makes sense.
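To see why a dense copy can exhaust memory, here is a rough back-of-envelope sketch. It assumes the R object uses compressed sparse column (CSC) storage, as the Matrix package's dgCMatrix class does (a double-valued `x` slot, an integer row-index `i` slot and an integer column-pointer `p` slot), 8-byte doubles and 4-byte integers, and it ignores object overhead and RDS file compression. The function names and the matrix dimensions in the example are hypothetical.

```python
def dense_bytes(nrow: int, ncol: int, itemsize: int = 8) -> int:
    """Memory needed for a dense matrix of 8-byte doubles."""
    return nrow * ncol * itemsize


def csc_bytes(nnz: int, ncol: int, value_size: int = 8, index_size: int = 4) -> int:
    """Approximate memory for a CSC matrix (the layout of R's dgCMatrix)."""
    values = nnz * value_size           # x slot: one double per nonzero entry
    row_idx = nnz * index_size          # i slot: one 32-bit row index per nonzero
    col_ptr = (ncol + 1) * index_size   # p slot: ncol + 1 column offsets
    return values + row_idx + col_ptr


# Hypothetical example: 20,000 rows x 3,000 columns with 5% nonzero entries
nrow, ncol = 20_000, 3_000
nnz = nrow * ncol * 5 // 100
print(dense_bytes(nrow, ncol))  # 480000000 bytes (about 480 MB)
print(csc_bytes(nnz, ncol))     # 36012004 bytes (about 36 MB)
```

Because the sparse size grows with the number of nonzeros rather than with `nrow * ncol`, the gap widens as the data gets sparser.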

OK, now I understand the problem, but I want to see exactly how the data is represented. Please provide some R code that produces such data. Of course it must be dummy data, not sensitive data.
By the way, if you are working with matrices instead of data frames, are you aware that pyreadr does not read the dimensions of matrices?
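If a matrix's values reach Python as a flat sequence without their dimensions, the shape has to be restored by hand. A minimal sketch, assuming you already have the values as a 1-D sequence and know `nrow` and `ncol` from some other source (for example, exported separately from R); `to_r_matrix` is a hypothetical helper. R stores matrix data column by column, so the reshape uses NumPy's Fortran (column-major) order.

```python
import numpy as np


def to_r_matrix(values, nrow: int, ncol: int) -> np.ndarray:
    """Rebuild a 2-D array from a flat vector stored in R's column-major order."""
    flat = np.asarray(values)
    if flat.size != nrow * ncol:
        raise ValueError(f"expected {nrow * ncol} values, got {flat.size}")
    # order="F" fills the array column by column, matching R's memory layout
    return flat.reshape((nrow, ncol), order="F")


# In R, matrix(1:6, nrow = 2) has rows (1, 3, 5) and (2, 4, 6)
print(to_r_matrix([1, 2, 3, 4, 5, 6], 2, 3))
```

Using the default C (row-major) order here would silently transpose the layout, producing rows (1, 2, 3) and (4, 5, 6) instead.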

Check this one: https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=0

scRNA-seq is a very popular modern biomedical method; its data is usually more than 95% sparse.

The file contains an S4 object. If I read it with pyreadr, I get this error:

pyreadr.custom_errors.LibrdataError: The file contains an unrecognized object

That means this kind of object is not supported. This is described in the known limitations section of the README; please read it, and also see here:

#51
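Sparse matrices in R are commonly dgCMatrix objects from the Matrix package, which is itself an S4 class, so they fall under the same limitation. One possible workaround (an assumption, not something confirmed in this thread) is to export the matrix's `i`, `p`, `x` and `Dim` slots from R as plain vectors that pyreadr can read, and then rebuild the matrix in Python. The sketch below, with the hypothetical helper `csc_slots_to_coo`, expands those CSC slots into (row, column, value) triplets using only NumPy; dgCMatrix indices are 0-based, so no index shift is needed.

```python
import numpy as np


def csc_slots_to_coo(i, p, x, dim):
    """Expand the CSC slots of an R dgCMatrix (i, p, x, Dim) into COO triplets."""
    rows = np.asarray(i, dtype=np.int64)    # 0-based row index of each nonzero
    ptr = np.asarray(p, dtype=np.int64)     # column offsets, length ncol + 1
    vals = np.asarray(x, dtype=np.float64)  # nonzero values
    nrow, ncol = (int(d) for d in dim)
    if ptr.size != ncol + 1 or rows.size != vals.size or ptr[-1] != vals.size:
        raise ValueError("slots are inconsistent with a CSC matrix")
    # Column j owns entries ptr[j]:ptr[j + 1], so repeat each column index by its count
    cols = np.repeat(np.arange(ncol), np.diff(ptr))
    return rows, cols, vals, (nrow, ncol)


# Dummy 3 x 3 matrix: [[1, 0, 0], [0, 0, 2], [3, 0, 4]]
rows, cols, vals, shape = csc_slots_to_coo([0, 2, 1, 2], [0, 2, 2, 4], [1.0, 3.0, 2.0, 4.0], (3, 3))
```

If SciPy is installed, `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=shape)` turns the triplets into a sparse matrix, and `scipy.sparse.csc_matrix((x, i, p), shape=dim)` accepts the slots directly, so the data never has to be densified.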