ofajardo/pyreadr

index of pandas dataframe is lost when writing to Rds

julibeg opened this issue · 2 comments

Perhaps it's a known limitation, but I didn't find it in the README.

When writing a pd.DataFrame to .Rds, the index is lost.

Example:
In Python:

>>> import pandas as pd
>>> import numpy as np
>>> import pyreadr
>>> bla = pd.DataFrame(np.arange(12).reshape(4, 3), index=list('abcd'))
>>> bla
   0   1   2
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
>>> pyreadr.write_rds("bla.Rds", bla)

In R:

> bla = readRDS("bla.Rds")
> bla
  0  1  2
1 0  1  2
2 3  4  5
3 6  7  8
4 9 10 11

I'm on 64-bit Linux with Python 3.8.6 (Anaconda), R 4.0.3, and pyreadr 0.4.0 (installed from conda).

Expected behavior:

The row names of the R data frame should be the index of the pandas DataFrame (i.e. 'a', 'b', 'c', 'd').

Good catch! I think the API of the C library I am using in the back end currently does not allow setting the row names when writing a data frame.

I will therefore add it to the list of known limitations and open a ticket in the C library to ask for this feature. We may get it implemented some day =)

That would be nice, fingers crossed 🤞
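
In the meantime, one possible workaround is to move the index into an ordinary column before writing, so the labels at least survive the round trip. This is only a sketch, not a pyreadr feature: the helper `index_to_column` and the column name "rowname" are made up for illustration, and the `write_rds` call is left commented out so the snippet runs without pyreadr installed.

```python
import pandas as pd


def index_to_column(df, name="rowname"):
    """Return a copy of df with its index moved into a regular column.

    Hypothetical helper (not part of pyreadr); `name` is an arbitrary
    label chosen for this example.
    """
    # rename_axis labels the index, reset_index turns it into the first column
    return df.rename_axis(name).reset_index()


# Same data as in the report above, built without numpy
bla = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
    index=list("abcd"),
)
out = index_to_column(bla)

# Then write as in the report (requires pyreadr):
# import pyreadr
# pyreadr.write_rds("bla.Rds", out)
```

On the R side, the row names could then be restored by hand after readRDS, for example with rownames(bla) <- bla$rowname, assuming the column arrives under that name.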