index of pandas dataframe is lost when writing to Rds
julibeg opened this issue · 2 comments
Perhaps it's a known limitation, but I didn't find it in the README.
When writing a pd.DataFrame to .Rds, the index is lost.
Example:
In Python:
>>> import pandas as pd
>>> import numpy as np
>>> import pyreadr
>>> bla = pd.DataFrame(np.arange(12).reshape(4, 3), index=list('abcd'))
>>> bla
   0   1   2
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
>>> pyreadr.write_rds("bla.Rds", bla)
In R:
> bla = readRDS("bla.Rds")
> bla
  0  1  2
1 0  1  2
2 3  4  5
3 6  7  8
4 9 10 11
I'm on 64-bit Linux with Python 3.8.6 (Anaconda), R 4.0.3, and pyreadr 0.4.0 (installed from conda).
Expected behavior:
The row names of the R data frame should be the index of the pandas DataFrame (i.e. 'a', 'b', 'c', 'd').
Good catch! I think the API of the C library I am using in the backend currently does not allow setting the row names when writing a data frame.
I will therefore add this to the list of known limitations and open a ticket with the C library to ask for this feature. We may get it implemented some day =)
That would be nice, fingers crossed 🤞
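A possible workaround in the meantime (a minimal sketch, not something confirmed by the maintainer): since only the columns survive the round trip, the index can be moved into a regular column before writing. The helper `index_to_column` and the column name `rowname` are hypothetical names chosen for illustration; converting the column labels to strings is an extra precaution, not a documented requirement.

```python
import numpy as np
import pandas as pd

try:
    import pyreadr  # optional here, so the sketch still runs without it
except ImportError:
    pyreadr = None


def index_to_column(df, name="rowname"):
    """Return a copy of df whose index is stored as an ordinary first column.

    `name` is a hypothetical column name; pick one that does not clash
    with an existing column.
    """
    out = df.rename_axis(name).reset_index()
    # Assumption: string column labels are the safest choice for the R side.
    out.columns = [str(c) for c in out.columns]
    return out


bla = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"))
flat = index_to_column(bla)
print(flat)

if pyreadr is not None:
    # Same call as in the report above, but on the flattened frame.
    pyreadr.write_rds("bla.Rds", flat)
```

In R, the row names could then be restored after `readRDS("bla.Rds")` with `rownames(bla) <- bla$rowname` followed by `bla$rowname <- NULL`.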