ofajardo/pyreadr

Error: Unable to load time variables with missing values in python using pyreadr package from RData file

PawanRamaMali opened this issue · 2 comments

I want to run some Python functions on data from an '.RData' file, and I am using the 'pyreadr' Python package to load it.
However, I get an error when the data frame contains missing values.

Steps to reproduce the behavior.

Here is an example of the R code:

library(data.table)
# Creating demo data frame
data <- data.table(x_time = c(Sys.time(),Sys.time()+1,Sys.time()+2))
data_missing <- data.table(x_time = c(Sys.time(),NA,NA))

# checking the classes
sapply(data,class)
sapply(data_missing,class)

# Storing the data in RData file 
save(data, file = "test_data.RData")
save(data_missing, file = "test_missing_data.RData")

The reason I am storing the data in different files is that 'test_data.RData' loads successfully in Python, whereas 'test_missing_data.RData' gives an error.

Here is the Python code:

##  Working demo
# import pyreadr
# result=pyreadr.read_r('test_data.RData')
# data=result['data']
# data.dtypes
# print(data)

### Error in below 

import pyreadr
result=pyreadr.read_r('test_missing_data.RData') # Error 
data=result['data']
data.dtypes
print(data)

The error message is as below:

C:\Users\Pawan\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\pandas\core\tools\datetimes.py:530: RuntimeWarning: invalid value encountered in multiply
  arr, tz_parsed = tslib.array_with_unit_to_datetime(arg, unit, errors=errors)

The error occurs when there are NA values in the data frame. Am I missing something here, and is there any other way to load RData files in Python?

Thank you for your time and help.

Setup Information:
How did you install pyreadr? - pip
Platform (windows, macOS, linux, 32 or 64 bit) - Windows 64bit
Python Version - 3
Python Distribution (System, plain python, Anaconda) - Reticulate Python

I have also asked the same question on Stack Overflow: https://stackoverflow.com/questions/73211508/error-unable-to-load-time-variables-with-missing-values-in-python-using-pyreadr

It is just a warning, not an error. This code works well for me:

import pyreadr
result=pyreadr.read_r('test_missing_data.RData') # No error, just warning
# Your data frame is called data_missing, not data, because that is what you named it in your R code.
# I think this is what you are doing wrong.
# Check result.keys() to see what you have if you are not sure.
data=result['data_missing']
data.dtypes
#x_time    datetime64[ns]
#dtype: object
print(data)
#                       x_time
#0 2022-08-03 09:37:55.963370752
#1                           NaT
#2                           NaT

# Looks correct to me

Why is that warning arising? I don't know. I will take a look when I get some time.
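
In the meantime, here is a minimal sketch of the two points above: looking up an object by the name it had in R, and silencing the warning once you have checked that the values are correct. The helper names pick_object and read_quietly are hypothetical, not part of pyreadr, and fake_result is only a stand-in for the ordered mapping that pyreadr.read_r returns. Silencing RuntimeWarning assumes you have already verified the loaded data, since the cause of the warning is still unknown.

```python
import warnings
from collections import OrderedDict


def pick_object(result, name):
    """Return result[name], or raise a KeyError listing the names that are present.

    Hypothetical helper: `result` is any mapping of R object name -> data frame.
    """
    if name not in result:
        available = ", ".join(result.keys()) or "<none>"
        raise KeyError(f"{name!r} not found; objects in file: {available}")
    return result[name]


def read_quietly(path):
    """Call pyreadr.read_r with RuntimeWarning suppressed (hypothetical helper)."""
    import pyreadr  # imported lazily so pick_object can be used without pyreadr

    with warnings.catch_warnings():
        # Assumption: the warning is harmless for this data; verify before relying on it.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return pyreadr.read_r(path)


# Stand-in for the result of reading 'test_missing_data.RData':
# the key is the R object name, the value would be a pandas DataFrame.
fake_result = OrderedDict(data_missing="<DataFrame placeholder>")

try:
    pick_object(fake_result, "data")  # wrong name, as in the original report
except KeyError as exc:
    message = exc.args[0]
    print(message)

frame = pick_object(fake_result, "data_missing")  # correct name
```

With a real file, the equivalent would be pick_object(read_quietly('test_missing_data.RData'), 'data_missing').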

Thank you for your response. You are right, it is just a warning. I was running into multiple issues, so I initially took it for an error.