ofajardo/pyreadr

Read RData from a GitHub URL?

MAGALLANESJoseManuel opened this issue · 4 comments

I have data here:

fullLink='https://github.com/EvansDataScience/data/raw/master/crime.RData'

How can I read it using pyreadr?

Download the file with urllib2 (urllib.request in Python 3), write it to a file on disk, and then pass the path of that file to pyreadr. Something like this, but instead of reading the response as a CSV, save it to a file:

https://stackoverflow.com/questions/16283799/how-to-read-a-csv-file-from-a-url-with-python
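A minimal Python 3 sketch of that advice, assuming only the standard library: `urllib.request.urlopen` replaces Python 2's `urllib2`, and `shutil.copyfileobj` streams the response to disk in chunks. The helper name `download_to_file` is hypothetical, not part of pyreadr.

```python
import shutil
from urllib.request import urlopen


def download_to_file(url, path, chunk_size=64 * 1024):
    """Hypothetical helper: stream the bytes at `url` into the file `path`."""
    # urlopen returns a file-like response object; copyfileobj copies it
    # chunk by chunk, so the whole download never has to sit in memory.
    with urlopen(url) as response, open(path, "wb") as fhandle:
        shutil.copyfileobj(response, fhandle, chunk_size)
    # Return the path so it can be handed straight to pyreadr.read_r.
    return path
```

Usage would then be `result = pyreadr.read_r(download_to_file(link, "file.RData"))`, with `link` set to the GitHub raw URL above.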

The underlying C library (librdata) absolutely requires a file on disk, so pyreadr requires one too. I leave it to the user to decide how to download the file.
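Since the parser only accepts a path, one option (an assumption, not something pyreadr provides) is to download into a temporary directory, parse, and let the directory be deleted afterwards so no `file.RData` is left behind. The function `read_r_from_url` and its `reader` parameter are hypothetical; `reader` defaults to `pyreadr.read_r`.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def read_r_from_url(url, reader=None, suffix=".RData"):
    """Hypothetical helper: download `url` to a temp file, parse it, clean up."""
    if reader is None:
        import pyreadr  # imported lazily so the helper works with any reader
        reader = pyreadr.read_r
    # A TemporaryDirectory (rather than NamedTemporaryFile) lets the reader
    # reopen the file by path on every platform, including Windows.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "download" + suffix)
        with urlopen(url) as response, open(path, "wb") as fhandle:
            shutil.copyfileobj(response, fhandle)
        # The file is closed and fully written before the reader opens it;
        # the directory is removed only after the reader has returned.
        return reader(path)
```

The trade-off is that the data is downloaded again on every call; keeping a named file on disk, as in the other examples in this thread, avoids that.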

NOT WORKING:

import pyreadr
from urllib.request import urlopen
link="https://github.com/EvansDataScience/data/raw/master/crime.RData"
response = urlopen(link)
result = pyreadr.read_r(response)
print(result.keys())

GIVES ME:

  File "", line 10, in
    result = pyreadr.read_r(response)
  File "/Users/JoseManuel/anaconda3/envs/bookVisualDS/lib/python3.7/site-packages/pyreadr/pyreadr.py", line 40, in read_r
    parser.parse(path)
  File "pyreadr/librdata.pyx", line 117, in pyreadr.librdata.Parser.parse
  File "pyreadr/librdata.pyx", line 134, in pyreadr.librdata.Parser.parse
AttributeError: 'HTTPResponse' object has no attribute 'encode'

As I mentioned before, you have to write it to a file first, something like this (not tested):

import pyreadr
from urllib.request import urlopen
link="https://github.com/EvansDataScience/data/raw/master/crime.RData"
response = urlopen(link)
fhandle = open('file.RData', 'wb')
fhandle.write(response.read())  # write() needs bytes, not the HTTPResponse object
fhandle.close()
result = pyreadr.read_r("file.RData")
print(result.keys())

OK, here is the tested and working version:

import pyreadr
from urllib.request import urlopen

link = "https://github.com/EvansDataScience/data/raw/master/crime.RData"
# pyreadr needs a path on disk, so save the downloaded bytes to a file first
with urlopen(link) as response:
    content = response.read()
with open("file.RData", "wb") as fhandle:
    fhandle.write(content)
result = pyreadr.read_r("file.RData")
print(result.keys())
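`pyreadr.read_r` returns a dictionary-like object that maps each R object name stored in the .RData file to a pandas DataFrame. A small sketch for seeing what was loaded; `describe_objects` is a hypothetical helper, and it only assumes each value has a `.shape` attribute, as DataFrames do.

```python
def describe_objects(result):
    """Hypothetical helper: list (object name, (rows, columns)) for a read_r result."""
    # Iterating .items() yields each R object name with its DataFrame.
    return [(name, df.shape) for name, df in result.items()]
```

For example, `for name, shape in describe_objects(result): print(name, shape)`, after which a single table is available as `result[name]`.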