Parquet support
stillmatic opened this issue · 0 comments
Ideally via first-class Arrow support in R (see https://github.com/clarkfitzg/Rarrow).
Currently Parquet is processed client-side, which brings an inherent dependency on Python (pyarrow and pandas):
python -c 'import pyarrow.parquet as pq; import sys; table = pq.read_table(sys.argv[1]); df = table.to_pandas(); df.to_csv(sys.argv[1] + ".csv")' "$HOME/quilt_packages/objs/7ca9a61d0f18e5121a8fbe72cc384335a8025d79736edaa82efea4984bfb97c4"
This is obviously ugly; a proper Python script would look like:
import sys

import pyarrow.parquet as pq


def convert(filename):
    table = pq.read_table(filename)
    df = table.to_pandas()
    # write to same location, with csv appended
    df.to_csv(filename + ".csv")


if __name__ == "__main__":
    convert(sys.argv[1])
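
If the path can arrive with a literal `~` (for example when it is quoted on the command line, so the shell does not expand it), the script could expand it itself before reading. A minimal sketch, assuming a hypothetical helper named `csv_path_for` that is not part of the script above:

```python
import os


def csv_path_for(filename):
    """Return (parquet_path, csv_path) with any leading "~" expanded.

    Hypothetical helper for illustration only.
    """
    parquet_path = os.path.expanduser(filename)
    # Same convention as the script: same location, with ".csv" appended.
    return parquet_path, parquet_path + ".csv"
```

`convert` could then call `src, dst = csv_path_for(filename)` and pass `src` to `pq.read_table` and `dst` to `df.to_csv`.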