stillmatic/quiltr

Parquet support

stillmatic opened this issue · 0 comments

ideally via first-class Arrow support in R (see https://github.com/clarkfitzg/Rarrow)

currently processing parquet client side, which depends on a python install with pyarrow and pandas:

python -c 'import pyarrow.parquet as pq; import sys; table = pq.read_table(sys.argv[1]); df = table.to_pandas(); df.to_csv(sys.argv[1] + ".csv")' "$HOME/quilt_packages/objs/7ca9a61d0f18e5121a8fbe72cc384335a8025d79736edaa82efea4984bfb97c4"

this one-liner is obviously ugly; a proper python script would look like:

import pyarrow.parquet as pq
import sys


def convert(filename):
    table = pq.read_table(filename)
    df = table.to_pandas()
    # write to same location, with csv appended
    df.to_csv(filename + ".csv")

if __name__ == "__main__":
    convert(sys.argv[1])
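
a slightly more defensive variant of the script above, as a rough sketch only. it expands a leading `~` in python (the shell does not expand it inside quotes) and skips the pandas row index so the csv has no extra unnamed column when read from R. the `csv_path_for` helper and the `index=False` choice are assumptions for illustration, not something quiltr does today:

```python
import os


def csv_path_for(filename):
    """Return the csv path written next to the parquet object."""
    # expand a leading "~" here, since a quoted path reaches python unexpanded
    return os.path.expanduser(filename) + ".csv"


def convert(filename):
    # imported lazily so the path helper works without pyarrow installed
    import pyarrow.parquet as pq

    table = pq.read_table(os.path.expanduser(filename))
    df = table.to_pandas()
    out = csv_path_for(filename)
    # assumption: the pandas row index is not wanted in the csv
    df.to_csv(out, index=False)
    return out
```

`convert` can be called from the same `__main__` block as the script above; it returns the path of the csv it wrote, e.g. `/tmp/obj.csv` for `/tmp/obj`.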