xitongsys/parquet-go

Partitioned parquet files


Hello! Thank you for this excellent library. Is there any way to read and write partitioned parquet files? I mean a series of parquet files organized something like this:

mydata/year=2020/month=1/day=1/6f0258e6c48a48dbb56cae0494adf659.parquet
mydata/year=2020/month=12/day=31/cf8a45116d8441668c3a397b816cd5f3.parquet
mydata/year=2021/month=2/day=28/7f9ba3f37cb9417a8689290d3f5f9e6e.parquet

In Python, this layout can be created with pandas like this:

import pandas as pd

# example dataframe with 3 rows and columns year,month,day,value
df = pd.DataFrame(data={'year':  [2020, 2020, 2021],
                        'month': [1,12,2], 
                        'day':   [1,31,28], 
                        'value': [1000,2000,3000]})

df.to_parquet('./mydata', partition_cols=['year', 'month', 'day'])

This is a feature of the data frame library, not of the parquet module. parquet-go does not have this feature, and I don't think it should.
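Since the partitioning lives in the directory names rather than in the files themselves, it can be handled outside the library. Below is a minimal sketch of that logic, written in Python to match the example above and assuming Hive-style `key=value` directory names. The helpers `partition_dir`, `group_rows`, and `parse_partition_path` are hypothetical names for illustration, not part of parquet-go or pandas. The sketch stops short of writing the files: each group would be passed to an ordinary parquet writer.

```python
import posixpath
from collections import defaultdict


def partition_dir(root, row, partition_cols):
    """Build the directory for one row, e.g. mydata/year=2020/month=1/day=1."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return posixpath.join(root, *parts)


def group_rows(root, rows, partition_cols):
    """Group rows by target directory, dropping the partition columns
    from each row because their values are encoded in the path."""
    groups = defaultdict(list)
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in partition_cols}
        groups[partition_dir(root, row, partition_cols)].append(rest)
    return dict(groups)


def parse_partition_path(path):
    """Recover partition values from a path when reading.
    Values come back as strings; converting types is left to the caller."""
    values = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = value
    return values


rows = [
    {"year": 2020, "month": 1, "day": 1, "value": 1000},
    {"year": 2020, "month": 12, "day": 31, "value": 2000},
    {"year": 2021, "month": 2, "day": 28, "value": 3000},
]

groups = group_rows("mydata", rows, ["year", "month", "day"])
for directory, members in groups.items():
    # Each group would be written as one parquet file inside `directory`.
    print(directory, members)
```

The same approach should carry over to Go: build the directory with `path/filepath`, create it, and write each group with a regular parquet-go writer. When reading, walk the tree and add the values parsed from each path back onto the rows.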