Partitioned parquet files
alexflint commented
Hello! Thank you for this excellent library. Is there any way to read and write partitioned parquet files? I mean a series of parquet files organized something like this:
mydata/year=2020/month=1/day=1/6f0258e6c48a48dbb56cae0494adf659.parquet
mydata/year=2020/month=12/day=31/cf8a45116d8441668c3a397b816cd5f3.parquet
mydata/year=2021/month=2/day=28/7f9ba3f37cb9417a8689290d3f5f9e6e.parquet
In Python, this layout can be created with pandas:
```python
import pandas as pd

# example dataframe with 3 rows and columns year, month, day, value
df = pd.DataFrame(data={'year': [2020, 2020, 2021],
                        'month': [1, 12, 2],
                        'day': [1, 31, 28],
                        'value': [1000, 2000, 3000]})
df.to_parquet('./mydata', partition_cols=['year', 'month', 'day'])
```
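In this layout each directory level encodes one column value as `key=value` (the Hive-style convention), so a reader can recover the partition columns from a file's path alone. A minimal sketch, assuming directory names shaped exactly like the ones above; `partition_values` is a hypothetical helper, not part of pandas or parquet-go:

```python
from pathlib import PurePosixPath
from typing import Dict


def partition_values(path: str) -> Dict[str, str]:
    """Return the key=value partition levels encoded in a file's directories."""
    values: Dict[str, str] = {}
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep:  # directories without '=' (such as the root "mydata") are skipped
            values[key] = value
    return values


print(partition_values(
    "mydata/year=2020/month=12/day=31/cf8a45116d8441668c3a397b816cd5f3.parquet"))
```

The values come back as strings such as `'12'`; converting them to typed columns is left to the caller in this sketch.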
hangxie commented
This is a feature of data frames, not of the parquet module. parquet-go does not have this feature, and I don't think it should.
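The point of the reply is that partitioning sits one layer above a single-file parquet writer: group the rows by the partition columns, then write each group to its own file under a `key=value` directory. A minimal sketch of that layer, assuming the Hive-style layout from the question. `partition_rows` and `write_partitioned` are hypothetical names, and `write_file` is a placeholder for whatever single-file writer is in use; it is not a pandas or parquet-go API.

```python
import uuid
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def partition_rows(rows: Sequence[Row], root: str,
                   partition_cols: Sequence[str]) -> Dict[str, List[Row]]:
    """Group rows by partition directory, e.g. root/year=2020/month=1/day=1."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        # Each partition column becomes one key=value directory level.
        levels = [f"{col}={row[col]}" for col in partition_cols]
        directory = str(PurePosixPath(root, *levels))
        # The path already encodes the partition columns, so this sketch
        # drops them from the row that is stored in the file.
        groups[directory].append(
            {k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)


def write_partitioned(rows: Sequence[Row], root: str,
                      partition_cols: Sequence[str],
                      write_file: Callable[[str, List[Row]], None]) -> None:
    """Write one file per partition via write_file, a hypothetical
    single-file parquet writer supplied by the caller."""
    for directory, group in partition_rows(rows, root, partition_cols).items():
        # Random hex file name, similar in shape to the names in the question.
        write_file(f"{directory}/{uuid.uuid4().hex}.parquet", group)


# Usage with the same three rows as the pandas example; this stand-in
# "writer" only records what it would write.
rows = [
    {"year": 2020, "month": 1, "day": 1, "value": 1000},
    {"year": 2020, "month": 12, "day": 31, "value": 2000},
    {"year": 2021, "month": 2, "day": 28, "value": 3000},
]
written: Dict[str, List[Row]] = {}


def record(path: str, group: List[Row]) -> None:
    written[path] = group


write_partitioned(rows, "mydata", ["year", "month", "day"], record)
for path, group in sorted(written.items()):
    print(path, group)
```

Keeping the writer as a parameter reflects the design argument in the reply: the grouping and path logic never touches the parquet encoding, so it can live in a data frame library or application code rather than in the parquet module.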