'partition by' same values as input filenames (not folders)
Closed this issue · 6 comments
tooptoop4 commented
input:
folderA/20180201.csv
folderA/20180202.csv
folderA/20180203.csv
With a single Metorikku jar run, I want to get Parquet output of:
date=20180201/a.parquet
date=20180202/a.parquet
date=20180203/a.parquet
lyogev commented
I'm not sure I understand. You have input with a date as part of the path, but it's not in Hadoop style ("date=...")?
tooptoop4 commented
correct
lyogev commented
It's possible to do it with something we call fileDateRange.
You can use it like this:
file_date_range:
  template: path/%s.csv
  date_range:
    format: yyyyMMdd
    startDate: 20180201
    endDate: 20180203
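To illustrate what a template plus date range like the one above describes, here is a minimal Python sketch that expands it into one path per day. It is not Metorikku's implementation. The function name `expand_date_range` is hypothetical, and mapping the `yyyyMMdd` format to Python's `%Y%m%d` is an assumption made for this example.

```python
from datetime import datetime, timedelta


def expand_date_range(template, start, end, fmt="%Y%m%d"):
    """Return one path per day from start to end, inclusive.

    `template` must contain a single %s placeholder for the formatted date.
    `fmt` is the Python equivalent of the yyyyMMdd pattern (an assumption).
    """
    first = datetime.strptime(start, fmt).date()
    last = datetime.strptime(end, fmt).date()
    paths = []
    day = first
    while day <= last:
        # Substitute the formatted date into the path template.
        paths.append(template % day.strftime(fmt))
        day += timedelta(days=1)
    return paths


if __name__ == "__main__":
    for path in expand_date_range("path/%s.csv", "20180201", "20180203"):
        print(path)
```

With the values from the config above, this yields the three daily CSV paths from 20180201 through 20180203.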
tooptoop4 commented
Can 'file_date_range' be referenced as a column in the metric YAML (i.e. select file_date_range)?
lyogev commented
No. It's not available as a column. You don't have anything in your data itself to indicate the date?
tooptoop4 commented
I don't have one :( only the filename.
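When the filename is the only place the date appears, one possible workaround outside Metorikku is to derive the partition value from the input path before writing. The sketch below shows just that mapping in plain Python. The function name `partition_dir_for` is hypothetical, and the `date` column name and 8-digit filename pattern are assumptions taken from the example paths in this issue. In Spark itself, the built-in `input_file_name()` function returns the source path of each row, but whether it can be used from a Metorikku metric is not confirmed in this thread.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def partition_dir_for(input_path, column="date"):
    """Map e.g. 'folderA/20180201.csv' to the directory name 'date=20180201'.

    Assumes the file stem is exactly a yyyyMMdd date (hypothetical convention).
    """
    stem = PurePosixPath(input_path).stem  # filename without extension
    if not re.fullmatch(r"\d{8}", stem):
        raise ValueError(f"no yyyyMMdd date in filename: {input_path}")
    # Reject impossible dates such as 20181345; strptime raises ValueError.
    datetime.strptime(stem, "%Y%m%d")
    return f"{column}={stem}"


if __name__ == "__main__":
    for path in ["folderA/20180201.csv", "folderA/20180202.csv"]:
        print(path, "->", partition_dir_for(path))
```

Validating the stem before using it keeps stray files from silently creating malformed partition directories.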