YotpoLtd/metorikku

'partition by' same values as input filenames (not folders)

Closed this issue · 6 comments

input:
folderA/20180201.csv
folderA/20180202.csv
folderA/20180203.csv

With a single Metorikku jar run, I want to get Parquet output partitioned like this:
date=20180201/a.parquet
date=20180202/a.parquet
date=20180203/a.parquet

I'm not sure I understand. Your input has a date as part of the path, but not in the Hadoop style ("date=...")?

Correct.

It's possible to do it with something we call fileDateRange.
You can use it like this:

file_date_range:
  template: path/%s.csv
  date_range:
    format: yyyyMMdd
    startDate: 20180201
    endDate: 20180203
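
To make the effect of this config concrete, here is a minimal Python sketch of how a template plus a date range could expand into a list of input paths. It is an illustration only, not Metorikku's implementation: the function name `expand_date_range` is hypothetical, the end date is assumed to be inclusive, and Python's `%Y%m%d` stands in for the Java-style `yyyyMMdd` pattern.

```python
from datetime import datetime, timedelta


def expand_date_range(template, start, end, fmt="%Y%m%d"):
    """Return one path per day from start to end (end assumed inclusive).

    Hypothetical helper for illustration; not part of Metorikku.
    """
    start_day = datetime.strptime(start, fmt).date()
    end_day = datetime.strptime(end, fmt).date()
    day_count = (end_day - start_day).days + 1
    # Substitute each formatted day into the single %s slot of the template.
    return [
        template % (start_day + timedelta(days=offset)).strftime(fmt)
        for offset in range(day_count)
    ]


paths = expand_date_range("path/%s.csv", "20180201", "20180203")
print(paths)
```

Under these assumptions, the three files from the original question would all be picked up as a single input.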

Can 'file_date_range' be referenced as a column in the metric YAML (i.e. select file_date_range)?

No. It's not available as a column. You don't have anything in your data itself to indicate the date?

No, I don't have one :( Only the filename.
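
Since the date exists only in the filename, the partition value would have to be derived from the file path. The Python sketch below shows that mapping for the paths in this issue. The helper name `partition_dir_for` is hypothetical, and it assumes every filename is exactly eight digits followed by an extension.

```python
import re
from pathlib import PurePosixPath


def partition_dir_for(path):
    """Map e.g. 'folderA/20180201.csv' to the partition directory 'date=20180201'.

    Hypothetical helper; assumes the file stem is an 8-digit yyyyMMdd date.
    """
    stem = PurePosixPath(path).stem  # filename without folder or extension
    if re.fullmatch(r"\d{8}", stem) is None:
        raise ValueError(f"no yyyyMMdd date in filename: {path}")
    return f"date={stem}"


for source in ["folderA/20180201.csv", "folderA/20180202.csv", "folderA/20180203.csv"]:
    print(source, "->", partition_dir_for(source))
```

In Spark SQL, the built-in `input_file_name()` function returns the source file path for each row, so the same extraction could in principle be done per row. Whether that works inside a Metorikku metric is not confirmed in this thread.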