pinterest/secor

Tune file size created by Secor

lukass77 opened this issue · 0 comments

Hello,
I have a question. I am working on upgrading our Secor version to the current master branch.
With that, I managed to run our "Dev" environment, where all output files are written as .gzip to an S3 "Dev" bucket.
What I noticed is that Secor creates a lot of small files in the S3 bucket.

Question: how can I tune Secor to create larger files, and as a result fewer files, for the same number of events fetched from Kafka?

For example, when reading 1000 events from a topic, they could be split into 100 files of 10 events each, or into 50 files of 20 events each.

I saw there are `secor.max.file.age.seconds` and `secor.max.file.size.bytes` properties, but as far as I understood those affect the frequency ("when") the upload gets triggered, not the size of the file created on disk.
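For reference, this is roughly how those properties look in our config; the values below are just illustrative examples, not our production settings:

```properties
# Upload policy: a local file becomes eligible for upload once it is
# older than this many seconds... (example value: 1 hour)
secor.max.file.age.seconds=3600

# ...or once it grows beyond this many bytes (example value: ~200 MB).
secor.max.file.size.bytes=200000000
```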

I am not sure I fully understand how Secor "knows" when to stop writing the current file and start a new one.
Is it based on file size, the Kafka batch, or some other criterion, e.g. a file rolling policy?
How can I control the size of a "single" output log file?

Thanks in advance!!
Nir