taganaka/bson-splitter-c

Output .splits files along with .bson

Closed this issue · 1 comment

.splits files are used by Hadoop/Mongo-Hadoop at M/R run time. Without them, it seems the files are
re-split even with config.set("bson.split.read_split","false"), which adds unneeded overhead.

You can use -f to override the default filename output.

-f accepts a printf-like string such as foo-%d.split. %d will be replaced with the current split number.
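
To illustrate how a printf-like pattern such as foo-%d.split expands, here is a minimal sketch in C. It is not taken from the bson-splitter-c source: the helper name format_split_name and its return convention are assumptions made for this example, and it assumes the pattern contains exactly one %d.

```c
#include <stdio.h>

/* Hypothetical helper (not part of bson-splitter-c): expands a
 * printf-like pattern containing a single %d with the split number.
 * Returns 0 on success, -1 if formatting failed or the result did
 * not fit in the output buffer. */
static int format_split_name(char *out, size_t out_len,
                             const char *pattern, int split_no)
{
    /* snprintf returns the length the full string would have had,
     * so a value >= out_len means the output was truncated. */
    int n = snprintf(out, out_len, pattern, split_no);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With the pattern foo-%d.split and split number 3, this helper writes foo-3.split into the buffer. The pattern is passed straight to snprintf, so it should only come from a trusted source such as the command line.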

I'll add it to the readme file

Cheers