Output .splits files along with .bson
Closed this issue · 1 comments
jfelectron commented
.splits files are used by Hadoop/Mongo-Hadoop at M/R run time. Without them, it seems the files are re-split
even with config.set("bson.split.read_split","false"), which adds unneeded overhead.
taganaka commented
You can use -f to override the default output filename.
-f accepts a printf-like string such as foo-%d.split, where %d will be replaced with the current split number.
I'll add it to the readme file.
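To illustrate how a printf-like pattern such as foo-%d.split expands, here is a minimal Python sketch. It is not the tool's own code: the helper name split_filenames is hypothetical, and numbering splits from 0 is an assumption, since the thread does not say where the counter starts.

```python
def split_filenames(pattern, count):
    """Expand a printf-style pattern once per split number.

    Assumption: split numbers start at 0; the real tool may start elsewhere.
    """
    return [pattern % n for n in range(count)]


# Three output names generated from the pattern mentioned above.
names = split_filenames("foo-%d.split", 3)
print(names)  # ['foo-0.split', 'foo-1.split', 'foo-2.split']
```

The pattern needs exactly one %d placeholder; anything else in the string is copied through unchanged.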
Cheers