Configurable block_len
I am launching approximately 96 concurrent readers to load 550 CSV files in parallel. Memory consumption is quite large because of the block_len size:
class LineReader{
private:
static const int block_len = 1<<24; // 16 MiB
I reduced it to 1<<20 (1 MiB) and still get good throughput. Just a suggestion: could this be made a configurable parameter for LineReader, if possible? I might be an edge case though :-)
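To put rough numbers on the memory cost, here is a minimal sketch of the arithmetic. It assumes each reader holds at least one buffer of block_len bytes; the real LineReader may allocate some multiple of that, so treat the result as a lower bound. The helper name total_buffer_bytes is made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper: lower bound on buffer memory, assuming one
// block_len-sized buffer per concurrent reader (an assumption, not
// a statement about the library's actual allocation).
constexpr std::size_t total_buffer_bytes(std::size_t readers, std::size_t block_len) {
    return readers * block_len;
}

// 96 readers at 1<<24 bytes each: 1536 MiB.
static_assert(total_buffer_bytes(96, std::size_t{1} << 24) == 1536u * 1024u * 1024u,
              "96 x 16 MiB");
// 96 readers at 1<<20 bytes each: 96 MiB.
static_assert(total_buffer_bytes(96, std::size_t{1} << 20) == 96u * 1024u * 1024u,
              "96 x 1 MiB");
```

Under that assumption, dropping block_len from 1<<24 to 1<<20 cuts the buffer footprint for 96 readers by a factor of 16.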
Hi,
thanks for the suggestion.
Making it configurable seems like an esoteric feature that, at the end of the day, maybe two users will ever use. I think it is better if those two users just change the value in the header, as you did.
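For reference, a compile-time configurable block size could look roughly like the sketch below. This is purely illustrative: BlockBuffer is a hypothetical class invented for this example, it is not part of the library, and it is not how LineReader is implemented.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical illustration only; not the library's LineReader.
// The block size is a template parameter with a 1 MiB default.
template <std::size_t BlockLen = (std::size_t{1} << 20)>
class BlockBuffer {
public:
    BlockBuffer() : data_(new char[BlockLen]) {}

    // Pointer to the start of the heap-allocated block.
    char* data() { return data_.get(); }

    // Size of the block in bytes, known at compile time.
    static constexpr std::size_t size() { return BlockLen; }

private:
    std::unique_ptr<char[]> data_;
};
```

A caller that wants a smaller footprint would write something like BlockBuffer<4096>, while everyone else gets the default. The trade-off is an extra template parameter in the public interface, which is the kind of complexity the comment above argues against.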
However, a smaller default value is probably a good idea. I'll leave this issue open until I have found the time to do some benchmarking. Unfortunately, good benchmarking is tough, as hard disks, SSDs, and network storage will probably all have different characteristics.
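A minimal sketch of how such a comparison might be set up is shown below. It is not the benchmark used for the library: it only times raw sequential reads with std::ifstream for a given block size, and the function names read_in_blocks and seconds_to_read are made up for this example. Results will also depend heavily on the OS page cache, so repeated runs on the same file mostly measure memory rather than the storage device.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads the whole file in chunks of block_len bytes and returns the
// total number of bytes read (0 if the file cannot be opened).
std::size_t read_in_blocks(const std::string& path, std::size_t block_len) {
    if (block_len == 0) return 0;  // avoid looping forever on empty reads
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(block_len);
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        total += static_cast<std::size_t>(in.gcount());  // last read may be short
    }
    return total;
}

// Wall-clock seconds taken to read the file once with the given block size.
double seconds_to_read(const std::string& path, std::size_t block_len) {
    const auto start = std::chrono::steady_clock::now();
    read_in_blocks(path, block_len);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling seconds_to_read on the same large CSV file with 1<<24 and then 1<<20, several times each and on each kind of storage of interest, gives a first impression of whether the block size matters for a given setup.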
I finally got around to doing a benchmark. I did not see a measurable effect from reducing block_len to 1<<20. The value on master is therefore now 1<<20. :)
Excellent! Thanks :-)