ben-strasser/fast-cpp-csv-parser

Configurable block_len

Closed this issue · 3 comments

I am launching approx 96 concurrent readers to load 550 CSV files in parallel. The memory consumption is quite large due to the block_len size:

    class LineReader{
    private:
        static const int block_len = 1<<24;  // 16 MiB

I reduced it to 1<<20 (1 MiB) and still get good throughput. Just a suggestion: could this be made a configurable parameter of LineReader, if possible? I might be an edge case though :-)
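As a rough back-of-the-envelope check of the numbers above, the sketch below multiplies the reader count by the block size. `total_buffer_bytes` and its `buffers_per_reader` parameter are hypothetical names used only for illustration. They are not part of fast-cpp-csv-parser, and the number of blocks each LineReader actually allocates is an assumption left to the caller.

```cpp
#include <cstdint>

// Hypothetical helper (not part of fast-cpp-csv-parser): total bytes held when
// each of `readers` concurrent readers owns `buffers_per_reader` buffers of
// `block_len` bytes. `buffers_per_reader` is an assumption; adjust it to match
// whatever the header really allocates.
constexpr std::uint64_t total_buffer_bytes(std::uint64_t readers,
                                           std::uint64_t block_len,
                                           std::uint64_t buffers_per_reader = 1) {
    return readers * block_len * buffers_per_reader;
}

constexpr std::uint64_t MiB = 1ull << 20;

// 96 readers with block_len = 1<<24 hold 1536 MiB, counting one block each.
static_assert(total_buffer_bytes(96, 1ull << 24) == 1536 * MiB,
              "96 x 16 MiB should be 1536 MiB");

// The same 96 readers with block_len = 1<<20 hold 96 MiB.
static_assert(total_buffer_bytes(96, 1ull << 20) == 96 * MiB,
              "96 x 1 MiB should be 96 MiB");
```

If a reader holds more than one block internally, the totals scale by that factor, but the 16x ratio between the two settings stays the same.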

Hi,

thanks for the suggestion.

Making it configurable seems like an esoteric feature that, at the end of the day, maybe two users will ever use. I think it is better if those two users just change the value in the header, as you did.

However, a smaller default value is probably a good idea. I'll leave this issue open until I have found the time to do some benchmarking. Unfortunately, good benchmarking is tough, as hard disks, SSDs, and network storage will probably all have different characteristics.
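For anyone who wants to try this on their own storage, here is a minimal sketch of such a measurement. It is not the benchmark used for this issue and it does not use the library's LineReader. `read_in_blocks` is a hypothetical helper that just reads a file sequentially with `std::fread` in chunks of a given size and reports the elapsed time.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of fast-cpp-csv-parser): reads the file at
// `path` sequentially in chunks of `block_len` bytes. Returns the total number
// of bytes read and stores the wall-clock time in `elapsed_seconds`.
// Returns 0 if the file cannot be opened.
std::size_t read_in_blocks(const char* path, std::size_t block_len,
                           double& elapsed_seconds) {
    elapsed_seconds = 0.0;
    std::FILE* file = std::fopen(path, "rb");
    if (!file) {
        return 0;
    }

    std::vector<char> buffer(block_len);
    std::size_t total = 0;
    const auto start = std::chrono::steady_clock::now();

    // fread returns fewer items than requested (eventually 0) at end of file.
    std::size_t n = 0;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), file)) > 0) {
        total += n;
    }

    const auto stop = std::chrono::steady_clock::now();
    std::fclose(file);

    elapsed_seconds = std::chrono::duration<double>(stop - start).count();
    return total;
}
```

Calling it once with `1 << 24` and once with `1 << 20` on the same large CSV file gives two timings to compare. Repeated runs on the same file are usually served from the operating system's page cache, so the first run and later runs can differ a lot; that is one more reason the results depend on the storage being tested.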

I finally got around to running a benchmark. I did not see a measurable effect from reducing block_len to 1<<20. The value on master is therefore now 1<<20. :)

Excellent! Thanks :-)