Converting csv 2 parquet as fast as I can, using rust.
Being forced to waste a lot of time just waiting my slow machine reading bunch of CSV files, I was just curious how much I can improve on this end.
Modern C++ is getting harder to catch-up. My C/C++ code style is more like just C + STL libraries, not leveraging recent C++ standards. It was easier to learn Rust from the scratch, instead of improving my poor C style I'm familiar with more than 10 years.
As current rust parquet implementation does not have buffered column writer like cpp version, I had to implement my own version of buffered column writer, and some copy-and-paste version of row group writer etc.
Currently buffer is just using vec[u8]
where a lot of copy will happen if
we write more and more data. Would be nice if I can find some linked list
buffer supporting append operation.