Using Arrow to further speed up raw data I/O

Question

ghiggi opened this issue a year ago · 0 comments

Prework

Read and agree to the code of conduct.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
Runnable
Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.

Evaluate the benefits of using:

the engine="pyarrow" option of pd.read_csv, to read the raw data using multithreading,
the Arrow dtype backend (dtype_backend="pyarrow") introduced in pandas 2.0, to decrease the memory usage of string columns in a pd.DataFrame.

Please describe the performance issue.

How poorly does DISDRODB perform?