Improve loading/parsing speed in 'arrowable' environment
heronshoes opened this issue · 2 comments
heronshoes commented
It takes a long time to read a large dataset from a source for the first time.
I created a fresh Docker environment for my dataframe example and found that pulling a large dataset such as nycflights13 is very time-consuming.
If you use red-datasets-arrow, the cache is stored as an Arrow file, but the first load still takes a long time because loading and parsing go through Ruby's CSV.
Would it be possible for an environment extended with red-datasets-arrow to use Arrow for loading and parsing?
kou commented
How about adding Datasets::CSVParser, like Datasets::ZipExtractor, and extending Datasets::CSVParser in red-datasets-arrow?
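As a rough illustration of that idea (not the actual red-datasets code), the sketch below shows one way a default parser built on Ruby's CSV could be overridden by an add-on. Every name in it (SketchDatasets, SketchDatasetsArrow, CSVParserExtension, each_row, backend) is hypothetical, and the "Arrow" side is only a placeholder that delegates to the default parser; a real extension would presumably read the file with Arrow instead.

```ruby
require "csv"

# Hypothetical stand-in for the core library: a parser that wraps Ruby's CSV.
module SketchDatasets
  class CSVParser
    def initialize(path, **options)
      @path = path
      # Treat the first line as the header row unless the caller overrides it.
      @options = {headers: true}.merge(options)
    end

    # Identifies which implementation is active.
    def backend
      :ruby_csv
    end

    # Yields each data row as a Hash; returns an Enumerator without a block.
    def each_row
      return to_enum(__method__) unless block_given?

      CSV.foreach(@path, **@options) { |row| yield row.to_h }
    end
  end
end

# Hypothetical stand-in for the add-on: it overrides the parser via prepend.
module SketchDatasetsArrow
  module CSVParserExtension
    def backend
      :arrow_placeholder
    end

    def each_row(&block)
      # Assumption: a real extension would parse with Arrow here.
      # This placeholder just falls back to the default implementation.
      super
    end
  end
end

# Loading the add-on swaps the behavior without changing any caller.
SketchDatasets::CSVParser.prepend(SketchDatasetsArrow::CSVParserExtension)
```

With this shape, dataset classes keep calling the same parser class, and only the environment that loads the add-on gets the faster path.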
heronshoes commented
Thanks @kou.
I will try adding Datasets::CSVParser first!