add arrow/parquet as alternative in Chapter 5
engineerchange opened this issue · 4 comments
I saw a debate online about using feather vs. arrow's parquet format. Parquet seems like a viable, efficient alternative and would be worth benchmarking in Chapter 5: Efficient input/output.
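A benchmark along these lines could start from something like the sketch below. It assumes the arrow package is installed and uses a synthetic data frame (the column names and the 1e6 row count are arbitrary). It times a write and a read for each format with base R's `system.time()` and reports file sizes. Timings will vary by machine and data shape, so treat this as a starting point rather than a definitive comparison:

```r
# Minimal sketch: compare feather vs parquet read/write with the arrow package.
# Assumes arrow is installed; the data frame below is synthetic.
library(arrow)

set.seed(1)
n <- 1e6
df <- data.frame(
  id  = seq_len(n),
  x   = runif(n),
  grp = sample(letters, n, replace = TRUE)
)

feather_file <- tempfile(fileext = ".feather")
parquet_file <- tempfile(fileext = ".parquet")

# Time writing each format to disk
t_write_feather <- system.time(write_feather(df, feather_file))[["elapsed"]]
t_write_parquet <- system.time(write_parquet(df, parquet_file))[["elapsed"]]

# Time reading each format back into R (assignments land in this environment)
t_read_feather <- system.time(df_f <- read_feather(feather_file))[["elapsed"]]
t_read_parquet <- system.time(df_p <- read_parquet(parquet_file))[["elapsed"]]

results <- data.frame(
  format  = c("feather", "parquet"),
  write_s = c(t_write_feather, t_write_parquet),
  read_s  = c(t_read_feather, t_read_parquet),
  size_mb = file.size(c(feather_file, parquet_file)) / 1e6
)
print(results)
```

For a fuller picture, repeated runs (e.g. with the bench or microbenchmark packages) and a CSV baseline would make the numbers more trustworthy than a single `system.time()` call.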
Hi @engineerchange, first I'd like to say: many thanks for keeping this repo lively; your agitating for more computationally efficient implementations is greatly appreciated from the perspective of updating the book (and perhaps from the perspective of engineering positive change in the world beyond computing)!
Have you seen any benchmarks comparing parquet vs vroom? And do you know whether the R implementation, which seemed slower than the Python implementation in Wes's tests, has since sped up?
For me the only question is 'when', not 'if': has the R implementation reached a sufficient level of maturity to be worthy of inclusion in the book? On a related note, I'd like to add duckdb to the book, assuming it's ready.
That's a very kind note from you - I use your book as a cheatsheet like most, so I am happy to hear that my agitations are appreciated! 😅
I don't have many answers here except to suggest it as an option. I struggle to know when a package has reached a level of maturity that would be appropriate for a publication like this. I think this effort would be a good way to document some benchmarks, however.
Yeah, duckdb is quickly moving into ⭐ status in the R world, and including it is probably a good idea. I was poking around with it a bit this weekend, and I think it's likely the best way to introduce SQL to an R user, and probably to someone brand new to coding as well.
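To illustrate why duckdb is a gentle way into SQL from R, here is a minimal sketch. It assumes the DBI and duckdb packages are installed. It opens an in-memory database, copies the built-in `mtcars` data frame into it, and runs a plain SQL aggregation whose result comes back as an ordinary R data frame:

```r
# Minimal sketch: query an R data frame with SQL through duckdb and DBI.
# Assumes the DBI and duckdb packages are installed.
library(DBI)

con <- dbConnect(duckdb::duckdb())  # in-memory duckdb database

# Copy a built-in data frame into duckdb as a table
dbWriteTable(con, "mtcars", mtcars)

# Plain SQL: mean miles per gallon and row count by cylinder count
res <- dbGetQuery(con, "
  SELECT cyl, AVG(mpg) AS mean_mpg, COUNT(*) AS n
  FROM mtcars
  GROUP BY cyl
  ORDER BY cyl
")
print(res)

dbDisconnect(con)
```

The same query written with `aggregate()` or dplyr gives the same numbers, which makes side-by-side examples easy for readers who already know R but not SQL.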
Fantastic. Well... in the interests of keeping our giant 'cheat sheet' up to date, any further comments, and especially suggested changes via PRs, are very welcome ;)
Heads-up @engineerchange: I've created this PR, which aims to compare vroom and arrow options: #293
It's a work in progress; comments on or additions to it are welcome!