
A short tour of the parallel and foreach packages, and of how to think about scaling data analyses.


Beyond Single Core: Parallel Analysis in R

R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it's not clear what to do next.

This repository contains the material for a short overview of scalable data analysis in R. The slides can be viewed at https://ljdursi.github.io/beyond-single-core-R .

It covers:

  • How to think about parallelism and scalability in data analysis;
  • The standard parallel package, which absorbed the functionality of the earlier snow and multicore packages, using airline data as an example;
  • The foreach package, using airline data and simple stock data; and
  • A summary of best practices.
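As a rough illustration of the two styles the parallel package inherited, here is a minimal sketch. It does not use the airline data from the talk. Instead it assumes a toy task, a bootstrap estimate of the mean of `mtcars$mpg` from R's built-in datasets, and a hypothetical worker count of 2. The snow-style `parLapply` starts separate worker R processes and runs on every platform. The multicore-style `mclapply` forks the current process and only runs in parallel on Unix-like systems.

```r
library(parallel)

# Toy per-task function (illustrative only): one bootstrap resample of the
# mean mpg in the built-in mtcars data set, standing in for a real analysis.
boot_mean <- function(i) {
  x <- mtcars$mpg
  mean(sample(x, length(x), replace = TRUE))
}

n_tasks <- 100
n_cores <- 2  # assumption: in practice choose based on detectCores()

# snow-style: a socket cluster of separate R worker processes (all platforms)
cl <- makeCluster(n_cores)
clusterSetRNGStream(cl, 42)  # independent, reproducible RNG streams per worker
snow_res <- parLapply(cl, seq_len(n_tasks), boot_mean)
stopCluster(cl)

# multicore-style: fork the current R process (Unix-like systems only;
# on Windows mc.cores must be 1, so this falls back to serial execution)
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
mc_cores <- if (.Platform$OS.type == "windows") 1L else n_cores
mc_res <- mclapply(seq_len(n_tasks), boot_mean, mc.cores = mc_cores)

summary(unlist(snow_res))
summary(unlist(mc_res))
```

Both calls return a list with one element per task, so switching between the two styles changes only how the work is dispatched and leaves the analysis code untouched. Forking avoids copying data to the workers, while socket clusters need any non-default objects or packages shipped to the workers (for example with `clusterExport`).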

Included in the materials, though not in the talk, are some more advanced methods:

  • The bigmemory package for out-of-core computation on large data matrices, with a simple physical-sciences example;
  • The Rdsm package for shared-memory parallel programming; and
  • A brief introduction to the powerful pbdR packages for extremely large-scale computation.