Learning Statistics

I'm learning statistics by reading the book Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python (2nd ed.) by Peter Bruce, Andrew Bruce & Peter Gedeck.

The idea of this repo is to host a modified version of the R code provided in the book and in the official GitHub repo.

So why maintain my own version of code that's already available in the official repo?

  1. Fixing Broken Code: Some of the repo-only code (that is, code not in the book) had obvious bugs, such as references to missing variables. The code I have here works (at least at the time of writing; you know how it is).
  2. Readability: I take readability seriously. I try to adhere to the Tidyverse Style Guide, but I don't claim to be perfect about it. Always room to grow!
  3. Tidyverse: Although some of the code is already written with {tidyverse}, I've translated the remaining base R code to a tidier format wherever that was possible and made sense. I've taken special care to recreate the plots using {ggplot2} instead of the plot() function from base R.
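
To give a rough idea of what that kind of translation looks like, here is a minimal sketch. It is not taken from the book or from this repo; the built-in mtcars dataset and the axis labels are just assumptions picked for illustration.

```r
library(ggplot2)

# Base R version: scatter plot of car weight against fuel economy
plot(
  mtcars$wt, mtcars$mpg,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon"
)

# {ggplot2} version of the same scatter plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Both calls draw the same points. The {ggplot2} version keeps the data, the aesthetic mappings, and the labels in separate layers, which tends to make later tweaks easier to read.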

Although I like the {tidymodels} framework, I decided not to use it for the Machine Learning (ML) examples from the book, to keep the focus on the statistics. But if you're interested in {tidymodels}, check out my other repo, Learning Machine Learning.

Disclaimer!

This repo is not meant to replace the book in any way. You should definitely read the book. It will help you understand the concepts much better than looking at the code or playing with it.

Also, I recommend buying the book. It's available from the website of O'Reilly, the publisher.