
open source textbook for DSCI 100


Introduction to Data Science

This is the source for the Introduction to Data Science textbook.

Setup and Build

First, make sure the following libraries and executables are available on your system. The command below is for Ubuntu; on another OS, run the install.packages(...) command given further down, and its errors will tell you what to install for your system.

sudo apt-get install pandoc pandoc-citeproc libssl-dev libxml2-dev libfontconfig1-dev libcairo2-dev

Then, before rendering the book, install the required R packages:

install.packages(c("e1071", "rvest", "tidyverse", "caret", "bookdown", "plotly", "gridExtra", "GGally", "svglite"))
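If some of these packages are already installed, the install step can be limited to the ones that are missing. The following is a minimal sketch, assuming the package list above; `missing_packages` is a hypothetical helper and is not part of this repository.

```r
# Packages listed in the install step of this README.
book_packages <- c("e1071", "rvest", "tidyverse", "caret", "bookdown",
                   "plotly", "gridExtra", "GGally", "svglite")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(book_packages)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install only the missing packages:
  # install.packages(to_install)
}
```

The install call is left commented out so that running the sketch only reports what is missing.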

Finally, you can render the book with the following R code:

bookdown::render_book('index.Rmd', 'bookdown::gitbook')

Style Guide

  • For R code block labels, use the format ##-name, where ## is the 2-digit chapter number and name contains only alphanumeric characters and hyphens, e.g. 03-test-name for a label test-name in chapter 3
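The label format can be checked with a regular expression. This is a minimal sketch, assuming the rule as stated above; `is_valid_label` is a hypothetical helper, not something provided by the repository or by bookdown, and it does not verify that the chapter number matches the file the label appears in.

```r
# Hypothetical helper: TRUE for labels of the form ##-name, where ## is a
# two-digit chapter number and name uses only letters, digits, and hyphens.
is_valid_label <- function(labels) {
  grepl("^[0-9]{2}-[A-Za-z0-9-]+$", labels)
}

# One valid label, one with a 1-digit chapter number, one with an underscore.
is_valid_label(c("03-test-name", "3-test-name", "03-test_name"))
```

With these three inputs the helper returns TRUE, FALSE, FALSE.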

Repository Organization / Important Files

  • The files index.Rmd and ##-name.Rmd are the R Markdown chapter sources parsed by bookdown
  • _bookdown.yml sets the output directory (docs/) and default chapter name
  • img/ contains custom images used in the text; it does not hold every image, since some are generated by R code when the book is compiled
  • data/ stores datasets processed during compilation
  • docs/.nojekyll tells GitHub's static site builder not to run Jekyll. This prevents Jekyll from deleting the folder docs/_main_files (whose name starts with an underscore)
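The layout described above can be inspected from R. This is a rough sketch, assuming it is run from the repository root; `chapter_files` and `has_nojekyll` are hypothetical helpers introduced here for illustration only.

```r
# Hypothetical helper: list chapter files named ##-name.Rmd in `dir`.
chapter_files <- function(dir = ".") {
  list.files(dir, pattern = "^[0-9]{2}-.*\\.Rmd$")
}

# Hypothetical helper: TRUE if docs/.nojekyll exists under `dir`.
has_nojekyll <- function(dir = ".") {
  file.exists(file.path(dir, "docs", ".nojekyll"))
}

print(chapter_files())
if (!has_nojekyll()) {
  message("docs/.nojekyll not found; Jekyll may remove docs/_main_files")
}
```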

License Information

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)