This is the R script repository of the "Data Analysis 1a: Foundation of Data management in R" course, part of the MSc in Business Analytics at CEU. For the previous years, see the 2016 and 2017 Winter branches.

Syllabus

Please find the syllabus in the syllabus folder of this repository.

Technical Prerequisites

The required software is already installed on the computers in the School Lab, but if you plan to use your own laptop, please make sure to install the items below before attending the first class:

  1. Install R from https://cran.r-project.org
  2. Install RStudio Desktop (Open Source License) from https://www.rstudio.com/products/rstudio/download
  3. Register an account at https://github.com
  4. Enter the following commands in the R console (bottom left panel of RStudio) and make sure you see a plot in the bottom right panel and no errors in the R console:
    install.packages('ggplot2')
    library(ggplot2)
    ggplot(diamonds, aes(cut)) + geom_bar()
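
If the plot does not show up, a quick check along the following lines may help narrow down whether the problem is with R itself or with the package installation. This is only a sketch: `check_setup` is a hypothetical helper name, not something provided by the course materials, and it uses base R functions only.

```r
# Hypothetical helper (not part of the course materials): reports the
# running R version and whether the given package can be loaded.
check_setup <- function(pkg = "ggplot2") {
  list(
    r_version     = R.version.string,
    pkg_installed = requireNamespace(pkg, quietly = TRUE)
  )
}

setup <- check_setup()
print(setup)

if (!setup$pkg_installed) {
  message("ggplot2 is not available yet, try install.packages('ggplot2') again")
}
```

If `pkg_installed` is `FALSE`, re-running the installation step and reading the messages in the R console is usually the next thing to try.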

Regarding coding style, I highly suggest reading the related chapter from Hadley Wickham's "Advanced R" book.
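
As a rough illustration of the kind of conventions that chapter covers (lowercase names separated by underscores, `<-` for assignment, spaces around operators and after commas, two-space indentation), here is a small made-up example. The function and variable names are assumptions chosen for this sketch only.

```r
# Descriptive snake_case names, `<-` for assignment, spaces around infix
# operators and after commas, two-space indentation inside the function body.
celsius_to_fahrenheit <- function(temp_celsius) {
  temp_celsius * 9 / 5 + 32
}

temps_celsius <- c(-40, 0, 100)
temps_fahrenheit <- celsius_to_fahrenheit(temps_celsius)
print(temps_fahrenheit)
```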

Optional steps that I highly suggest completing before attending the class if you plan to use git:

  1. Bookmark, watch or star this repository so that you can easily find it later

  2. Install git from https://git-scm.com/

  3. Verify that in RStudio you can see the path of the git executable in the "Git/Svn" tab of the Tools/Global Options menu. If not, you might have to restart RStudio (if you installed git after starting RStudio), or git may have been installed without being added to the PATH on Windows. Either way, you can browse for the "git executable" manually (look for the git executable file in one of the bin folders).

  4. Create an RSA key (optionally with a passphrase for increased security, which you then have to enter every time you push to or pull from GitHub). Copy the public key and add it to the SSH keys on your GitHub profile.

  5. Create a new project choosing "version control", then "git" and paste the SSH version of the repo URL copied from GitHub in the pop-up -- now RStudio should be able to download the repo. If it asks you to accept GitHub's fingerprint, say "Yes".

  6. If RStudio/git is complaining that you have to set your identity, click on the "Git" tab in the top-right panel, then click on the Gear icon and then "Shell" -- here you can set your username and e-mail address in the command line, so that RStudio/git integration can work. Use the following commands:

    $ git config --global user.name "Your Name"
    $ git config --global user.email "Your e-mail address"
    

    Close this window, then commit and push your changes, and you are all set.
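
The git-related steps above can also be double-checked from the R console. The sketch below assumes nothing beyond base R: it looks for a git executable on the PATH with `Sys.which` and, if one is found, asks git for the globally configured name and e-mail. `git_config_value` is a hypothetical helper name used only for this example.

```r
# Path of the git executable as seen by R; an empty string means that git
# was not found on the PATH.
git_path <- Sys.which("git")

# Hypothetical helper: returns the value of a global git setting, or
# NA_character_ if git is missing or the setting has not been configured.
git_config_value <- function(key) {
  if (!nzchar(Sys.which("git"))) {
    return(NA_character_)
  }
  value <- suppressWarnings(
    system2("git", c("config", "--global", key), stdout = TRUE, stderr = FALSE)
  )
  if (length(value) == 0) NA_character_ else value[1]
}

print(git_path)
print(git_config_value("user.name"))
print(git_config_value("user.email"))
```

An `NA` for the name or e-mail suggests that the two `git config` commands above still need to be run.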

If in doubt, find more resources in Jenny Bryan's "Happy Git and GitHub for the useR" tutorial, or contact me.

Class Schedule

Full-time and part-time students attend the class on different weekdays, so below I will refer to only week numbers.

Week 1 (200 min): Introduction to R and General Programming

Week 2 (200 min): Introduction to Data Frames

You can find the recommended reading on data types and an overview of the hotels dataset uploaded to Moodle.

Also note that, as mentioned in class, data.table is not the only available data.frame extension. You could also use dplyr, for example: see a quick overview in the related cheatsheet, or check the related DataCamp course, which you can take for free until Jan 2018.
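
To give a feel for how the alternatives compare, the sketch below computes the same group average three ways: in base R, with data.table and with dplyr. The toy data is made up for this illustration and is not the hotels dataset from Moodle; the data.table and dplyr parts run only if the respective package is installed.

```r
# Toy data, made up for illustration (not the course's hotels dataset)
hotels <- data.frame(
  city  = c("Budapest", "Budapest", "Vienna", "Vienna"),
  price = c(80, 120, 150, 110)
)

# Base R: average price per city
avg_base <- aggregate(price ~ city, data = hotels, FUN = mean)
print(avg_base)

# data.table: same aggregation, run only if the package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  hotels_dt <- data.table::as.data.table(hotels)
  print(hotels_dt[, list(price = mean(price)), by = city])
}

# dplyr: same aggregation, run only if the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::summarise(dplyr::group_by(hotels, city), price = mean(price)))
}
```

All three should report an average price of 100 for Budapest and 130 for Vienna; which syntax to prefer is largely a matter of taste and of the size of the data.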

Week 3 (200 min): Data Transformations and Data Visualization

Week 4 (200 min): Text Parsing, Regular Expressions, Joins

Week 5 (200 min): Case Study on Bisnode Dataset

Please find the data and R scripts uploaded to Moodle.

Week 6 (100 min): Introduction to R Markdown and Shiny

Contact

File a GitHub ticket.