/MAST30034_R

R tutorial files

Welcome to the MAST30034 R Repo

  • Author: Yue You
  • Tutorial Up-to-Date as of: 2021
  • Usage: For MAST30034 students only ...

This repository will house all R workshops.

The Python stream is available here.

Dates and Times

On Campus:

  • Monday: 13:15 - 15:15 (R - Yue)
  • Tuesday: 14:15 - 16:15 (Python - Akira)
  • Wednesday: 11:00 - 13:00 (Python - Akira)
  • Thursday: 10:00 - 12:00 (Python - Calvin)

Online:

  • Tuesday: 16:15 - 18:15 (R - Yue)
  • Wednesday: 14:15 - 16:15 (Python - Calvin)
  • Thursday: 13:00 - 15:00 (Python - Akira), 15:15 - 17:15 (Python - Akira)

Tutorials

The first few tutorials cover the content outlined below. The remainder of the semester is used for consultations or additional tutorials:

  1. Introduction and Project 1 Overview:

    • Using the JupyterHub server
    • Using GitHub Desktop vs Git CLI (Command Line Interface)
    • Project 1 Overview
    • R Revision
    • Data Serialization
    • Downloading Files using R
    • Advanced: Spark Installation
  2. Geospatial Visualization and Analysis:

    • HexBins (vs SquareBins), Choropleths
    • Descriptive statistics
    • Advanced: sparklyr data analysis
  3. Regression and Discussion:

    • Linear Regression
    • MSE vs R-Squared
    • Penalized Regression (LASSO and Ridge)
    • Generalized Linear Model example (Poisson for count data)
    • Advanced: sparklyr modeling
  4. Machine Learning and Working as a Team:

    • Discussion: Overfitting, Curse of Dimensionality, Feature Engineering, etc.
    • Dimensionality Reduction
    • Agile Methodology + Standups
  5. Project 2 Overview:

    • Introduction of themes
    • Getting into teams
    • Assessment Overview
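
As a rough illustration of the Tutorial 3 topics (linear regression, MSE vs R-Squared, and a Poisson GLM for count data), here is a minimal base R sketch. It is not taken from the workshop material: the simulated data, the seed, and all variable names (`x`, `y`, `counts`, `fit_lm`, `fit_pois`) are assumptions made purely for illustration.

```r
# Illustrative sketch only: simulated data, not the course datasets.
set.seed(30034)

n <- 200
x <- runif(n, 0, 2)

# Linear regression on a continuous response
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit_lm <- lm(y ~ x)

# MSE is measured in squared units of y, while R-squared is the
# unitless share of variance in y explained by the model.
resid_lm <- y - fitted(fit_lm)
mse <- mean(resid_lm^2)
r_squared <- 1 - sum(resid_lm^2) / sum((y - mean(y))^2)

# Poisson GLM with a log link for a count response
counts <- rpois(n, lambda = exp(0.5 + 0.8 * x))
fit_pois <- glm(counts ~ x, family = poisson(link = "log"))

print(c(mse = mse, r_squared = r_squared))
print(coef(fit_pois))
```

The two metrics answer different questions: MSE changes if `y` is rescaled, whereas R-Squared does not, so they are best reported together. The Poisson coefficients are on the log scale, so `exp(coef(fit_pois))` gives multiplicative effects on the expected count.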

Project 2 Tutorials (Week 6 - 12)

  • Attendance is mandatory. Each group is excused for one absence only.
  • The last 2 weeks of tutorials will be presentations; all groups must attend a designated tutorial.
  • The remaining tutorials will act as checkpoints, consultations, and a chance for your group to conduct standups in a fixed time slot.

R Libraries Covered

Statistical Modeling / Machine Learning:

  • glmnet

Data Engineering / End-to-End Pipelines:

  • dplyr, sparklyr

Visualizations:

  • ggplot2, pheatmap, corrplot, ggmap, tmap

.....