This repository contains CAGEF's education and outreach lessons and workshops in coding and bioinformatics, including CSB1020H/F and CSB1021H/S.

Welcome to CSB1020H/F & CSB1021H/S - Introduction to R for Data Science!

CAGEF Training & Outreach Material by Erica Acton (erica.acton@utoronto.ca)

This repository is part of the Centre for the Analysis of Genome Evolution & Function's (CAGEF) bioinformatics training initiative. These courses and workshops were developed in response to feedback about the needs and interests of the Department of Cell & Systems Biology and the Department of Ecology and Evolutionary Biology at the University of Toronto.


Course Information

Coordinators

Professor D. Guttman and Erica Acton

Offered

Winter 2019 - January 10 - February 20 (6 weeks)
Fall 2018 - September 18 - October 23 (6 weeks)

Weight

One module (0.25 FCE)

Time

Thursdays, 3:00 - 6:00pm

Location

St. George Campus, Earth Sciences Centre, Rm 3087

Description

This course is a beginner’s introduction to R and R-Studio for students who do not have a computer science background. It is intended for students who want to develop the skills to analyze their own data. Students who complete this course will be able to 1) work comfortably with the R-Studio environment, data structures and data types, 2) import data into R and manipulate data frames, 3) transform a ‘messy’ dataset into a ‘tidy’ dataset, 4) make exploratory plots, 5) use string manipulation to clean data, and 6) perform basic statistical tests and run a regression model. The class is taught in a ‘code-along’ format, and students are expected to bring a laptop.

Evaluation

Grades in this module will be determined by a combination of participation in in-class quizzes (6 x 5% = 30%), short assignments (5 x 10% = 50%), and a final project (20%). Short assignments require students to apply the material covered in each lesson, with an emphasis on concise, well-documented code. The final project brings together concepts from all lessons through an exploratory data analysis of a dataset of interest.
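As a quick check of the weighting described above, the arithmetic can be sketched in R. The component weights come from the paragraph above; the scores are hypothetical values chosen only to illustrate the calculation, and the variable names are not part of any course material.

```r
# Component weights from the Evaluation paragraph (percent of the final grade).
weights <- c(quizzes = 6 * 5, assignments = 5 * 10, final_project = 20)

# The three components should account for the whole grade.
stopifnot(sum(weights) == 100)

# Hypothetical average scores (in percent) for one student; not real data.
scores <- c(quizzes = 100, assignments = 80, final_project = 90)

# The final grade is the weighted mean of the component scores.
final_grade <- sum(weights * scores) / sum(weights)
print(final_grade)
```

Changing the values in `scores` shows how much each component moves the final grade.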

Pre-requisites

A laptop with R and R-Studio installed (https://cloud.r-project.org/ and https://www.rstudio.com/products/rstudio/download/) is REQUIRED in class. In-class multiple choice questions are run on Socrative, which requires an internet connection; a class key will be provided in class. Participation counts toward your final grade.

As preparatory material for the course, students should install swirl (install.packages("swirl")). When you have installed swirl, type library(swirl) and follow the prompts (i.e., type what it tells you to type). From list 1 - R Programming, complete lessons 1: Basic Building Blocks, 3: Sequences of Numbers, 4: Vectors, and 7: Matrices and Data Frames.
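Typed into the R console, the preparation steps above might look like the following sketch. The swirl() call is not spelled out in the text above; it is the command the package's start-up message asks you to type. The prompts that follow are interactive, so read them off your own screen rather than relying on this sketch.

```r
# Download and install swirl from CRAN (needs an internet connection; run once).
install.packages("swirl")

# Load the package into the current R session.
library(swirl)

# Start the interactive lessons, then follow the on-screen prompts
# to choose the R Programming course and the lessons listed above.
swirl()
```

Note the straight quotes around the package name: curly quotes pasted from a word processor or web page will cause an error in R.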

Reading materials

A reference throughout the course will be R for Data Science (http://r4ds.had.co.nz/).

Website

All lesson materials and datasets for the course are found at https://github.com/eacton/CAGEF. Each lesson README page (linked to in 'Content' below) has a link to download the lesson folder. Assignments will be submitted to a course Dropbox.

Office Hours

By appointment: e-mail erica.acton@utoronto.ca to arrange a time.

Location: 25 Willcocks St, Room 4035


Content

Lesson 1 - Intro to R and R-Studio: Becoming Friends with the R Environment

Lesson 2 - Basic Life Skills: How to Read, Write, and Manipulate (Your Data)

Lesson 3 - Intro to Tidy Data: Go Long!

Lesson 4 - Of Data Cleaning and Documentation - Conquer Regular Expressions and Challenge yourself with a 'Real' Dataset

Lesson 5 - Plot all the things! From Data Exploration to Publication-Quality Figures

Lesson 6 - Linear Regression, Multiple Linear Regression, ANOVA, ANCOVA: Choosing the Best Model for the Job