Course materials for Stat 133, Fall 2019, at UC Berkeley

Stat 133: Concepts in Computing with Data


Calendar

  • Instructor: Gaston Sanchez
  • Lecture: MWF 9:00-10:00am, 245 Li Ka Shing
  • OH: MWF 10:30am - 11:30am, 309 Evans
  • Tentative topics and dates, subject to change depending on the pace of the course.
  • Notes (📝) cover material discussed in lecture.
  • Reading (📖) covers material that expands on lecture topics, as well as coding examples that you should review and practice outside of class.

0. Course Introduction

  • 📇 Dates: Aug-28
  • 💬 Topics: Welcome to Stat 133. We begin with the usual review of the course policies, logistics, overall expectations, topics in a nutshell, etc.
  • 📝 Notes:
    • Welcome to Stat 133 (talk and chalk)
  • 📖 Reading:
  • 🔬 Lab: No lab
  • 🔈 To Do:
    • Install R
    • Install RStudio Desktop (open source version, free)

1. The Big Picture


2. R Survival Skills


3. Intro to Data Technologies: Data Types and Data Objects

  • 📇 Dates: Sep 06-11
  • 💬 Topics: How do programming languages and computing environments handle data? To answer this question we'll discuss a couple of fundamental topics such as data types and their implementation in R around vectors and arrays. More specifically, we'll focus on concepts like atomicity, vectorization, recycling, and subsetting. Likewise, we will also describe more generic data objects such as lists.
  • 📝 Notes:
  • 📖 Reading:
  • 💡 Cheat sheet:
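
As a rough illustration of the concepts named in the topics above (atomicity, vectorization, recycling, subsetting, and lists), here is a minimal base R sketch. The vectors and names are made up for illustration and are not taken from the course materials.

```r
# Atomicity: a vector holds a single data type, so mixed inputs are coerced.
mixed <- c(1, "a", TRUE)
mixed_type <- typeof(mixed)          # "character"

# Vectorization: arithmetic is applied element by element.
heights <- c(150, 160, 170)
heights_m <- heights / 100           # 1.5 1.6 1.7

# Recycling: the shorter vector is repeated to match the longer one.
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# Subsetting by position, by logical condition, and by name.
second <- heights[2]                 # 160
tall <- heights[heights > 155]       # 160 170
names(heights) <- c("ana", "bo", "cy")
cy_height <- heights["cy"]           # named element with value 170

# Lists are generic containers: elements may have different types.
person <- list(name = "ana", scores = c(90, 85))
first_score <- person$scores[1]      # 90
```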

4. Intro to Data Technologies (cont'd): Data Frames

  • 📇 Dates: Sep 11-12, ⚠️ mostly in lab
  • 💬 Topics: Besides atomic data objects, we also need to talk about R data frames, which provide a convenient structure for handling tabular data. You will learn how to manipulate data frames with two approaches: 1) using classic bracket notation, and 2) using the more modern syntax of the data-plying framework provided by the package "dplyr".
  • 📝 Notes:
  • 📖 Reading:
  • 💡 Cheat sheet:
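
The two approaches mentioned in the topics above might look like the following sketch. The `players` data frame is a hypothetical toy example, and the dplyr part only runs if that package happens to be installed.

```r
# Hypothetical toy data frame (not course data).
players <- data.frame(
  name   = c("ana", "bo", "cy", "di"),
  height = c(150, 160, 170, 180),
  team   = c("red", "blue", "red", "blue"),
  stringsAsFactors = FALSE
)

# 1) Bracket notation: rows with height > 155, columns name and height.
tall_base <- players[players$height > 155, c("name", "height")]
print(tall_base)

# 2) dplyr verbs: the same selection, skipped if dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  tall_dplyr <- dplyr::select(
    dplyr::filter(players, height > 155),
    name, height
  )
  print(tall_dplyr)
}
```

Both versions should select the same rows and columns; the bracket version keeps the original row labels, which is one of the small differences worth noticing when comparing the two styles.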

5. Housekeeping: Filesystem and Bash Commands


6. Data Tables: Storage, Organization, Importing, and Unix filters

  • 📇 Dates: Sep 18-25
  • 💬 Topics: We continue with a fundamental topic of data technologies: data tables, the most common form in which data is stored, handled, and manipulated. Because datasets in tabular format are so ubiquitous, we need to talk about how tables are typically stored, good principles of data organization, and the notion of "tidy data". You will also learn how to perform basic manipulation of data-table files with some Unix filters. We'll also examine the relationship between tables and R data frames, as well as some considerations when importing (and exporting) tables in R.
  • 📝 Notes:
  • 📖 Reading:
  • 💡 Cheat sheet:
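
To make the importing and exporting considerations above a bit more concrete, here is a small base R sketch that writes a table to a temporary CSV file and reads it back. The `scores` table and the file location are assumptions made purely for illustration.

```r
# Hypothetical toy table written to a temporary CSV file.
scores <- data.frame(
  student = c("ana", "bo", "cy"),
  score   = c(90, 85, 77),
  stringsAsFactors = FALSE
)
csv_file <- tempfile(fileext = ".csv")

# Export: row.names = FALSE avoids writing an extra column of row labels.
write.csv(scores, file = csv_file, row.names = FALSE)

# Import: declaring column types up front avoids surprises from type guessing.
imported <- read.csv(
  csv_file,
  colClasses = c("character", "numeric"),
  stringsAsFactors = FALSE
)

str(imported)
unlink(csv_file)  # remove the temporary file
```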

7. Housekeeping: Version Control with Git and GitHub

  • 📇 Dates: Oct 02-03, ⚠️ mostly in lab
  • 💬 Topics: We continue talking about file-structure topics, and we introduce basic notions of version control systems (VCS) using Git and its companion hosting platform, GitHub.
  • 📝 Notes:
  • 📖 Reading:
    • Read sections 4 to 9 in Part I Installation (Happy Git and GitHub for the useR by Jenny Bryan et al.)
  • 💡 Cheat sheet:

8. Data Visualization


9. Transition to Programming Basics for Data Analysis (part 1)

  • 📇 Dates: Oct 14-18
  • 💬 Topics: You don't need to be an expert programmer to be a data scientist, but learning more about programming allows you to automate common tasks, and solve new problems with greater ease. We'll discuss how to write basic functions, the notion of R expressions, and an introduction to conditionals.
  • 📝 Notes:
  • 📖 Reading:
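
A basic function, a conditional, and a compound R expression could be sketched as follows. The function `convert_temp()` is a hypothetical helper invented for this example, not something defined in the course.

```r
# Hypothetical helper: convert a temperature between Fahrenheit and Celsius.
convert_temp <- function(x, to = "celsius") {
  if (to == "celsius") {
    (x - 32) * 5 / 9
  } else if (to == "fahrenheit") {
    x * 9 / 5 + 32
  } else {
    stop("'to' must be \"celsius\" or \"fahrenheit\"")
  }
}

boiling_c <- convert_temp(212)                      # 100
boiling_f <- convert_temp(100, to = "fahrenheit")   # 212

# An R expression wrapped in braces evaluates to its last value.
total <- {
  a <- 2
  b <- 3
  a + b
}
total   # 5
```

The function body is itself a braced expression, which is why the value of the matching `if` branch is returned without an explicit `return()`.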

10. Programming Basics for Data Analysis (part 2)

  • 📇 Dates: Oct 21-25
  • 💬 Topics: In addition to writing functions to reduce duplication in your code, you also need to learn about iteration, which helps when you need to perform the same operation several times. Namely, we review control flow structures such as for loops, while loops, and repeat loops, as well as the apply family of functions.
  • 📝 Notes:
  • 📖 Reading:
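
The iteration constructs listed in the topics above can be compared side by side in a short base R sketch. The input vector is arbitrary and chosen only for illustration.

```r
values <- c(2, 4, 6)

# for loop: iterate over positions and fill a preallocated vector.
squares <- numeric(length(values))
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}

# while loop: keep doubling until the value is at least 100.
x <- 1
while (x < 100) {
  x <- x * 2
}

# repeat loop: runs until an explicit break.
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}

# apply family: sapply() applies a function to each element.
squares_apply <- sapply(values, function(v) v^2)

squares         # 4 16 36
x               # 128
count           # 3
squares_apply   # 4 16 36
```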

11. Testing Functions


12. Shiny Apps


13. More Shiny Apps and Introduction to Regular Expressions


14. More Regular Expressions

  • 📇 Dates: Nov 11-15
  • 💬 Topics: At its heart, computing involves working with numbers. However, a considerable amount of information and data comes in the form of text. To unleash the power of string manipulation, we need to take things to the next level and learn about regular expressions. Regular expressions are a tool that allows us to describe text "patterns", that is, sets of strings that share a certain structure. We'll describe the basic concepts of regex and the common operations to match text patterns.
  • 📝 Notes:
  • 📖 Reading:
  • 💡 Cheat sheet:
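
As one possible illustration of matching text patterns, the sketch below uses base R regex functions on a few made-up strings. The strings and the pattern are assumptions chosen for this example only.

```r
# Hypothetical strings, not course data.
dates <- c("Nov 11-15", "Dec 02-06", "no date here")

# Pattern: three letters, a space, two digits, a hyphen, two digits.
pattern <- "^[A-Za-z]{3} [0-9]{2}-[0-9]{2}$"

# grepl() tests each string for a match.
has_date <- grepl(pattern, dates)    # TRUE TRUE FALSE

# sub() with a capture group keeps only the month abbreviation.
month_abbr <- sub("^([A-Za-z]{3}) .*$", "\\1", dates[has_date])   # "Nov" "Dec"

# regmatches() with gregexpr() extracts every run of digits.
digits <- regmatches(dates[1], gregexpr("[0-9]+", dates[1]))[[1]]  # "11" "15"
```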

15. R Packaging (part 1)

  • 📇 Dates: Nov 18-22
  • 💬 Topics: Packages are the fundamental units of reproducible R code. They include reusable functions, the documentation that describes how to use them, and sample data. In this part we'll start describing how to turn your code into an R package.
  • 📝 Notes:
  • 📖 Reading:
  • 💡 Cheat sheet:

16. R Packaging (part 2)

  • 📇 Dates: Dec 02-06
  • 💬 Topics: Creating an R package can seem overwhelming at first, so we'll keep working on the creation of a relatively basic package. This will give you the opportunity to apply most of the concepts seen in the course.
  • 📝 Notes:
  • 📖 Reading:
  • 💡 Cheat sheet:

17. RRR Week and Final Exam

  • 📇 Dates: Dec 09-13
  • 💬 Topics: Prepare for final examination
  • 📝 Notes:
    • No lecture. Instructor will hold OH (in 309 Evans)
  • 🎓 FINAL: Dec-19th, 7-10 pm, room TBD
    • More details about the final will be posted on bCourses