
ucb_mids_w205_repo


Moved to bCourses - all of the content in this repo has been moved to bCourses and, as of April 15, 2022, is no longer being updated. Please use bCourses for the most recent, up-to-date version. This repo will be deleted once we are sure we have moved everything to bCourses correctly.

UC Berkeley iSchool MIDS Program

w205 Fundamentals of Data Engineering

To our new students: welcome to Berkeley, welcome to the iSchool, and welcome to the MIDS program!

To all students: welcome to w205!

Our goal is to help you be extremely successful in this course, and this repo has a lot of resources to help you achieve that goal.

Slack

Our primary means of communication is Slack.

Please join the channel datasci-205 if you have not done so already, and use it to post any technical questions. All instructors and TAs monitor this channel, so your chances of receiving an answer are greater if you post here. Other students can often help answer questions as well. If one student has an issue or question, chances are other students have the same one, so other students can benefit from your questions, and you can benefit from theirs.

Most instructors prefer a Slack direct message to an email. Please check with your instructor about their preference.

Your instructor may also have a Slack channel just for their sections.

Schedule and Due Dates

All dates and times are in US Pacific time.

Current time for US time zones, including Pacific: https://time.gov

Spring 2022

  • First Day of Classes
    • Monday 1/3/2022
  • Week 1 - Introduction to Data Engineering
    • Monday 1/3/2022 to Sunday 1/9/2022
    • Due: week 1 asynch assessment; an additional week is allowed since it's the first week
  • Week 2 - SQL Refresher
    • Monday 1/10/2022 to Sunday 1/16/2022
    • Due: week 1 asynch assessment before synch class starts
    • Due: week 2 asynch assessment before synch class starts
    • Reminder: project 1 is due in 3 weeks
  • Last Day to Add
    • Saturday, 1/15/2022
  • Week 3 - Linux CLI Refresher and GitHub git CLI Refresher
    • Monday 1/17/2022 is a national holiday for Dr. Martin Luther King, Jr.'s birthday. You do not have to make up this class; however, your instructor may schedule a makeup class that you may want to attend or watch on video.
    • Monday 1/17/2022 to Sunday 1/23/2022
    • Due: week 3 asynch assessment before synch class starts. For those in Monday sections, it is not due until Tuesday at 11:59 pm Pacific.
    • Reminder: project 1 is due in 2 weeks
  • Week 4 - Containers and Container Images
    • Monday 1/24/2022 to Sunday 1/30/2022
    • Due: week 4 asynch assessment before synch class starts
    • Reminder: project 1 is due in 1 week
  • Week 5 - Pipelines and Clusters of Containers
    • Monday 1/31/2022 to Sunday 2/6/2022
    • Due: week 5 asynch assessment before synch class starts
    • Due: project 1 at 11:59 pm Pacific on the day of your scheduled section synch class meeting
  • Week 6 - Data Wrangling, Part I: Common File Format I/O
    • Monday 2/7/2022 to Sunday 2/13/2022
    • Due: week 6 asynch assessment before synch class starts
    • Reminder: project 2 is due in 3 weeks
  • Week 7 - Data Wrangling, Part II: ETL (Extract, Transform, Load), ELT, and Data Cleansing
    • Monday 2/14/2022 to Sunday 2/20/2022
    • Due: week 7 asynch assessment before synch class starts
    • Reminder: project 2 is due in 2 weeks
  • Week 8 - NoSQL Graph Databases, Part I
    • Monday 2/21/2022 is a national holiday for Presidents' Day. You do not have to make up this class; however, your instructor may schedule a makeup class that you may want to attend or watch on video.
    • Monday 2/21/2022 to Sunday 2/27/2022
    • Due: week 8 asynch assessment before synch class starts. For those in Monday sections, it is not due until Tuesday at 11:59 pm Pacific.
    • Reminder: project 2 is due in 1 week
  • Week 9 - NoSQL Graph Databases, Part II
    • Monday 2/28/2022 to Sunday 3/6/2022
    • Due: week 9 asynch assessment before synch class starts
    • Due: project 2 at 11:59 pm Pacific on the day of your scheduled section synch class meeting
  • Week 10 - NoSQL Key-Value Databases
    • Monday 3/7/2022 to Sunday 3/13/2022
    • Due: week 10 asynch assessment before synch class starts
    • Reminder: project 3 GitHub repo deadline items should be completed in 3 weeks counting spring break (2 class weeks)
    • Reminder: project 3 coding part is due in 4 weeks counting spring break (3 class weeks)
    • Reminder: project 3 presentation part is due in 5 weeks counting spring break (4 class weeks)
  • Week 11 - Web APIs, Part I
    • Monday 3/14/2022 to Sunday 3/20/2022
    • Due: week 11 asynch assessment before synch class starts
    • Reminder: project 3 GitHub repo deadline items should be completed in 2 weeks counting spring break (1 class week)
    • Reminder: project 3 coding part is due in 3 weeks counting spring break (2 class weeks)
    • Reminder: project 3 presentation part is due in 4 weeks counting spring break (3 class weeks)
  • Spring Break
    • Monday 3/21/2022 to Sunday 3/27/2022
    • No classes
    • No office hours
    • Note that instructors have the option of using Monday 3/21/2022 as a makeup date for one of the holidays. Most instructors make up the holiday class in the same week to keep all sections in sync and do not use Monday 3/21/2022; however, please check with your instructor.
  • Week 12 - Web APIs, Part II
    • Monday 3/28/2022 to Sunday 4/3/2022
    • Due: week 12 asynch assessment before synch class starts
    • Reminder: project 3 GitHub repo deadline items should be completed by 11:59 pm Pacific on the day of your scheduled section synch class meeting
    • Reminder: project 3 coding part is due in 1 week
    • Reminder: project 3 presentation part is due in 2 weeks
  • Week 13 - Enterprise Message Queues, Data Lakes, and Serverless SQL
    • Monday 4/4/2022 to Sunday 4/10/2022
    • Due: week 13 asynch assessment before synch class starts
    • Due: project 3 coding part at 11:59 pm Pacific on the day of your scheduled section synch class meeting
    • Reminder: project 3 presentation part is due in 1 week
  • Week 14 - Business Intelligence and Data Warehousing
    • Monday 4/11/2022 to Saturday 4/16/2022
    • Due: week 14 asynch assessment before synch class starts
    • Due: project 3 in-class presentations
  • Last Day of Classes
    • Saturday, April 16, 2022
  • Semester Letter Grades posted to CalCentral
    • Thursday, April 21, 2022 - your instructor may post them sooner, but no later

Office Hours

Office hours will be posted to the learning management system.

Instructors have the option to participate in pooled office hours where students from other sections can attend their office hours. If your section has been included in another instructor's office hours, you are allowed to attend.

TA office hours will be pooled. All sections can attend TA office hours.

Office Hours Spring 2022 Schedule

Note: Subject to change. Please check the Slack channel and/or the ISVC meeting schedule for any changes. Office hours will not be held on holidays or during breaks.

Monday

  • Crook (pooled) - 5:35 pm - 7:30 pm
  • Chakraverty (pooled) - 5:30 pm - 6:30 pm
  • Gupta - 5:30 pm - 6:30 pm

Tuesday

  • Crook (pooled) - 5:35 pm - 7:30 pm

Wednesday

  • Mehra (pooled) - 12:00 pm - 1:00 pm
  • Reid - 6:00 pm - 6:50 pm

Thursday

  • Schioberg (pooled) - 8:05 pm - 10:00 pm

Friday

  • Ad hoc - check ISVC

Saturday

  • Chakraverty (pooled) - 10:00 am - 11:00 am

Sunday

  • De Sola (pooled) - 10:00 am - 11:00 am

Grading

Item                               Percentage
Attendance and Participation       5%
Asynch Assessments                 5%
Project 1                          29%
Project 1 Acknowledge Feedback     1%
Project 2                          29%
Project 2 Acknowledge Feedback     1%
Project 3 Coding                   15%
Project 3 Presentation             15%
Total                              100%
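The semester grade is a weighted sum of the components in this table. As a minimal sketch, assuming each component is scored from 0 to 100 and combined exactly by these weights, the calculation could look like the Python below; the component key names are hypothetical and not taken from any course tool.

```python
# Component weights from the Grading table (fractions of the semester grade).
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "attendance_participation": 0.05,
    "asynch_assessments": 0.05,
    "project_1": 0.29,
    "project_1_feedback": 0.01,
    "project_2": 0.29,
    "project_2_feedback": 0.01,
    "project_3_coding": 0.15,
    "project_3_presentation": 0.15,
}


def semester_score(scores: dict[str, float]) -> float:
    """Weighted semester score on a 0-100 scale, rounded to 2 decimal places.

    `scores` maps each component name in WEIGHTS to a 0-100 score.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Each component contributes its score times its weight.
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(total, 2)
```

For example, scoring 80 on Project 1 and 100 on everything else costs 0.29 x 20 = 5.8 points, for a semester score of 94.2.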

Grading Scale

Semester scores are rounded to 2 decimal places before applying this scale:

From     To       Grade
93.00    100.00   A
90.00    92.99    A-
87.00    89.99    B+
83.00    86.99    B
80.00    82.99    B-
77.00    79.99    C+
73.00    76.99    C
70.00    72.99    C-
0.00     69.99    F
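Applying the scale amounts to rounding the semester score to 2 decimal places and taking the first row whose lower bound it meets. A minimal Python sketch, assuming scores on a 0 to 100 scale; the function and table names are hypothetical, and A+ is left out because it is awarded at the instructor's discretion rather than by score.

```python
# (lower bound, letter) pairs from the Grading Scale table, highest first.
GRADE_SCALE = [
    (93.00, "A"),
    (90.00, "A-"),
    (87.00, "B+"),
    (83.00, "B"),
    (80.00, "B-"),
    (77.00, "C+"),
    (73.00, "C"),
    (70.00, "C-"),
    (0.00, "F"),
]


def letter_grade(score: float) -> str:
    """Map a 0-100 semester score to a letter grade (A+ is discretionary, not modeled)."""
    rounded = round(score, 2)  # the scale applies to scores rounded to 2 decimals
    for lower_bound, letter in GRADE_SCALE:
        if rounded >= lower_bound:
            return letter
    raise ValueError(f"score out of range: {score}")
```

Rounding first is what closes the gaps between rows: a score such as 86.994 rounds to 86.99 and falls in the B row.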

Instructors may award a grade of A+ in rare and exceptional cases. There is no set numeric range for A+; even a score of 100 does not guarantee an A+. A and A+ count the same toward GPA. Typically, an A+ is limited to 1 or 2 students per section. Instructors have full discretion to award an A+. Examples of criteria instructors may consider include, but are not limited to:

  • Students who always came to class prepared by working through the asynch, including all labs, prior to class.
  • Students who attended classes, were on time, stayed until the end, had their camera on, etc.
  • Students who were exceptional in their class participation. They were always very active in breakout sessions, often taking a leadership role. During whole-class discussions after breakouts, they contributed frequently, often with very insightful comments.
  • Students who started early on projects, as evidenced by early and frequent commits to their GitHub repo. They didn't wait until the last week to start a multi-week project.
  • Students with projects that stood out above and beyond other projects, typically with all objective points earned and very high subjective points awarded.
  • Students who took a very active role in the group project, typically a leadership position, as evidenced by an ability to work through and resolve any group conflicts without instructor involvement.

Attendance and Participation

Attendance and participation are 5% of your semester grade.

Attendance and participation are self-reported under the UC Berkeley Honor Code using a Google Form.

Spring 2022 Form (you must be logged into your Berkeley account): Spring 2022 Attendance and Participation

Attendance Rubrics:

  • Graded 1 to 100
  • First 2 absences are automatically counted as excused
  • After 2 absences, instructor will determine if absence is excused or not
  • Excused absences that are made up, no points deducted
  • Unexcused absences that are made up, -5 point penalty
  • Excused or unexcused absences that are not made up, -10 point penalty
  • Official Berkeley holidays are excused absences and do not have to be made up
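As a minimal sketch of how the attendance deductions add up, assume each absence is recorded with whether it was made up and, from the third absence on, the instructor's ruling on whether it was excused. The `Absence` type and `attendance_score` function are hypothetical illustrations, not part of any course tool, and the sketch does not model any lower bound on the score.

```python
from dataclasses import dataclass


@dataclass
class Absence:
    """One missed class meeting (hypothetical record type)."""
    made_up: bool
    # Instructor's ruling; ignored for the first 2 absences, which are always excused.
    excused: bool = False


def attendance_score(absences: list[Absence]) -> int:
    """Attendance score starting from 100, following the attendance rubric.

    Official Berkeley holidays are not absences, so they should not be passed in.
    """
    score = 100
    for index, absence in enumerate(absences):
        excused = absence.excused or index < 2  # first 2 absences are auto-excused
        if not absence.made_up:
            score -= 10  # excused or unexcused, not made up
        elif not excused:
            score -= 5  # unexcused, but made up
        # excused and made up: no deduction
    return score
```

Absences must be passed in chronological order, since only the first 2 are automatically excused. For example, two made-up absences followed by an unexcused absence that was made up gives 100 - 5 = 95.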

Classes can be made up by:

  • Watching a video recording of a class meeting and working through the breakouts on your own
  • Attending another section (with instructor approval)

Participation Rubrics:

  • Maximum penalty per class meeting: -3
  • Each item missing
    • With a reasonable explanation, no penalty
    • Without a reasonable explanation, penalty -1
  • Come to class on time
  • Stay until the instructor dismisses class (or if it runs over, until the official end of class time)
  • Camera on at least 90% of class time (81 minutes out of 90 minutes)
  • Microphone on mute unless you are speaking (except for breakout rooms where it is fine to keep the microphone on)
  • Actively participate in breakout rooms
  • Speak at least once during whole-class discussions
  • Do not take more than your fair share of class time - give others a chance to speak, to express their ideas and opinions, and to ask questions
  • Be respectful of your fellow students and instructor
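The per-class cap can be expressed as a small function. This is a minimal Python sketch with a hypothetical name; it assumes only items missed without a reasonable explanation are counted, and it computes a single class meeting's deduction, since the rubric does not say how per-class deductions roll up into the attendance and participation grade.

```python
def participation_penalty(missed_without_explanation: int) -> int:
    """Deduction for one class meeting: -1 per item missed without a
    reasonable explanation, capped at -3 per meeting.

    Items missed with a reasonable explanation carry no penalty, so only
    the unexplained count is passed in.
    """
    if missed_without_explanation < 0:
        raise ValueError("count cannot be negative")
    return -min(missed_without_explanation, 3)
```

For example, arriving late, leaving early, and having the camera off, plus not speaking in the whole-class discussion, with no explanation for any of them, still costs only 3 points for that meeting.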

Asynch Assessments

Asynch assessments are 5% of your semester grade.

In the asynch, an assessment is given after each logical topic or concept. The questions are multiple choice, and you get as many tries as you need to answer each question correctly. If you answer a question incorrectly, you get feedback on why it's wrong and can answer it again. So, as long as you complete the assessment, you should get 100%.

Each week's asynch assessments are due before that week's class start time. The exception is week 1: since it's the first week, the due date is relaxed to allow an extra week.

Projects

  • Project 1 will be an individual project in which you will analyze a dataset stored in a relational database using Python and SQL, and provide data visualizations and executive summaries, backed up by data, that answer questions from executives.
    • Individual
    • 29% of your semester grade is the project itself
    • 1% of your semester grade is reading and acknowledging the feedback you received
  • Project 2 will be an individual project focused on data wrangling. You will design and implement a data pipeline to extract, load, and cleanse a nested JSON file (among other steps), and prepare executive summaries for executives on your data pipeline.
    • Individual
    • 29% of your semester grade is the project itself
    • 1% of your semester grade is reading and acknowledging the feedback you received
  • Project 3 is a group project and has two parts:
    • Group
    • 15% of your semester grade is the coding part - you will design and code a NoSQL graph database.
    • 15% of your semester grade is the presentation part - you will design a high-level, comprehensive data engineering solution using all the different technologies you have learned about this semester, and present it in the last class meeting. No coding is required for this part; it's high-level design only.

Readings

Please see the directory readings for more information about the readings for this course.

FAQ and Troubleshooting Guides

Please see the directory frequently_asked_questions and the directory troubleshooting_guides.