alan-turing-institute/rds-course

Post-delivery notes - Module 2 (2021 run)


Taught Material

| Section | Title | Start Time | Notes |
|---|---|---|---|
| | Overview | 13:05 | |
| 2.1.1 | Where to find data | 13:10 | maybe more like 13:12? |
| 2.1.2 | Legality and Ethics | 13:25 | |
| 2.1.3 | Pandas intro | 13:47 | this felt like a very abrupt change of topic; maybe there is a more natural way to do it? |
| | SHORT BREAK | 14:02 | |
| 2.1.4 | Data sources & formats (in part) | 14:10 | up to and including databases (& didn't stop for exercise) |
| 2.1.5 | Controlling access | 14:30 | |
| 2.2.1 | Data consistency | 14:35 | did up to Null values before break |
| | LONG BREAK | 14:45 | |
| 2.2.1 | Data consistency | 15:15 | from Null values onwards |
| 2.2.2 | Modifying columns & indices | 15:40 | |
| 2.2.3 | Feature engineering (in part) | 15:46 | only the "creating new features" section (compute BMI with/without apply, compare exec time) |
| 2.2.4.1 | Time & Date | skipped | |
| 2.2.4.2 | Text Data | ? | |
| | SHORT BREAK | 16:05 | |
| 2.2.4.3 | Categorical Data | 16:10 | |
| 2.2.4.4 | Image Data | skipped | |
| 2.2.5 | Privacy & Anonymisation | 16:30 | |
| 2.2.6 | Linking Datasets | 16:35 | |
| 2.2.7 | Missing Data | 16:41 | |
| | Wrap-up (final Qs, pre-reqs for hands-on) | 16:51 | |
| | End | 16:55 | |
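
For reference, the BMI exercise mentioned under 2.2.3 (compute with and without `apply`, then compare execution time) can be sketched roughly as below. This is not the course notebook's code: the synthetic data, the column names `weight_kg` and `height_m`, and the row count are assumptions made for illustration only.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data: column names, units and size are assumptions for illustration
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "weight_kg": rng.uniform(45, 120, n),
    "height_m": rng.uniform(1.4, 2.0, n),
})

# Row-wise with apply: a Python function is called once per row
start = time.perf_counter()
bmi_apply = df.apply(lambda row: row["weight_kg"] / row["height_m"] ** 2, axis=1)
t_apply = time.perf_counter() - start

# Vectorised: arithmetic on whole columns at once
start = time.perf_counter()
bmi_vec = df["weight_kg"] / df["height_m"] ** 2
t_vec = time.perf_counter() - start

print(f"apply: {t_apply:.4f}s, vectorised: {t_vec:.4f}s")
```

Both approaches give the same values; the vectorised version is typically much faster because it avoids the per-row Python call.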

Hands-On

  • We had everyone in breakout rooms for the whole session essentially, and let people work at their own pace.
  • There were a couple of advanced people who worked through the material quite quickly; with others we had quite a lot of discussion about general/related topics. Most, I think, worked through the notebook and got up to the last couple of sections by the end, but probably didn't finish everything.
  • A couple of people felt a bit lost/left behind by the halfway mark. We let people move between rooms and gave some individual support to help.
  • Number of helpers felt about right (2 instructors & 3 helpers for 25-30 people)

@lannelin @triangle-man @ots22 @pwochner @LydiaFrance - I started to collate some notes/thoughts about the teaching of Module 2 here - additions welcome 🙂

ots22 commented

Thought it went well today, thanks for organizing.

Agree with all of the above.

Figuring out some sort of traffic-light/pulse-check/Carpentries-style sticky-note system would be very useful, especially one visible to the instructors across all breakout rooms (could be hard, but I'll include it for the wish list!)

Zoom generally worked well, but only the host could see the calls for help, not the co-hosts.

@ots22 do you have any thoughts on how this would work? I think for M3 & M4 (M4 especially) it will be important to be able to keep track of people having difficulties. I suppose we could use the Slack workspace?

It's really hard to teach students with wildly different levels of technical knowledge (which is going to be inevitable with a course like this).

  • One idea (for future development, not for next week) is to create several problems for each section, flagged as "good for beginners", "intermediate" and "hard". (I think the regex stuff, for example, would go in "hard").

  • In Gather, we could create different rooms ("easy", "intermediate" ...) and let people self-select. Maybe this is doable in Zoom?

  • An idea for the groups: ask the least-confident member to share their screen and drive, with other people making suggestions.

Based on requests, I think people would really like to see (and/or discuss) solutions. So maybe bring people back to plenary from time to time and ask for volunteers / walk through the model answers?

ots22 commented

@callummole I'm not sure. The benefit of the sticky notes seems to be that they give a very low-friction way for somebody to say they are stuck or aren't ready to move on without calling for help. This avoids somebody getting left behind for too long before an instructor notices, and lets the instructors see the overall status of the room.

Perhaps a quick status check (with something like Slido) at each break would be close.

  • A couple of people said it would be helpful to have a Pandas cheat sheet of helpful functions etc.
  • We should make it clear where the intent of the exercises was to look at the data documentation (rather than trying to answer everything using the loaded data frame)
  • Similarly, we should make it clear that extracting some of the answers manually is fine.
  • Say people can skip exercises/focus on the ones that appeal to them.
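
As a possible starting point for the cheat sheet suggested above, a handful of commonly used pandas calls might look like this. The toy data frame, its column names, and the choice of functions are assumptions for illustration, not taken from the course material.

```python
import pandas as pd

# Toy data frame; names and values are made up for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat", "Dan"],
    "group": ["a", "b", "a", "b"],
    "score": [3.5, None, 4.0, 2.5],
})

# Inspecting
df.head()                                  # first few rows
df.info()                                  # column dtypes and non-null counts
df.describe()                              # summary statistics for numeric columns

# Missing values
n_missing = df["score"].isna().sum()       # count missing values in a column
filled = df.fillna({"score": df["score"].mean()})  # fill with the column mean

# Selecting, grouping, reshaping
high = df[df["score"] > 3]                 # boolean filtering of rows
by_group = df.groupby("group")["score"].mean()     # per-group mean
renamed = df.rename(columns={"score": "mark"})     # rename a column
sorted_df = df.sort_values("score", ascending=False)  # sort rows by a column
```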

On James G's point around seeing solutions - I agree but we'd need to think about the best way to do it (with people working through the material at different paces etc.). One option could be to do this per-breakout room (so an instructor could go in and discuss the solutions for wherever that breakout room got to). But perhaps we should try to incorporate a discussion with the whole group at the halfway mark and at the end, for example.

Another thought - would this work better if each day were split 50/50 between teaching and exercises? Then on the first day we can make sure people have got to grips with the Pandas basics.