Edinburgh-Chemistry-Teaching/Data-driven-chemistry

Discussing Bootstrapping / error in the Median, skew

ghutchis opened this issue · 1 comments

I'm not sure whether this is better suited for unit 5 or 6. Since non-Gaussian distributions come up fairly commonly in chemistry (e.g., Boltzmann) I think it's important to emphasize:

  • you may want to check the skew using scipy.stats.skew
  • the median may be a better "central metric" than the mean (e.g., income distribution if Jeff Bezos comes into the room)
  • to estimate the std. dev. of the median, you can't use np.std(), which gives the spread of the data rather than the uncertainty in the median
  • bootstrapping is a nice, simple way to get confidence intervals on median or other properties, e.g.

https://carpentries-incubator.github.io/machine-learning-novice-python/07-bootstrapping/index.html
https://github.com/ghutchis/chem1000/blob/main/lectures/09b-prob-stats.ipynb
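
A minimal sketch of the points above, not taken from either linked resource. The exponential sample, the fixed seed, the number of resamples, and the helper name `bootstrap_median` are all illustrative assumptions. It checks the skew, compares mean and median, and uses a simple percentile bootstrap to estimate the standard error and a 95% confidence interval of the median.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed (assumption) so the sketch is reproducible

# Synthetic right-skewed sample; stands in for real measurements (assumption)
data = rng.exponential(scale=1.0, size=500)

# A clearly positive skew suggests the mean is pulled towards the long tail
print("skew:  ", stats.skew(data))
print("mean:  ", np.mean(data))
print("median:", np.median(data))


def bootstrap_median(sample, n_resamples=2000, rng=None):
    """Return the medians of bootstrap resamples (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample)
    # Draw indices with replacement; each row of idx is one resample
    idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))
    return np.median(sample[idx], axis=1)


medians = bootstrap_median(data, rng=rng)

# Spread of the resampled medians estimates the uncertainty in the median
std_err = medians.std(ddof=1)
ci_low, ci_high = np.percentile(medians, [2.5, 97.5])

print("bootstrap std. error of median:", std_err)
print("95% percentile interval:", (ci_low, ci_high))
```

Note that `np.std(data)` would report the spread of the sample itself, which is a different quantity from `std_err` here. Recent SciPy versions also provide `scipy.stats.bootstrap`, which may be preferable to a hand-rolled loop outside a teaching context.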

I ended up putting it into Unit_07. I remember that when teaching unit 6 I had just enough material to fill the time without overrunning, but unit 7 had a bit more breathing room. It also fits well after the plotting distributions recap.
Commit 4f8bad5 implements our proposed final version for this. I have not included bootstrapping, as this is taught in a different course in a later year. The general idea is to avoid teaching new concepts within the chemistry and programming material, and instead to rely on the students' knowledge of stats, etc., taught in a first-year maths course.