An attempt to construct a heavily opinionated, tightly curated dataset of core knowledge. The focus will be on textbooks, but the dataset may also include lecture transcripts/slides, course notes, etc.
The goal is to build an extremely information-dense dataset for model finetuning in support of STEM research activities.
Links will preferentially point to PDFs for standardization. Source code should be linked where available, but for now I'm operating under the assumption that everything will need to go through a PDF processing pipeline.
My main focus here is collecting materials for machine processing, but I am also including "portal" pages to, e.g., help human learners find associated course content and/or a more human-readable presentation.
-
2021 - The Principles of Deep Learning Theory
-
2021 - Geometric Deep Learning
- https://arxiv.org/pdf/2104.13478.pdf
- source - https://arxiv.org/src/2104.13478
- portal - https://geometricdeeplearning.com/
- lectures and slides - https://geometricdeeplearning.com/lectures/
-
2017 - Elements of Statistical Learning
-
2022 - Probabilistic Machine Learning: An Introduction (Murphy 1)
-
2023 - Probabilistic Machine Learning: Advanced Topics (Murphy 2)
-
CS228 - Probabilistic Graphical Models
-
2021 - Bayesian Statistics With Julia and Turing
-
Make A Lisp
- https://github.com/kanaka/mal
- NB: the guide has been implemented in 87 different programming languages, making it a coding Rosetta Stone.
-
2013 - Probabilistic Programming and Bayesian Methods for Hackers
-
1992 - Paradigms of AI Programming (Norvig)
-
Category Theory for Programmers
-
1996 - Structure and Interpretation of Computer Programs (SICP, Abelson and Sussman)
-
2021 - An Introduction to Johnson–Lindenstrauss Transforms
Needs Categorization
I'm putting these all in a single bucket because I anticipate a lot of repeated content across them.
-
2023 - UvA Deep Learning
-
2023 - Dive Into Deep Learning
-
2020 - FastAI Book
-
2023 - Understanding Deep Learning (Prince)
-
2020 - An Elementary Introduction to Information Geometry
Limited License
If it's in this section, it's probably only licensed for personal use, or the licensing is sufficiently ambiguous that you may not want to train on content in this section. Double-check before using any of these.
-
2023 - Bayesian Optimization
-
2016 - Deep Learning (Goodfellow, Bengio, Courville)
-
2021 - Numerical Methods for Differential Equations with Python
- Some repos list no license; others are MIT licensed. Probably fine, but I'm putting it in the limited license section due to the ambiguity.
- https://johnsbutler.netlify.app/files/Teaching/Numerical_Analysis_for_Differential_Equations.pdf
- https://github.com/john-s-butler-dit/Numerical-Analysis-Python
- https://github.com/john-s-butler-dit/NumericalAnalysisBook
-
2003 - Information Theory, Inference, and Learning Algorithms
-
2008 - Graphical Models, Exponential Families, and Variational Inference
- https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf
- alt - https://statistics.berkeley.edu/tech-reports/649
- source - https://statistics.berkeley.edu/sites/default/files/tech-reports/649.ps.Z
- Published as a paper; the publisher doesn't permit download without a subscription. Licensing is generally unclear.
-
2012 - Bayesian Reasoning And Machine Learning
-
2019 - Applied Stochastic Differential Equations
-
Physically Based Rendering
-
Sutton and Barto RL
-
Linear Algebra Done Right (Axler)
-
2004 - Lectures on Differential Geometry
-
2022 - Intro to Differential Geometry I (Salamon)
- https://people.math.ethz.ch/~salamon/PREPRINTS/diffgeo.pdf
- Lots more lecture notes in book form - https://people.math.ethz.ch/~salamon/
-
2022 - Intro to Differential Geometry II (Salamon)
-
2020 - Measure and Integration (Salamon)
-
2015 - Functional Analysis (Salamon)
- https://people.math.ethz.ch/~salamon/PREPRINTS/funcana-ams.pdf
-
2023 - Discrete Differential Geometry
-
2023 - Stanford Encyclopedia of Philosophy
- https://huggingface.co/datasets/dmarx/stanford-encyclopedia-of-philosophy_dec23
- I suspect that if I reached out to the SEP, they'd be fine with this kind of use of their data, but I'm not confident it falls under their current terms of use. Buyer beware.
-
https://github.com/stars/dmarx/lists/pedagogical - This project is largely seeded by the content from this list
-
https://github.com/labmlai/annotated_deep_learning_paper_implementations
-
https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap
-
8-part RL course (8 papers) with code demos for each - https://github.com/Curt-Park/rainbow-is-all-you-need
-
Lots of courses (YouTube), last updated Nov 2022 - https://deep-learning-drizzle.github.io/
-
https://github.com/irregular-rhomboid/EAI-Math-Reading-Group
- Topology
- Info Geo
- Strogatz
- Allen B. Downey
- Graph Theory
- Network Theory
- Ecology
- Systems Theory
- Risk Management
- Emergency Management
- Climate Science
- Molecular Biology
- Vector Calculus
- Statistical Mechanics
- Complexity
- Network Science
- Social Networks
- Game Theory
- Physical Chemistry