/textbooks-dataset

Curated dataset of open source textbooks

MIT License

Curated Open Textbooks

An attempt to construct a heavily opinionated, tightly curated dataset of core knowledge. The focus will be textbooks, but it may also include lecture transcripts/slides, course notes, etc.

Motivation

The goal is to build an extremely information-dense dataset for model finetuning in support of STEM research activities.

Links will preferentially point to PDFs for standardization. Source code should be linked where available, but for now I'm operating under the assumption that everything will need to go through a PDF processing pipeline.

My main focus here is collecting materials for machine processing, but I am also including "portal" pages to help human learners find associated course content and/or a more human-readable presentation.

Textbooks

Needs Categorization

Misc Intro Deep Learning

Putting these all under a single bucket because I anticipate a lot of repeated content across them.

Limited License

If it's in this section, it's probably only licensed for personal use, or the licensing is ambiguous enough that you may not want to train on it. Double-check before using any of these.

Listicles

TODO

  • Topology
  • Info Geo
  • Strogatz
  • Allen B Downey
  • Graph Theory
  • Network Theory
  • Ecology
  • Systems Theory
  • Risk Management
  • Emergency Management
  • Climate Science
  • Molecular Biology
  • Vector Calculus
  • Statistical Mechanics
  • Complexity
  • Network Science
  • Social Networks
  • Game Theory
  • Physical Chemistry