Language model alignment-focused deep learning curriculum

Deep Learning Curriculum

This is an advanced curriculum for getting up to speed with some of the latest developments in deep learning, as of July 2022. It is heavily biased towards my own research interests, especially large language model alignment, but it should be of general interest. It is targeted at people with a strong quantitative background who are familiar with the basics of deep learning, but may otherwise be new to the field.

WARNING: this curriculum may be extremely challenging to take on alone. It is highly recommended to find a more experienced mentor, or at the very least a study partner. More accessible alternatives, which have the same prerequisites, are suggested below.

I do not intend to keep this curriculum up-to-date. PRs are welcome, but I reserve the right to be picky about what gets included, to avoid it becoming too bloated.

Credit to John Schulman for an older curriculum which inspired this and from which I cribbed bits and pieces.

Pre-prerequisites

Before studying deep learning, I recommend being comfortable with the very basics of the following topics:

  • Linear algebra: It's essential to understand how vectors and matrices work, and helpful to understand eigenvalues and eigenvectors.
  • Probability: It's essential to understand the rules of probability, expected value and standard deviation, and helpful to understand independence and the normal distribution.
  • Calculus: It's essential to understand differentiation and partial differentiation, and helpful to understand the basics of vector calculus including the chain rule and Taylor series.
  • Programming: I recommend getting to know Python and numpy.
  • Optional extras:
    • Statistics: It's helpful to understand estimators and standard errors.
    • Information theory: It's helpful to understand information, entropy and KL divergence.
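As a rough self-check on the statistics and information theory items above, the sketch below computes entropy, KL divergence and the standard error of a sample mean in plain Python. The function names and the example distributions are illustrative assumptions, not part of the curriculum.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def standard_error(xs):
    """Standard error of the sample mean, using the unbiased (n - 1) sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(variance / n)

# Hypothetical example distributions over two outcomes.
fair = [0.5, 0.5]
biased = [0.9, 0.1]

print(entropy(fair))                # a fair coin carries exactly 1 bit
print(kl_divergence(biased, fair))  # KL divergence is not symmetric:
print(kl_divergence(fair, biased))  # swapping the arguments changes the value
print(standard_error([1, 2, 3, 4, 5]))
```

Working out by hand why the two KL values differ is a reasonable test of whether the definitions have sunk in.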

You do not by any means need to understand the whole of these subjects in great depth in order to approach deep learning, but being very familiar with the basic ideas will make your life easier. The benefit of studying these subjects in depth is that you will gain this familiarity by thinking a lot about the relevant ideas.

There are many resources for studying these topics. Here are some suggestions:

I recommend putting a significant fraction of your study time into exercises – problems for math, and implementation for programming – since doing them forces you to think through the ideas for yourself.

Prerequisites

Before embarking on this curriculum (or one of the alternatives suggested below), it is necessary to understand the basics of deep learning, including basic machine learning terminology, what neural networks are, and how to train them. Here are some suggested resources for this:

In addition to this, I recommend learning PyTorch, and, as an exercise, using it to train a small neural network to do MNIST classification. You should be able to achieve around 99% test accuracy using a 3-layer convolutional neural network. You can also use your setup to run a few experiments on some simple methods of regularization.
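One possible starting point for the MNIST exercise is sketched below. It assumes that "3-layer" means two convolutional layers followed by one linear layer. The class name `SmallConvNet`, the helper `train_step`, the layer widths and the hyperparameters are all illustrative choices rather than a reference solution. The batch here is random noise so that the snippet runs without downloading anything, and dropout and weight decay stand in for the simple regularization methods you might experiment with.

```python
import torch
from torch import nn

class SmallConvNet(nn.Module):
    """Hypothetical 3-layer CNN: two conv layers plus one linear layer."""

    def __init__(self, dropout=0.25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> 64x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 64x7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),          # one simple regularizer to vary
            nn.Linear(64 * 7 * 7, 10),    # one logit per digit class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def train_step(model, optimizer, images, labels):
    """Run one gradient step on a batch and return the loss as a float."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
model = SmallConvNet()
# weight_decay is a second simple regularizer to experiment with.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Random stand-in batch with MNIST's shape; replace with real data for the exercise.
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
loss = train_step(model, optimizer, images, labels)
print(loss)
```

For the real exercise, the random batch would be replaced by batches from a `DataLoader` over `torchvision.datasets.MNIST`, with a separate evaluation loop over the test split to measure accuracy.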

How to use this curriculum

The curriculum is divided into 9 chapters:

The chapters are not necessarily of equal importance. For a typical person, the order of importance is something like: 1, 6, 2, 7, 4, 8, 9, 3, 5.

Chapter 1 is helpful for understanding chapters 2, 3, 8 and 9, and chapter 6 may also be somewhat helpful for understanding chapter 7, but otherwise the chapters can be completed in any order.

For a longer version of Chapter 7, replace it with the AGI Safety Fundamentals technical AI alignment track. For a longer version of Chapter 8, replace it with Concrete Steps to Get Started in Transformer Mechanistic Interpretability.

Each chapter has three sections:

  • Recommended reading: A small amount of material covering the most basic or important ideas of the chapter. Often a particular section of a paper will be singled out, and it's not necessary to read the rest of the paper.
  • Optional reading: Related material that's still very relevant to the chapter, but can be either skimmed or skipped entirely and used as a reference later on.
  • Suggested exercise: An idea for an exercise, generally implementation-focused, to help drive some of the ideas of the chapter home. It is generally more important to do some sort of exercise than to follow the exact exercise suggested.

Each chapter should take between half a week and 2 weeks of full-time study, depending on the chapter and how much depth you go into, but don't be discouraged if it takes longer than this.

Once you have completed an exercise, you can take a look at other people's solutions here. You are welcome to add a link to your own solutions to that page by submitting a PR.

The other main useful skill for working in this area is software engineering, especially working with distributed systems and large, shared codebases. I think the most common way to improve at this is through professional experience, but it can also be done by contributing to open source projects. In order to pick up best practices, you generally want to be reading other people's code and having others review your code.

Alternatives to this curriculum

This curriculum may be very challenging, especially without mentorship. A more realistic alternative for most people is to work on larger programming projects involving deep learning, and/or to work through more advanced textbooks or online courses. These can be engaging, provide structure, and last a long time. The downside is that they may be less focused on the most relevant material. But it's always possible to return to this curriculum at a later date.

An example of a larger programming project I worked on was training a neural network to play backgammon using TD-learning, but you should choose something you're motivated by. It's also a good idea to create a short write-up of the project and any experiments involved.

Some textbook suggestions:

  • Goodfellow et al - Probably still the best deep learning-specific textbook, although it can be unclear in places. Has no exercises.
  • Jared Kaplan's notes for physicists - An extended introduction to deep learning from a fairly theoretical perspective. Also has no exercises.
  • Sutton & Barto - A good introduction to reinforcement learning from first principles. Has exercises.
  • Murphy, Bishop and ESL - Classic machine learning textbooks. These cover a lot of material that isn't that relevant to deep learning, but it can be nice to have a broader perspective, and they have plenty of exercises. Overall they wouldn't be my first choice but they can be useful.

Some suggestions for more advanced online courses:

Additional advice

These are miscellaneous opinions based on personal experience, so take them with a pinch of salt. An Opinionated Guide to ML Research is also worth a read. For advice on pursuing a career in technical AI alignment, I recommend this guide.

I've emphasized exercises throughout this page, because they force you to think ideas through in a way that passive learning doesn't. In my experience, this is especially important when learning about unfamiliar topics. Once you have built up experience in an area, it's much easier to remember something in that area that you've only seen or heard once, because of how it fits into your existing web of knowledge.

To be productive, physical and mental health are paramount. Beyond that, there's no good substitute for intrinsic motivation. Productivity tricks can be useful for getting laborious work done when necessary, but can only do so much. Intrinsic motivation can come both from higher-level excitement about a project and from lower-level flow states (which can be common with programming). When it doesn't matter much what you're working on because you're mostly learning, choose things you'll be intrinsically motivated by, as that's the easiest way to excel. But exciting areas aren't always important, and so in the long run, you'll need to cultivate excitement about important areas by learning more about them.

Vertical integration can be powerful, because abstractions are generally leaky. The most effective researchers have an understanding of their entire research stack, from the ins and outs of different benchmarks through to the details of GPU caches. Especially early on in your career, learning as much as you can about everything that's relevant is probably a better strategy than focusing only on what you need to get by.

As an important special case of the above principle, if you want to have a positive impact on the world with your research, I highly recommend focusing on how best to achieve this as part of your studies. Choosing the right problems to work on can be just as important as the quality of your execution, even if you end up spending most of your time on the latter. Moreover, it's not always enough to defer to others on such questions, since your higher-level motivations will influence your lower-level decisions, not to mention the fact that other people can be wrong or hard to understand.

I personally expect AI to have a far greater impact on the world in the coming decades than it is having now, and I think this is enough to outweigh the urgency and tractability of present-day problems (though I also see a lot in common between both sets of problems). For more discussion of this perspective, I recommend Cold Takes's "most important century" series. There also used to be the Alignment Newsletter, which was good for staying up-to-date with related research, but it hasn't been active in a while.