huggingface/deep-rl-class

General feedback on Unit 1

firecoral opened this issue · 1 comment

Hi. I've just completed the Unit 1 Colab exercise and wanted to provide some feedback. It's impressive work, but after years of software development I understand the value of fresh eyes unfamiliar with the product (or in this case, the subject). I wanted to provide my input while it's fresh, before I move on and better understand what I was doing.

Early on, you describe using the leaderboard to post the result of this exercise. This implied some competition to me, and I kept expecting a discussion of how to improve the learning process so I could compete. By the end, it was clear that this was not the major goal of the exercise. Instead, the exercise was meant to acquaint your students with some of the tools involved in RL. This should be made clear early on. The opportunity presented at the end to tweak your settings for an improved score is good, but I expect I'll be learning a lot more about this in upcoming units.

There are a lot of tools and libraries used in this exercise. In some cases there are pointers to dense documentation and tutorials for individual ones. In other cases there is only a single line describing the tool's purpose. In some cases, like Colab, there don't appear to be any. I understand that the information is out there, but the exercise would be much easier to follow if there were short summaries of the critical information about each tool or library and how we are using it. In particular, it would be great to understand the hierarchy and relationships of the tools and libraries early on, so that we could refer back to it if we get confused.

Colab is an interesting case. It would be nice to have a brief paragraph near the beginning describing in some detail what Colab is and how we are using the free version of it. I got disconnected numerous times, and while I was able to figure out how to get back to the point I was at, a note describing how to do this would likely be valuable. I also eventually saw a "no GPUs available" message; a note about how to deal with that would be nice.

The dependency and virtual screen installations ran in a straightforward manner, but in addition to the descriptions of the three dependencies, it would have been nice to understand how they fit together. I had no idea at the end where this virtual display was actually appearing.
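For concreteness, this is roughly the cell I mean. My current guess (not from the course text) is that pyvirtualdisplay starts a headless Xvfb server inside the Colab VM, so the display never "appears" anywhere visible:

```python
# Sketch of the virtual-display setup as I understand it (this may not match
# the notebook cell exactly). pyvirtualdisplay wraps Xvfb, an X server that
# renders to memory instead of a physical monitor, so the "display" only
# exists inside the Colab VM.
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

# From here on, environment render() calls have somewhere to draw, which is
# what allows the replay video to be recorded and shown later.
```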

The environment/agent/model distinction was confusing, probably because I'm not sure "model" was ever defined. My understanding now is that the trained model is the ultimate result of this exercise. But comments in the code make it appear that the model and the agent are the same thing, and the model seems to be created with the SB3 library. (I suspect I'm showing my own confusion here. I'm not looking for answers to my questions, but rather trying to explain why I'm confused.)
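To show where my mental model broke down, here is a minimal sketch of how I now think the pieces relate, assuming the PPO / LunarLander-v2 setup from the notebook (the exact arguments may differ):

```python
import gym
from stable_baselines3 import PPO

# The environment is the world the agent acts in.
env = gym.make("LunarLander-v2")

# In SB3, the "model" object effectively is the agent: it bundles the policy
# network with the training algorithm (PPO here).
model = PPO("MlpPolicy", env, verbose=1)

# Training updates the policy; the trained model is the artifact that gets
# evaluated and pushed to the Hub at the end.
model.learn(total_timesteps=100_000)
```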

The use of TODO cells followed by Solution cells was confusing. I never had enough information to tackle the initial cell on my own; I always had to use the solution cell, in some cases with a bit of editing. Personally, I would dispense with the initial TODO cells and just provide the Solution cells. Where appropriate, I might provide some brief instructions preceding the cell noting changes for the student to make. This is particularly true for the push instructions, where you provide a list of environment variables but don't really distinguish between defaults and those that specifically need to be pulled from earlier points. (The username component of the repo_id threw me for a while.)
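As an example of the exact spot that tripped me up, the repo_id has to combine your own Hub username with a model name; only the second half can stay as suggested (names below are placeholders, not values from the course):

```python
# Placeholder values -- the username must come from your own Hugging Face
# account; it is not a default you can keep.
username = "my-hf-username"
model_name = "ppo-LunarLander-v2"

repo_id = f"{username}/{model_name}"  # e.g. "my-hf-username/ppo-LunarLander-v2"
```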

Finally, you suggest ways to climb the leaderboard, but to me the whole training run was pretty much a black box. It would have been nice to see some notes on what was really going on during training (at a lower level than the RL graphic you used). The difference between n_envs, n_steps, batch_size, and timesteps is completely opaque to me.
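For what it's worth, here is roughly where those knobs show up in the code as I understand it (the values are just the ones from my run, not recommendations):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# n_envs: how many copies of the environment run in parallel to collect data.
env = make_vec_env("LunarLander-v2", n_envs=16)

model = PPO(
    "MlpPolicy",
    env,
    n_steps=1024,   # steps collected from each environment before an update
    batch_size=64,  # size of the minibatches used inside that update
    verbose=1,
)

# total_timesteps: overall length of the training run, across all environments.
model.learn(total_timesteps=1_000_000)
```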

I imagine these variables will become clear in later units.

Hey there 👋 ,

First of all, thank you for your feedback. Although we do not plan to change the course in the upcoming months, I wanted to reply and add these points to my feedback list so that if we do a v5.0 we can update some of the elements you mentioned.

  • "The opportunity presented at the end to tweak your settings for an improved score is good, but I expect I'll be learning a lot more about this in upcoming units": in Unit 7 we have a real "AI vs. AI" challenge where you'll compete to get the highest results. The leaderboard is more of a learning tool, so that people can check the hyperparameters other students used.

  • "In particular, it would be great to understand the hierarchy and relationships of the tools and libraries early on so that we could refer back to it if we get confused": I understand, though the course is not meant to be a complete reference for each tool; that's why we provide additional resources for people who want to go deeper into the different libraries. Our goal was to explain the theory rather than the internals of the libraries, since those in-depth details are accessible through each library's documentation.

  • "I had no idea at the end where this virtual display was appearing": we decided to keep Unit 1 short (though it's already dense) and didn't dive into the details of Xvfb, because the goal of this course is not to understand how it works under the hood, since it's a helper library.

As for how the model works, this is something you'll learn unit after unit. We didn't want to overwhelm the first hands-on, but we understand it might not be enough for some of our students; that's why we point to additional readings and tutorials, since this course is a good starting point but not exhaustive.