[Re] Faster Teaching via POMDP Planning (partial replication)
luksurious opened this issue · 29 comments
Original article: Faster Teaching via POMDP Planning by Rafferty et al. (2016), https://www.onlinelibrary.wiley.com/doi/full/10.1111/cogs.12290
PDF URL: https://github.com/luksurious/faster-teaching/blob/master/replication-paper.pdf
Metadata URL: https://github.com/luksurious/faster-teaching/blob/master/metadata.yaml
Code URL: https://github.com/luksurious/faster-teaching
Scientific domain: Cognitive Science
Programming language: Python
Suggested editor: -
(Sorry, I had initially put the wrong link to the paper; it is now fixed and working.)
Thanks for your submission. We'll assign an editor soon!
@gdetor @koustuvsinha Can you edit this (regular) submission (Faster Teaching via POMDP Planning)?
@koustuvsinha Can you edit this (regular) submission (Faster Teaching via POMDP Planning)?
@koustuvsinha Gentle reminder
Hi @rougier, sorry, my notifications were going to the wrong email address, hence I missed this. I'll review it this week!
@koustuvsinha No problem. Actually the request is for editing, meaning you just need to assign 2 reviewers (from the board or from elsewhere). Or you can edit and review and find another reviewer.
@rougier got it! I'll ask for reviewers!
Hi @benureau, would you be interested to review this article?
Hi @xuedong, would you also be interested to review this article?
Dear @koustuvsinha, I have personally worked with Aurélien Nioche, the second author, in the same lab, during my last position. I don't think I can review this paper.
Thanks for notifying about the conflict @benureau :)
Hi @bengioe, as we discussed in email, would you be interested to review this article? You can find the reviewer guidelines here. Many thanks!
Yes, I can review this paper.
Thanks so much @bengioe! Additionally, it would be great if you could comment in this thread with your credentials, so that we can onboard you as a reviewer. @rougier, can you confirm that this is the process to onboard external reviewers?
Hi @koustuvsinha, I think POMDPs are somewhat outside my expertise and I may not be able to offer a proper review, I'm sorry.
Hello, here is my review. As this is my first review here, I'm happy to extend it if more details are necessary.
Original Paper. The original paper being reproduced is a 2016 paper using the Partially Observable Markov Decision Process (POMDP) framework in conjunction with heuristic student models to learn teaching policies. The original paper simulates students through their heuristic models, and also performs human trials to validate their approach.
Specifically, 3 student models are proposed: a memoryless model, a short-term memory model, and a continuous belief model (estimated with particle filters). In this formulation, the state space consists of a weighting over the enumeration of all possible concepts, while the action space consists of the 3 possible actions for the teacher (show an example, quiz, quiz with feedback).
The original paper finds that using appropriate student models to estimate the optimal policy can improve teaching.
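To make this formulation concrete, here is a minimal sketch of how such a belief state and action set could be represented (hypothetical names and sizes, not the authors' implementation):

```python
# Minimal sketch with hypothetical names/sizes -- not the authors' code.
from enum import Enum
import numpy as np

class TeacherAction(Enum):
    EXAMPLE = 0             # show a worked example
    QUIZ = 1                # quiz without feedback
    QUIZ_WITH_FEEDBACK = 2  # quiz, then reveal the correct answer

def uniform_belief(num_concepts: int) -> np.ndarray:
    """Initial teacher belief: a uniform weighting over all enumerated concepts."""
    return np.full(num_concepts, 1.0 / num_concepts)

belief = uniform_belief(num_concepts=720)  # e.g. all permutations of a small letter-number mapping
print(belief.sum(), list(TeacherAction))
```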
Reproduction. This study reproduces the simulated experiments of the original paper. While most findings are not significantly different, the results that do differ seem more consistent with expectations about the algorithms used. In addition, this study highlights some errors in the original paper and explicitly provides many details useful for reproduction.
- audience - I estimate that an audience with a generic CS background should be able to understand this study. The details of the POMDP framework as well as those of the individual learner models are explained clearly.
- level of detail - all details needed to reproduce the simulations and reimplement the proposed algorithms are present. I particularly appreciate Table 5, which clarified for me the scope of the Number Game in the original paper.
- discussion - the study contains an interesting discussion of the reproduction of the results, the differences found, as well as the choices made to create this experimental setup (such as the relative cost of actions). In particular, it weakens some of the conclusions of the original paper, leaving room for improvement and additional research on this setting.
- writing - The writing was clear throughout; I have made a few suggestions below.
Code comments:
- I was able to run the code successfully. The code is fairly clean and self-explanatory.
- I would suggest adding a `mkdir -p data` command in the `.sh` files to avoid failure from a fresh clone of the repository.
- Some code seems to use `random` instead of `numpy.random`. While I see that you seed both modules, it would be preferable to only use one (for example, prior to numpy 1.17 both `random` and `numpy.random` use the same PRNG algorithm, MT19937, so seeding both with the same seed should produce the exact same sequence of numbers, which isn't desirable).
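To make the seeding point above concrete, here is a small illustrative snippet (hypothetical, not the repository's actual code):

```python
# Illustrative only -- not the repository's actual code.
import random
import numpy as np

SEED = 42

# Pattern to avoid: two separate PRNGs seeded side by side.
random.seed(SEED)
np.random.seed(SEED)

# Preferable: draw everything from a single seeded source.
rng = np.random.RandomState(SEED)
print(rng.uniform(0.0, 1.0, size=3), rng.randint(0, 10))
```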
Additional comments:
- Section 2.1: you introduce Q(s,a) in (2) and only define it later with (3). It may be worth briefly introducing Q right after (2), or even more generally explaining what action-value functions are (a generic form is sketched after this list).
- Sec 3.2, 3.3: The noise/cost parameters (Tables 1-4) appear quite specific; it may be worth mentioning that the noise values come from Corbett & Anderson, 1995 (I think? This is what I understood from the original paper, which you cite as their source), and that the costs come from control human experiments and represent seconds.
- Sec 3.6, "Similar to the particle filter in the continuous model where, the belief [are reset]", did you mean "Similarly, for the particle filter in the continuous model, the beliefs [are reset]"?
- You end the paper with this: "Through this replication, we hope to facilitate research in this direction." I think it would be nice to argue why this is valuable; e.g. is it easy to write down analytical learner models? Do modern computational capacities allow using the full potential of POMDPs?
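For reference on the Q(s,a) point, a generic (discounted, reward-based) form of the action-value function is written below; the paper's Eq. (3) may instead use a finite horizon and costs, so treat this only as a reminder of the concept:

```latex
% Generic action-value function under a policy \pi (illustrative form only).
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0} = s,\ a_{0} = a \right]
```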
Note on style:
- in the original paper, both students and teachers are referred to as "she" and "her", whereas in this paper there is a mix of "he/his" and "their". Note that in English, "they" is a valid gender-neutral singular pronoun. If you wish, I'd suggest either using "she/her" in honor of the original paper or "they/their" for consistency.
Many thanks, @bengioe, for the comprehensive review! I'm still looking for an additional reviewer and hopefully can assign one soon. @luksurious, can you go through the review and address the comments?
@amyzhang has accepted to review this paper! (as discussed in email). Thanks a lot! :) You can find the reviewer guidelines here.
@bengioe Thank you for your review.
We have addressed your comments.
- The code now uses only `np.random` (with the upgraded RNG system in NumPy 1.17+), and the setup is self-contained (I also removed some test output I had left there); see the sketch below.
- The paper is updated to address your points.
- The Q function is now properly introduced, the belief reset is better explained, and the conclusion is extended.
- Re Sec 3.2, 3.3, noise parameters: they were also fitted from the control experiments with humans in the original study. This is explained in detail in the supplementary material of the original paper (for reference: Supplementary material). We added a note to clarify this.
- We now use "they" as a gender-neutral pronoun.
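For illustration, the single-source setup now looks roughly like this (a simplified sketch, not the exact code in the repository):

```python
# Simplified sketch of the single-source RNG setup -- not the exact repository code.
import numpy as np

rng = np.random.default_rng(12345)   # NumPy 1.17+ Generator API

noisy = rng.random() < 0.15          # e.g. a noise coin flip
item = rng.integers(0, 6)            # e.g. pick an item index to present
order = rng.permutation(6)           # e.g. randomize presentation order
print(noisy, item, order)
```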
Let me know if there are additional points to address.
Replication. The original paper uses POMDPs to formulate a teacher-student setting. The selection of the next teaching activity is a planning problem in which the teacher maintains a belief state about the student. Three models are proposed and evaluated on two tasks.
The three models are a memoryless model, a discrete model with memory that explicitly keeps a history of the past m actions, and a continuous model with implicit information about the entire history.
The two tasks are a simple letter arithmetic task, with the goal of finding the correct mapping between a set of letters and numbers, and a number game, where students learn a target number concept such as odd numbers or numbers within some range.
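For readers unfamiliar with the tasks, a hypothetical encoding could look like this (illustrative only, not the replication's implementation):

```python
# Hypothetical encoding of the two tasks -- illustrative only.

# Task 1: letter arithmetic. A candidate concept is a mapping from letters to numbers.
candidate_mapping = {"A": 3, "B": 0, "C": 5, "D": 1, "E": 4, "F": 2}

def evaluate(expression: str, mapping: dict) -> int:
    """Evaluate a sum like 'A+B' under a candidate letter-to-number mapping."""
    left, right = expression.split("+")
    return mapping[left] + mapping[right]

# Task 2: number game. A candidate concept is a predicate over integers,
# e.g. "odd numbers" or "numbers within some range".
concepts = {
    "odd": lambda n: n % 2 == 1,
    "range_16_30": lambda n: 16 <= n <= 30,
}

print(evaluate("A+B", candidate_mapping))               # 3
print(concepts["odd"](7), concepts["range_16_30"](42))  # True False
```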
Reproducibility of the replication. The authors were able to reproduce the results for the first task. However, in the second task their results differ. Specifically, the authors found that, with the three methods, they could train policies that perform better than random, but not necessarily better than the baselines. Further, there were failure modes for certain policies paired with certain learner models.
Clarity of code and instructions. The instructions were clear, and there were separate scripts to run for each task, which made running the results very simple. I was able to set up my environment and run the scripts on the first try using the README in the code.
Clarity and completeness of the accompanying article. The article is clear and well written, with a general description at the beginning of the contributions and evaluations in the original paper and of the replication's findings. I have a high-level suggestion to make clear which components and details in the methods section are taken from the original paper, and which were unclear in the original paper and therefore required design decisions on the part of the authors. When reading the paper, it is unclear whether any liberties needed to be taken with the original method.
I appreciated the analysis in the Experiments section for Task 2, where the authors lay out the differences in the version of the task and their design choices. The discussion section, with its analysis of the potential failure modes in the method and experiments, was also very useful and answered some of my earlier questions as to what was the original method and what required changes on the authors' part in order to replicate the results. Some of these components should go in the methods section to highlight the contributions and possible deviations from the original method, rather than getting buried at the end. Several of the conjectures for improvement in the discussion section are also very interesting, and it would be nice to see if they are backed up empirically -- but perhaps that is outside the scope of this work.
Other than the high-level reshuffling of some paragraphs from the Discussion section into the Methods section and added clarity about what was explicitly described in the original paper and what was not, I don't have any suggestions for improvement. I found the explanation of the method and tasks clear, and the results, analysis, and discussion insightful. Great job!
@amyzhang Thank you for your review.
We verified all questions and unclear elements with the original authors.
I have added a subsection at the end of Methods to make this more explicit.
Except for the remark about the belief update, which is now mentioned at the end of the methods section, I did not find any other remarks in the discussion that refer to changes compared to the original description.
Please let me know if there are particular segments remaining that should be addressed earlier in the methods section.
Thanks!
Hi @luksurious, thanks for your remarks, and @amyzhang and @bengioe, thanks for your valuable reviews. I believe the submission has addressed the proposed changes. Unless the reviewers feel strongly about the rebuttal, I vote for acceptance of the paper.
I also vote for acceptance.
I vote for acceptance as well.
Great!! Congrats @luksurious, your paper is now accepted to the ReScience journal! I'll follow up soon with a PR to your repository with the correct article numbers.
@luksurious I couldn't find the ReScience template tex sources in your repository. Can you add them such that I can compile on my end?
@koustuvsinha Ah sorry, I was working on it in Overleaf, so I have now synced it to a new repository. You can find it here:
https://github.com/luksurious/faster-teaching-paper
The article has been published on Zenodo and will be available shortly on the ReScience website!
- Article PR: ReScience/articles#20
- Website PR: ReScience/rescience.github.io#96
This concludes the reviewing and editing process of this paper.