Peer review for lab3
Hi,
Overall, I think that the code is quite clear and the .md file is precise and in-depth, and therefore useful for interpreting the algorithms.
About the first task, I have nothing in particular to say, apart from the fact that the hard-coded rule-based strategy is fine.
For the second task, first of all, congratulations on implementing three different strategies and, at least in the first two, on also taking the k parameter into account. I appreciate that you defined both crossover and mutation.
In strategy 0, you divide the game into two phases, and each parameter to optimize is used in only one of them (one at the beginning and two in the final phase). This seems reasonable: if you do not use nim-sum-based strategies, the salient decisions come towards the end of the match.
Strategy 1 is, in my opinion, very creative, because it uses the parameters alpha and beta to evaluate the unweighted distribution of the objects over the rows (an unusual idea, I think). The results you achieve are quite good, but I cannot tell whether the parameters alpha and beta are really decisive, or whether the results mainly come from the well-written hard-coded rules inside the two branches of the if/else (which are similar to each other).
Finally, strategy 2 is perhaps the one I like the most. It uses the xor operation, which is the operator behind the optimal nim-sum strategy, and this could be questionable because it is like giving clues to the GA. Still, it is satisfying to observe that the genetic algorithm learns to use only the optimal operator after a few generations, discarding the "and" and "or" operators.
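For reference, the xor operator the GA ends up selecting is the one behind the classic nim-sum strategy. The following is a minimal sketch, assuming normal play (whoever takes the last object wins) and rows given as a list of sizes; the helper names `nim_sum` and `optimal_move` are hypothetical and not taken from the author's code. With a limit k on the objects removable per move, each row's Grundy value becomes its size modulo k + 1, which is the standard result for this subtraction game.

```python
from functools import reduce
from operator import xor
from typing import Optional


def nim_sum(rows: list[int], k: Optional[int] = None) -> int:
    """XOR of the Grundy values of the rows (normal play).

    Without a limit k the Grundy value of a row is its size;
    with at most k objects removable per move it is size % (k + 1).
    """
    grundy = rows if k is None else [r % (k + 1) for r in rows]
    return reduce(xor, grundy, 0)


def optimal_move(rows: list[int], k: Optional[int] = None) -> Optional[tuple[int, int]]:
    """Return (row, num_objects) leaving a zero nim-sum, or None if no such move exists."""
    for i, r in enumerate(rows):
        max_take = r if k is None else min(r, k)
        for take in range(1, max_take + 1):
            after = rows[:i] + [r - take] + rows[i + 1:]
            if nim_sum(after, k) == 0:
                return i, take
    return None  # already a zero nim-sum: every move loses against perfect play
```

For example, `optimal_move([1, 3, 5])` returns `(2, 3)`: since 1 xor 3 = 2, taking 3 objects from the row of 5 leaves 2 and a nim-sum of 0. On `[1, 3, 5, 7]` the nim-sum is already 0, so it returns `None`.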
About the third task, I can confirm that it is a good minimax implementation (including the version with alpha-beta pruning): as you correctly observed, this type of pruning does not change the results, it only slightly speeds up execution. In addition, you could also apply a "hard cut-off", i.e. limit the search depth at a certain point, at the expense of no longer guaranteeing the optimal solution.
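To illustrate the hard cut-off suggestion, here is a minimal sketch of minimax with alpha-beta pruning plus a depth limit. It assumes normal play, a board represented as a tuple of row sizes and an optional k limit; all function names are hypothetical, and `heuristic` is a placeholder evaluation returned when the depth limit is reached, so once the cut-off triggers the result is no longer guaranteed to be optimal.

```python
import math
from typing import Iterator, Optional


def moves(rows: tuple[int, ...], k: Optional[int] = None) -> Iterator[tuple[int, int]]:
    """All legal (row, num_objects) moves; k caps the objects removed per move."""
    for i, r in enumerate(rows):
        for take in range(1, (r if k is None else min(r, k)) + 1):
            yield i, take


def play(rows: tuple[int, ...], move: tuple[int, int]) -> tuple[int, ...]:
    i, take = move
    return rows[:i] + (rows[i] - take,) + rows[i + 1:]


def heuristic(rows: tuple[int, ...]) -> float:
    """Hypothetical evaluation used at the cut-off depth: 0.0 means 'unknown'."""
    return 0.0


def minimax(rows, depth, alpha, beta, maximizing, k=None) -> float:
    if sum(rows) == 0:
        # Normal play: whoever took the last object won,
        # so the player now to move has lost.
        return -1.0 if maximizing else 1.0
    if depth == 0:
        return heuristic(rows)  # hard cut-off
    if maximizing:
        value = -math.inf
        for m in moves(rows, k):
            value = max(value, minimax(play(rows, m), depth - 1, alpha, beta, False, k))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer will never allow this branch
        return value
    value = math.inf
    for m in moves(rows, k):
        value = min(value, minimax(play(rows, m), depth - 1, alpha, beta, True, k))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return value


def best_move(rows, depth=math.inf, k=None):
    """Return (move, value) for the maximizing player; depth=math.inf means no cut-off."""
    best, best_value = None, -math.inf
    for m in moves(rows, k):
        v = minimax(play(rows, m), depth - 1, -math.inf, math.inf, False, k)
        if v > best_value:
            best, best_value = m, v
    return best, best_value
```

For example, `best_move((1, 2))` returns `((1, 1), 1.0)` (leave `(1, 1)` and win), while `best_move((1, 2), depth=1)` scores every move 0.0, because the search stops before reaching a terminal state: this is exactly the loss of optimality traded for speed.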
Finally, for the fourth task, I want to highlight the use of code that is not directly inspired by the code proposed by the Assistant Professor (learning and interpreting different approaches is, in my opinion, a plus); however, I think that your structure (the RL_libs folder) is, in general, more complex and therefore more difficult to analyze. Anyway, I agree with the choice of rewards: +1 if the agent wins, -1 if it loses, and 0 in all other cases. Moreover, the results are good!
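The reward scheme fits in a few lines. The sketch below pairs it with a tabular Q-learning update purely as an illustration: this is an assumption, not necessarily what RL_libs implements, and the names `reward`, `QTable`, the `WIN`/`LOSS` labels and the learning-rate/discount values are hypothetical. Here `next_state` is meant as the next state in which the agent acts, i.e. after the opponent's reply.

```python
from collections import defaultdict
from typing import Optional

WIN, LOSS = "win", "loss"  # hypothetical outcome labels


def reward(outcome: Optional[str]) -> float:
    """+1 if the agent wins, -1 if it loses, 0 in all other cases."""
    if outcome == WIN:
        return 1.0
    if outcome == LOSS:
        return -1.0
    return 0.0


class QTable:
    """Tabular Q-learning sketch; alpha is the learning rate, gamma the discount."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha, self.gamma = alpha, gamma

    def update(self, state, action, outcome, next_state, next_actions) -> None:
        target = reward(outcome)
        # Bootstrap from the next state only if the game is not over.
        if outcome is None and next_actions:
            target += self.gamma * max(self.q[(next_state, a)] for a in next_actions)
        key = (state, action)
        self.q[key] += self.alpha * (target - self.q[key])
```

Since the reward is non-zero only at the end of a match, the discount factor is what propagates the win/loss signal back to the earlier moves.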