microsoft/TextWorld

Random and Neural Agent low results

gari-marcos opened this issue · 3 comments

Hi, when I run the "Building a simple agent" notebook, either on my local machine or on Google Colab, I get results that differ noticeably from the ones shown on GitHub, even though I did not make any changes to the code. For instance, when playing with the Random Agent I get the following results:

rewardsDense_goalDetailed.ulx..........  	avg. steps:  50.0; avg. score:  2.6 / 11.
rewardsBalanced_goalDetailed.ulx..........  	avg. steps:  50.0; avg. score:  0.1 / 5.
rewardsSparse_goalDetailed.ulx..........  	avg. steps:  50.0; avg. score:  0.0 / 1.

However, the GitHub results are the following:

rewardsDense_goalDetailed.ulx..........  	avg. steps: 100.0; avg. score:  4.2 / 11.
rewardsBalanced_goalDetailed.ulx..........  	avg. steps: 100.0; avg. score:  0.7 / 5.
rewardsSparse_goalDetailed.ulx..........  	avg. steps: 100.0; avg. score:  0.0 / 1.

As you can see, my runs take half as many steps and the scores are considerably worse. The same thing happens with the Neural Agent: for example, its training run is much shorter than the one shown in the GitHub results.

Any idea why this could be happening?

Thanks for catching that. It appears that the notebook was changed a bit without being re-run before it was uploaded. Would you mind changing max_steps=50 to max_steps=100 in the play function and re-running it?
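To illustrate why the step cap alone can account for both differences, here is a minimal sketch with a toy game. It is not the notebook's play function: `play_toy`, its reward probability, and its seeding scheme are hypothetical and only mimic the shape of the problem. Because the toy game never ends on its own, the average number of steps always equals the cap, and a larger cap gives the random agent more chances to score.

```python
import random


def play_toy(max_step, nb_episodes=10, seed=0):
    """Play a toy game with random actions; return (avg. steps, avg. score).

    Hypothetical stand-in for the notebook's play function: each step earns
    a point with probability 0.05, and the game never ends on its own, so
    every episode runs until the step cap is reached.
    """
    total_steps, total_score = 0, 0
    for episode in range(nb_episodes):
        # One RNG per episode, so the first steps of an episode are the same
        # regardless of the cap and only the extra steps differ.
        rng = random.Random(seed + episode)
        steps, score = 0, 0
        while steps < max_step:
            steps += 1
            if rng.random() < 0.05:
                score += 1
        total_steps += steps
        total_score += score
    return total_steps / nb_episodes, total_score / nb_episodes


for cap in (50, 100):
    avg_steps, avg_score = play_toy(max_step=cap)
    print(f"max_step={cap:3d}  avg. steps: {avg_steps:5.1f}; avg. score: {avg_score:4.1f}")
```

In this sketch the reported average steps is exactly the cap in both runs, which matches the 50.0 versus 100.0 pattern in the Random Agent outputs above.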

Thank you for your fast reply. As you said, I changed the max_step parameter to 100 and the results improved. However, in the 47th cell:

# We report the score and steps averaged over 10 playthroughs.
play(RandomAgent(), "./games/another_game.ulx")
play(agent, "./games/another_game.ulx")

another_game.ulx..........  	avg. steps:  50.0; avg. score:  2.8 / 9.
another_game.ulx..........  	avg. steps:  47.3; avg. score:  6.7 / 9.

the average steps are around 50, so it seems that max_step is set to 50 again. The final results also show around 50 steps:

agent.test()
play(agent, "./games/rewardsDense_goalDetailed.ulx")  # Averaged over 10 playthroughs.
play(agent, "./testing_games/", nb_episodes=20 * 10)  # Averaged over 10 playthroughs for each test game.
play(RandomAgent(), "./testing_games/", nb_episodes=20 * 10)

rewardsDense_goalDetailed.ulx..........  	avg. steps:  47.4; avg. score:  8.3 / 11.
./testing_games.................................  	avg. steps:  49.2; avg. score:  0.7 / 1.
./testing_games.................................  	avg. steps:  50.0; avg. score:  0.3 / 1.
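One way to check which of these runs stopped at a step limit is to parse the printed summaries and compare the average steps against a suspected cap. The sketch below assumes the output format shown above; `parse_result` and `hit_cap` are hypothetical helpers, not part of the notebook or of TextWorld.

```python
import re

# Matches the summary printed after each run, e.g.
# "avg. steps:  50.0; avg. score:  2.8 / 9."
RESULT = re.compile(r"avg\. steps:\s*([\d.]+); avg\. score:\s*([\d.]+) / (\d+)")


def parse_result(line):
    """Return (avg_steps, avg_score, max_score) parsed from a summary line."""
    match = RESULT.search(line)
    if match is None:
        raise ValueError(f"not a result line: {line!r}")
    return float(match.group(1)), float(match.group(2)), int(match.group(3))


def hit_cap(line, cap):
    """True if every episode in the run used all `cap` steps."""
    avg_steps, _, _ = parse_result(line)
    return avg_steps >= cap


lines = [
    "another_game.ulx..........  \tavg. steps:  50.0; avg. score:  2.8 / 9.",
    "another_game.ulx..........  \tavg. steps:  47.3; avg. score:  6.7 / 9.",
]
for line in lines:
    print(parse_result(line), "hit cap of 50:", hit_cap(line, cap=50))
```

An average of exactly 50.0 means every episode used all 50 steps, while a value just below 50 (such as 47.3) is what one would expect when some episodes finish early under the same limit.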

Thanks for taking this into account.

@gari-marcos the above PR should have the proper outputs now.