A lot of items...
paulkent-um opened this issue · 3 comments
453... There are 453 items in NetHack (well, 453 glyphs with class GLYPH_OBJ). A few of them can be handled the same way (T-Shirts and Hawaiian Shirts, for example); the vast majority can't. Even with something as similar as leather vs. metal armor, you still need to code which one is better in which situation.
Not to mention that different items react differently to beatitude: some are just less effective when cursed (like Potions of Healing); some don't really do anything if cursed (like Potions of Enlightenment); some become hindrances if cursed (like pretty much any equipment); and some have an entirely different effect when cursed that can even be turned to your advantage (like Potions of Gain Level). So in reality the number of different responses we need to implement is even higher than 453, since blessed and cursed objects need to be handled differently, and how they differ is not uniform across items.
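To make the non-uniformity concrete, here is a minimal sketch of one way the per-item cursed behavior could be tabulated: a default per item class, plus per-item overrides. Everything in it (`Buc`, `Advice`, `advise`, the table names and the advice labels) is hypothetical and made up for illustration; only the four example behaviors come from the paragraph above.

```python
from enum import Enum


class Buc(Enum):
    BLESSED = "blessed"
    UNCURSED = "uncursed"
    CURSED = "cursed"


class Advice(Enum):
    # Hypothetical labels, not NetHack or NLE terminology.
    USE = "use normally"
    USE_WEAKER = "use, but expect a reduced effect"
    SKIP = "not worth using"
    AVOID = "do not equip"
    SITUATIONAL = "different effect, decide case by case"


# Per-class behavior when cursed (assumption: equipment is a hindrance).
CURSED_CLASS_DEFAULTS = {
    "armor": Advice.AVOID,
    "weapon": Advice.AVOID,
}

# Per-item exceptions, taken from the examples in the text above.
CURSED_ITEM_OVERRIDES = {
    "potion of healing": Advice.USE_WEAKER,
    "potion of enlightenment": Advice.SKIP,
    "potion of gain level": Advice.SITUATIONAL,
}


def advise(name: str, item_class: str, buc: Buc) -> Advice:
    """Look up advice: item override first, then class default, then USE."""
    if buc is not Buc.CURSED:
        return Advice.USE
    if name in CURSED_ITEM_OVERRIDES:
        return CURSED_ITEM_OVERRIDES[name]
    return CURSED_CLASS_DEFAULTS.get(item_class, Advice.USE)
```

This only cuts down on the typing for items that share a default. Every item that behaves differently still needs its own hand-written entry, which is exactly the workload problem.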
This is a lot of work for one person to do as a hobby project, and I'm not sure there's a more efficient way of hard-coding this. I'll obviously do what I can, but I doubt I'll be able to cover everything. Maybe this is where machine learning swoops in to save the day? Maybe we'll actually find more team members, divide and conquer? I can hope, but I can't make promises.
Okay, sure, but what are you suggesting ML should do? Select the best items using RL? I guess that depends on the state we're in at that time step, and to select the best item now, the agent would need to know what it's going to do with it in the near future. We could do that, but we already know damn well everything about these items. Or we could use model-free RL and let it learn from scratch. What do you think?
What if we tell the agent "this is armor, if you want to use it, wear it", and then let the agent decide whether it's a good idea to try to wear it? Hard-code how to use items, and let it figure out if and when it's a good idea to do so?
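A rough sketch of what that split might look like, assuming a hand-written table from item class to usage verb and a learned policy that only picks among the resulting candidates. All names here (`Item`, `Candidate`, `HOW_TO_USE`, `candidate_actions`, `choose`) are hypothetical, and the policy is just a stand-in callable rather than a real RL agent.

```python
from typing import Callable, List, NamedTuple, Optional


class Item(NamedTuple):
    name: str
    item_class: str


class Candidate(NamedTuple):
    verb: str
    item: Optional[Item]


# Hard-coded part: how each class of item is used.
HOW_TO_USE = {
    "armor": "wear",
    "weapon": "wield",
    "potion": "quaff",
    "scroll": "read",
    "food": "eat",
    "wand": "zap",
    "ring": "put on",
}


def candidate_actions(inventory: List[Item]) -> List[Candidate]:
    """List every usable (verb, item) pair, plus an explicit 'skip' option."""
    candidates = [Candidate("skip", None)]
    for item in inventory:
        verb = HOW_TO_USE.get(item.item_class)
        if verb is not None:
            candidates.append(Candidate(verb, item))
    return candidates


def choose(inventory: List[Item],
           policy: Callable[[List[Candidate]], int]) -> Candidate:
    """Learned part: the policy returns the index of the candidate to take."""
    candidates = candidate_actions(inventory)
    return candidates[policy(candidates)]
```

For example, `choose(inventory, lambda cands: 0)` always skips. A trained policy would replace the lambda and would also need the game state as input, which this sketch leaves out.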
Okay, that seems good. We could make an RL agent. But even just deciding when it's a good idea to use an item means it has to know about its surroundings, i.e. the state. So making the RL agent do only that one task is difficult, since the task depends on so many other factors. However, we could train the RL agent on the whole env and see what it does and when it uses items?