Removing Invalid Actions

Question

Closed this issue 6 years ago · 12 comments

One suggestion that seems reasonable to me is:
https://ai.stackexchange.com/a/2994

This is regarding:
https://github.com/TimZaman/dotaclient/blob/master/policy.py#L127-L133

Answer 1 · 2019-01-26T22:53:26.000Z

Yeah, I should add the valid-action mask itself to the forward function, because the softmax needs to be run again once the invalid actions are masked out.
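For readers following along, here is a minimal sketch of what masking before the softmax can look like. It is written in plain Python rather than the repository's framework, and the function name `masked_softmax` and its arguments are hypothetical, not taken from policy.py. The idea is that invalid actions contribute nothing to the normalization, so the remaining probabilities sum to 1 over the valid actions only.

```python
import math


def masked_softmax(logits, valid_mask):
    """Softmax over `logits` that gives zero probability to invalid actions.

    `valid_mask[i]` is True when action i is allowed (hypothetical interface).
    """
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, valid_mask) if ok)
    # Invalid actions get weight 0, so they drop out of the normalization.
    weights = [math.exp(l - max_logit) if ok else 0.0
               for l, ok in zip(logits, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


# The last action has the highest logit but is masked out.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [True, True, False, False])
```

In a tensor framework the same effect is usually obtained by setting the invalid logits to a very large negative value before the softmax, which keeps the operation differentiable for the valid entries.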

Answer 2 · 2019-02-01T05:14:15.000Z

I did this now. Thanks.

Answer 3 · 2019-02-01T14:16:22.000Z

I feel like the bot is learning much faster thanks to this. This is based on purely watching the bot and its decision making.

What do you think?

Answer 4 · 2019-02-01T17:20:23.000Z

Yep, very true.


Answer 5 · 2019-02-01T17:38:02.000Z

I thought you were going to increase the reward values for Win/Loss? I still see kills/deaths as more impactful in the current version.

Answer 6 · 2019-02-01T17:51:45.000Z

Doesn't matter at the moment; it doesn't explore or train well. It gets to be a last-hit master, but there is no smart play going on.


Answer 7 · 2019-02-01T18:19:51.000Z

So I think I'm going to work on fixing that by doing several things:

1. Teach it about lane fronts and reward it for being near one (rather than near location (0,0,0), the center of the map).
2. Change the Tower HP reward to be a delta between the Enemy Tower HP and Friendly Tower HP (as in "it's okay if our tower is taking damage, provided we are doing more damage to the enemy tower").

It should help.
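As a rough illustration of the tower HP delta idea, the sketch below rewards the damage dealt to the enemy tower minus the damage taken by the friendly tower over one step. This is only one possible reading of the proposal; the function name, the dictionary keys, and the `scale` parameter are all hypothetical and do not come from the repository.

```python
def tower_hp_delta_reward(prev, curr, scale=1.0):
    """Reward = (damage dealt to enemy tower) - (damage taken by friendly tower).

    `prev` and `curr` are hypothetical observations holding tower HP as
    fractions in [0, 1] under the keys used below.
    """
    enemy_damage = prev["enemy_tower_hp"] - curr["enemy_tower_hp"]
    friendly_damage = prev["friendly_tower_hp"] - curr["friendly_tower_hp"]
    # Positive when we out-damage the enemy, negative when we fall behind.
    return scale * (enemy_damage - friendly_damage)


before = {"enemy_tower_hp": 1.0, "friendly_tower_hp": 1.0}
after = {"enemy_tower_hp": 0.8, "friendly_tower_hp": 0.9}
reward = tower_hp_delta_reward(before, after)
```

Here the friendly tower lost HP, but the reward is still positive because the enemy tower lost more.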

Separate from that, I'm working on formalizing a new ML approach for teaching agents to do strategic planning versus tactical actions. Hopefully I can share it with folks outside my company.

Answer 8 · 2019-02-01T18:24:58.000Z

Actually, the distance reward can be removed. I just sent you an email; I'll be at OpenAI tomorrow. Do we have any questions for them?


Answer 9 · 2019-02-01T18:58:21.000Z

Might be easier to chat briefly. Jump into a Google Hangouts if you can.

I'll be sitting there for the next 20-30min.

LINK DOWN

Answer 10 · 2019-02-01T20:38:06.000Z

It was sad and lonely... Replied to your email.

Answer 11 · 2019-02-04T15:47:34.000Z

So, looking at the code: you don't disable all "invalid" actions.

The code only prevents you from attacking yourself, or from issuing an attack action altogether if there are no valid unit_handles. It does not prevent attacking your own units when they are at full health (which has no effect, although it is technically valid).

I'm not saying this is a problem that needs fixing, just pointing it out. Attacking your own units at full health is a way to drop tower aggro, for example.
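To make the two checks described above concrete, here is a small sketch of how such masks could be built. It is an assumption-laden illustration, not the code in policy.py: the helper names, the `ATTACK` label, and the use of plain lists are all hypothetical. It masks the hero's own handle as an attack target and disables the attack action when no valid target remains, while deliberately leaving full-health allied units targetable.

```python
ATTACK = "attack"  # hypothetical action label


def build_target_mask(unit_handles, own_handle):
    """True for every unit that may be targeted; only the hero itself is excluded."""
    return [handle != own_handle for handle in unit_handles]


def build_action_mask(actions, target_mask):
    """Disable the attack action entirely when there is no valid target."""
    has_target = any(target_mask)
    return [action != ATTACK or has_target for action in actions]


actions = ["noop", "move", ATTACK]

# Hero 10 sees two other units: attacking is allowed, but not on itself.
targets = build_target_mask([10, 11, 12], own_handle=10)
action_mask = build_action_mask(actions, targets)

# Hero 10 sees only itself: the attack action is masked out.
alone_targets = build_target_mask([10], own_handle=10)
alone_action_mask = build_action_mask(actions, alone_targets)
```

Both masks can then be fed to a masked softmax so that the policy never samples a disabled action or target.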

Answer 12 · 2019-02-04T17:34:56.000Z

Sure, I knew that. In principle, it's just good to prevent invalid actions. I do need to check what is going on when the hero is dead.
