lifrordi/DeepStack-Leduc

Idea to improve training


Right now DeepStack computes its training loss with a masked Huber loss, where each bucket gets weight 0 if it is impossible and weight 1 if it is possible.
What if we changed the mask so that each weight can be any value between 0 and 1, set according to how likely that bucket is?

For example, suppose buckets A and B both have an error of 0.5, but bucket A has range probability 0.01 and bucket B has range probability 0.0001. Since 0.01 / 0.0001 = 100, the loss would give 100x more importance to moving bucket A's CFV closer to its target.
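To make the idea concrete, here is a rough sketch in plain Python (not the repo's Lua/Torch code). The names `huber` and `weighted_huber_loss`, the `delta` threshold of 1.0, and the normalization by the sum of the weights are all my own assumptions for illustration, not what DeepStack currently does. With a 0/1 weight vector it behaves like a masked loss, and with range probabilities as weights it gives the 100x ratio from the example.

```python
def huber(error, delta=1.0):
    """Standard Huber loss for one scalar error (delta is an assumed threshold)."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a            # quadratic region for small errors
    return delta * (a - 0.5 * delta)  # linear region for large errors


def weighted_huber_loss(predicted, target, weights, delta=1.0):
    """Huber loss where each bucket's term is scaled by its weight.

    weights may be a 0/1 mask or any values in [0, 1], for example range
    probabilities. Dividing by the weight sum is an assumption made here,
    not something taken from the DeepStack code.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # no possible buckets, so nothing to learn from
    total = sum(
        w * huber(p - t, delta)
        for p, t, w in zip(predicted, target, weights)
    )
    return total / total_weight


# Buckets A and B from the example: same error (0.5), different range probability.
error = 0.5
weight_a, weight_b = 0.01, 0.0001
contribution_a = weight_a * huber(error)
contribution_b = weight_b * huber(error)

# Ratio of the two weighted terms; it equals the ratio of the weights
# (100, up to floating-point rounding).
print(contribution_a / contribution_b)
```

One thing to watch for: if the weights are raw range probabilities, buckets that are rarely reached get almost no gradient, so their CFVs might be learned much more slowly than they are now.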