JohnAllen opened this issue 2 years ago · 0 comments
Can someone help me understand where the training actually happens? Where do the rewards feed back into a network, or whatever it is that gets updated, so that better actions become more likely in the future?