murphyk opened this issue 5 months ago · 0 comments
Added brief discussion of gradient TD and target networks, two techniques for stabilizing off-policy learning.