Small update to Sec. 35.5.3 on the deadly triad

Question

murphyk opened this issue 5 months ago · 0 comments

Added a brief discussion of gradient TD methods and target networks as ways to stabilize off-policy learning.