probml/pml2-book

small update to sec 35.5.3 on deadly triad

murphyk opened this issue · 0 comments

Added a brief discussion of gradient TD methods and target networks, two techniques for stabilizing off-policy learning.
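For readers skimming this issue, here is a rough sketch of the target-network idea, not the text or code added to the book. It assumes a linear Q-function with one weight vector per action; the function names (`q_values`, `q_update`, `sync_target`) and the default `gamma` and `lr` values are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not taken from the book or its code base.

def q_values(weights, features):
    """Linear Q-function: one weight vector per action."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, features)) for w in weights]


def q_update(weights, target_weights, s, a, r, s_next, done, gamma=0.99, lr=0.1):
    """One semi-gradient Q-learning step that bootstraps from frozen target weights."""
    # The bootstrap target uses the frozen copy, so it does not move
    # every time the online weights are updated.
    bootstrap = 0.0 if done else max(q_values(target_weights, s_next))
    td_error = r + gamma * bootstrap - q_values(weights, s)[a]
    # Only the online weights for the action taken are changed.
    weights[a] = [w_i + lr * td_error * x_i for w_i, x_i in zip(weights[a], s)]
    return td_error


def sync_target(weights):
    """Hard update: return a fresh copy of the online weights."""
    return [list(w) for w in weights]
```

In a training loop, `sync_target` would be called only every so many steps (the interval is a tuning choice), so the bootstrap target changes more slowly than the online estimate.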