References on Optimal Control, Reinforcement Learning and Motion Planning

Bibliography

Table of contents

Optimal Control
Safe Control
Sequential Learning
- Multi-Armed Bandit
- Reinforcement Learning
Learning from Demonstrations
- Imitation Learning
  - Applications to Autonomous Driving
- Inverse Reinforcement Learning
  - Applications to Autonomous Driving
Motion Planning

Optimal Control

Dynamic Programming

(book) Dynamic Programming, Bellman R., 1957.
(book) Dynamic Programming and Optimal Control, Volumes 1 and 2, Bertsekas D., 1995.
(book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M., 1995.

Approximate Planning

ExpectiMinimax Optimal strategy in games with chance nodes, Melkó E., Nagy B., 2007.
Sparse sampling A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Kearns M. et al, 2002.
MCTS Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R., 2006.
UCT Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C., 2006.
AlphaGo Mastering the game of Go with deep neural networks and tree search, Silver D. et al, 2016.
AlphaGo Zero Mastering the game of Go without human knowledge, Silver D. et al, 2017.
AlphaZero Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver D. et al, 2017.
TrailBlazer Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R., 2017.
MCTSnets Learning to search with MCTSnets, Guez A. et al, 2018.
ADI Solving the Rubik's Cube Without Human Knowledge, McAleer S. et al, 2018.

Control Theory

(book) Constrained Control and Estimation, Goodwin G., 2005.
PI² A Generalized Path Integral Control Approach to Reinforcement Learning, Theodorou E. et al, 2010.
PI²-CMA Path Integral Policy Improvement with Covariance Matrix Adaptation, Stulp F., Sigaud O., 2010.
iLQG A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, Todorov E., 2005.
iLQG+ Synthesis and stabilization of complex behaviors through online trajectory optimization, Tassa Y., 2012.

Model Predictive Control

(book) Model Predictive Control, Camacho E., 1995.
(book) Predictive Control With Constraints, Maciejowski J. M., 2002.
Linear Model Predictive Control for Lane Keeping and Obstacle Avoidance on Low Curvature Roads, Turri V. et al, 2013.
MPCC Optimization-based autonomous racing of 1:43 scale RC cars, Liniger A. et al, 2014. (video 1 | 2)
MIQP Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, Qian X., Altché F., Bender P., Stiller C., de La Fortelle A., 2016.

Safe Control

Robust Control

Minimax analysis of stochastic problems, Shapiro A., Kleywegt A., 2002.
Robust DP Robust Dynamic Programming, Iyengar G., 2005.
Robust Planning and Optimization, Laumanns M., 2011. (lecture notes)
Robust Markov Decision Processes, Wiesemann W., Kuhn D., Rustem B., 2012.
Coarse-Id On the Sample Complexity of the Linear Quadratic Regulator, Dean S., Mania H., Matni N., Recht B., Tu S., 2017.
Tube-MPPI Robust Sampling Based Model Predictive Control with Sparse Objective Information, Williams G. et al, 2018. (video)

Risk-Averse Control

A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F., 2015.

Constrained Control

ICS Will the Driver Seat Ever Be Empty?, Fraichard T., 2014.
RSS On a Formal Model of Safe and Scalable Self-driving Cars, Shalev-Shwartz S. et al, 2017.
BFTQ Safe Transfer across Reinforcement Learning Tasks, Carrara N. et al, 2018.

Uncertain Dynamical Systems

Simulation of Controlled Uncertain Nonlinear Systems, Tibken B., Hofer E., 1995.
Trajectory computation of dynamic uncertain systems, Adrot O., Flaus J-M., 2002.
Simulation of Uncertain Dynamic Systems Described By Interval Models: a Survey, Puig V. et al, 2005.
Design of interval observers for uncertain dynamical systems, Efimov D., Raïssi T., 2016.

Sequential Learning

Multi-Armed Bandit

LUCB PAC Subset Selection in Stochastic Multi-armed Bandits, Kalyanakrishnan S. et al, 2012.
Track-and-Stop Optimal Best Arm Identification with Fixed Confidence, Garivier A., Kaufmann E., 2016.
M-LUCB/M-Racing Maximin Action Identification: A New Bandit Framework for Games, Garivier A., Kaufmann E., Koolen W., 2016.
LUCB-micro Structured Best Arm Identification with Fixed Confidence, Huang R. et al, 2017.

Reinforcement Learning

Reinforcement learning: A survey, Kaelbling L. et al, 1996.

Value-based

DQN Playing Atari with Deep Reinforcement Learning, Mnih V. et al, 2013. (video)
DDQN Deep Reinforcement Learning with Double Q-learning, van Hasselt H., Silver D. et al, 2015.
DDDQN Dueling Network Architectures for Deep Reinforcement Learning, Wang Z. et al, 2015. (video)
PDDDQN Prioritized Experience Replay, Schaul T. et al, 2015.
NAF Continuous Deep Q-Learning with Model-based Acceleration, Gu S. et al, 2016.
Rainbow Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel M. et al, 2017.
Ape-X DQfD Observe and Look Further: Achieving Consistent Performance on Atari, Pohlen T. et al, 2018. (videos)

Policy-based

Policy gradient

REINFORCE Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams R., 1992.
Natural Gradient A Natural Policy Gradient, Kakade S., 2002.
Policy Gradient Methods for Robotics, Peters J., Schaal S., 2006.
TRPO Trust Region Policy Optimization, Schulman J. et al, 2015. (video)
PPO Proximal Policy Optimization Algorithms, Schulman J. et al, 2017. (video)
DPPO Emergence of Locomotion Behaviours in Rich Environments, Heess N. et al, 2017. (video)

Actor-critic

AC Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al, 1999.
NAC Natural Actor-Critic, Peters J. et al, 2005.
DPG Deterministic Policy Gradient Algorithms, Silver D. et al, 2014.
DDPG Continuous Control With Deep Reinforcement Learning, Lillicrap T. et al, 2015. (video 1 | 2 | 3 | 4)
A3C Asynchronous Methods for Deep Reinforcement Learning, Mnih V. et al, 2016. (video 1 | 2 | 3)
SAC Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja T. et al, 2018. (video)

Derivative-free

CEM Learning Tetris Using the Noisy Cross-Entropy Method, Szita I., Lörincz A., 2006. (video)
CMAES Completely Derandomized Self-Adaptation in Evolution Strategies, Hansen N., Ostermeier A., 2001.
NEAT Evolving Neural Networks through Augmenting Topologies, Stanley K., 2002. (video)

Model-based

Dyna Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, Sutton R., 1990.
UCRL2 Near-optimal Regret Bounds for Reinforcement Learning, Jaksch T., 2010.
PILCO PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C., 2011. (talk)
DBN Probabilistic MDP-behavior planning for cars, Brechtel S. et al, 2011.
GPS End-to-End Training of Deep Visuomotor Policies, Levine S. et al, 2015. (video)
DeepMPC DeepMPC: Learning Deep Latent Features for Model Predictive Control, Lenz I. et al, 2015. (video)
SVG Learning Continuous Control Policies by Stochastic Value Gradients, Heess N. et al, 2015. (video)
Optimal control with learned local models: Application to dexterous manipulation, Kumar V. et al, 2016. (video)
BPTT Long-term Planning by Short-term Prediction, Shalev-Shwartz S. et al, 2016. (video 1 | 2)
Deep visual foresight for planning robot motion, Finn C., Levine S., 2016. (video)
VIN Value Iteration Networks, Tamar A. et al, 2016. (video)
VPN Value Prediction Network, Oh J. et al, 2017.
An LSTM Network for Highway Trajectory Prediction, Altché F., de La Fortelle A., 2017.
DistGBP Model-Based Planning with Discrete and Continuous Actions, Henaff M. et al, 2017. (video 1 | 2)
Prediction and Control with Temporal Segment Models, Mishra N. et al, 2017.
Predictron The Predictron: End-To-End Learning and Planning, Silver D. et al, 2017. (video)
MPPI Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al, 2017. (video)
Learning Real-World Robot Policies by Dreaming, Piergiovanni A. et al, 2018.

Temporal Abstraction

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Sutton R. et al, 1999.
Intrinsically motivated learning of hierarchical collections of skills, Barto A. et al, 2004.
Learning and Transfer of Modulated Locomotor Controllers, Heess N. et al, 2016. (video)
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, Shalev-Shwartz S. et al, 2016.
FuNs FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets A. et al, 2017.
Combining Neural Networks and Tree Search for Task and Motion Planning in Challenging Environments, Paxton C. et al, 2017. (video)

Partial Observability

PBVI Point-based Value Iteration: An anytime algorithm for POMDPs, Pineau J. et al, 2003.
cPBVI Point-Based Value Iteration for Continuous POMDPs, Porta J. et al, 2006.
POMCP Monte-Carlo Planning in Large POMDPs, Silver D., Veness J., 2010.
A POMDP Approach to Robot Motion Planning under Uncertainty, Du Y. et al, 2010.
Solving Continuous POMDPs: Value Iteration with Incremental Learning of an Efficient Space Representation, Brechtel S. et al, 2013.
Probabilistic Decision-Making under Uncertainty for Autonomous Driving using Continuous POMDPs, Brechtel S. et al, 2014.
MOMDP Intention-Aware Motion Planning, Bandyopadhyay T. et al, 2013.
The value of inferring the internal state of traffic participants for autonomous freeway driving, Sunberg Z. et al, 2017.

Transfer

Virtual to Real Reinforcement Learning for Autonomous Driving, Pan X. et al, 2017. (video)
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, Tan J. et al, 2018. (video)
ME-TRPO Model-Ensemble Trust-Region Policy Optimization, Kurutach T. et al, 2018. (video)
Kickstarting Deep Reinforcement Learning, Schmitt S. et al, 2018.
Learning Dexterous In-Hand Manipulation, OpenAI, 2018. (video)

Multi-agent

Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems, Albrecht S., Stone P., 2017.
MILP Time-optimal coordination of mobile robots along specified paths, Altché F. et al, 2016. (video)
MIQP An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles, Altché F. et al, 2017. (video)
SA-CADRL Socially Aware Motion Planning with Deep Reinforcement Learning, Chen Y. et al, 2017. (video)
Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment, Galceran E. et al, 2017.
Online decision-making for scalable autonomous systems, Wray K. et al, 2017.
MAgent MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence, Zheng L. et al, 2017. (video)
Cooperative Motion Planning for Non-Holonomic Agents with Value Iteration Networks, Rehder E. et al, 2017.
COMA Counterfactual Multi-Agent Policy Gradients, Foerster J. et al, 2017.
FTW Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al, 2018. (video)

Representation

DeepDriving DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, Chen C. et al, 2015. (video)
On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training, Shalev-Shwartz S. et al, 2016.
VAE-MDN-RNN World Models, Ha D., Schmidhuber J., 2018.
MERLIN Unsupervised Predictive Memory in a Goal-Directed Agent, Wayne G. et al, 2018. (video 1 | 2 | 3 | 4 | 5 | 6)

Other

Is the Bellman residual a bad proxy?, Geist M., Piot B., Pietquin O., 2016.
Deep Reinforcement Learning that Matters, Henderson P. et al, 2017.
Automatic Bridge Bidding Using Deep Reinforcement Learning, Yeh C., Lin H., 2016.
Shared Autonomy via Deep Reinforcement Learning, Reddy S. et al, 2018. (videos)
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, Levine S., 2018.

Learning from Demonstrations

Imitation Learning

DQfD Learning from Demonstrations for Real World Reinforcement Learning, Hester T. et al, 2017. (videos)
UPN Universal Planning Networks, Srinivas A. et al, 2018. (videos)
QMDP-RCNN Reinforcement Learning via Recurrent Convolutional Neural Networks, Shankar T. et al, 2016. (talk)
GAIL Generative Adversarial Imitation Learning, Ho J., Ermon S., 2016.
From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots, Pfeiffer M. et al, 2017. (video)
Branched End-to-end Driving via Conditional Imitation Learning, Codevilla F. et al, 2017. (video | talk)
DeepMimic DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng X. B. et al, 2018. (video)

Applications to Autonomous Driving

ALVINN, an autonomous land vehicle in a neural network, Pomerleau D., 1989.
End to End Learning for Self-Driving Cars, Bojarski M. et al, 2016. (video)
End-to-end Learning of Driving Models from Large-scale Video Datasets, Xu H., Gao Y. et al, 2016. (video)
End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies, Eraqi H. et al, 2017.
Driving Like a Human: Imitation Learning for Path Planning using Convolutional Neural Networks, Rehder E. et al, 2017.
Imitating Driver Behavior with Generative Adversarial Networks, Kuefler A. et al, 2017.
PS-GAIL Multi-Agent Imitation Learning for Driving Simulation, Bhattacharyya R. et al, 2018. (video)

Inverse Reinforcement Learning

Projection Apprenticeship learning via inverse reinforcement learning, Abbeel P., Ng A., 2004.
MMP Maximum margin planning, Ratliff N. et al, 2006.
BIRL Bayesian inverse reinforcement learning, Ramachandran D., Amir E., 2007.
MEIRL Maximum Entropy Inverse Reinforcement Learning, Ziebart B. et al, 2008.
CIOC Continuous Inverse Optimal Control with Locally Optimal Examples, Levine S., Koltun V., 2012. (video)
MEDIRL Maximum Entropy Deep Inverse Reinforcement Learning, Wulfmeier M., 2015.
GCL Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, Finn C. et al, 2016. (video)
RIRL Repeated Inverse Reinforcement Learning, Amin K. et al, 2017.
Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning, Piot B. et al, 2017.

Applications to Autonomous Driving

Apprenticeship Learning for Motion Planning, with Application to Parking Lot Navigation, Abbeel P. et al, 2008.
Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior, Ziebart B. et al, 2008.
Planning-based Prediction for Pedestrians, Ziebart B. et al, 2009. (video)
Learning Driving Styles for Autonomous Vehicles from Demonstration, Kuderer M. et al, 2015.
Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks, Sharifzadeh S. et al, 2016.
Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments, Wulfmeier M., 2016. (video)
Planning for Autonomous Cars that Leverage Effects on Human Actions, Sadigh D. et al, 2016.
A Learning-Based Framework for Handling Dilemmas in Urban Automated Driving, Lee S., Seo S., 2017.

Motion Planning

Search

Dijkstra A Note on Two Problems in Connexion with Graphs, Dijkstra E. W., 1959.
A* A Formal Basis for the Heuristic Determination of Minimum Cost Paths, Hart P. et al, 1968.
Planning Long Dynamically-Feasible Maneuvers For Autonomous Vehicles, Likhachev M., Ferguson D., 2008.
Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame, Werling M., Kammel S., 2010. (video)
3D perception and planning for self-driving and cooperative automobiles, Stiller C., Ziegler J., 2012.
Motion Planning under Uncertainty for On-Road Autonomous Driving, Xu W. et al, 2014.
Monte Carlo Tree Search for Simulated Car Racing, Fischer J. et al, 2015. (video)

Sampling

RRT* Sampling-based Algorithms for Optimal Motion Planning, Karaman S., Frazzoli E., 2011. (video)
LQG-MP LQG-MP: Optimized Path Planning for Robots with Motion Uncertainty and Imperfect State Information, van den Berg J. et al, 2010.
Motion Planning under Uncertainty using Differential Dynamic Programming in Belief Space, van den Berg J. et al, 2011.
Rapidly-exploring Random Belief Trees for Motion Planning Under Uncertainty, Bry A., Roy N., 2011.
PRM-RL PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning, Faust A. et al, 2017.

Optimization

Trajectory planning for Bertha - A local, continuous method, Ziegler J. et al, 2014.
Learning Attractor Landscapes for Learning Motor Primitives, Ijspeert A. et al, 2002.

Reactive

PF Real-time obstacle avoidance for manipulators and mobile robots, Khatib O., 1986.
VFH The Vector Field Histogram - Fast Obstacle Avoidance For Mobile Robots, Borenstein J., 1991.
VFH+ VFH+: Reliable Obstacle Avoidance for Fast Mobile Robots, Ulrich I., Borenstein J., 1998.
Velocity Obstacles Motion planning in dynamic environments using velocity obstacles, Fiorini P., Shiller Z., 1998.

Architecture and applications

A Review of Motion Planning Techniques for Automated Vehicles, González D. et al, 2016.
A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles, Paden B. et al, 2016.
Autonomous driving in urban environments: Boss and the Urban Challenge, Urmson C. et al, 2008.
The MIT-Cornell collision and why it happened, Fletcher L. et al, 2008.
Making Bertha drive - An autonomous journey on a historic route, Ziegler J. et al, 2014.