Safe-Deep-Reinforcement-Learning

Safe Reinforcement Learning: the process of learning policies that maximize the expected return in problems where it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes.

Contributed by Chunyang Zhang.
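
Many of the papers below formalize this as a constrained Markov decision process (CMDP): maximize the expected return subject to a budget d on the expected cumulative cost. As a rough illustration of the Lagrangian-relaxation recipe behind many of the Lagrangian and primal-dual entries in this list, the sketch below folds the cost constraint into the objective with a multiplier and updates that multiplier by projected dual ascent. It is a toy: the function names (`expected_return`, `expected_cost`, `solve_lagrangian`), the two-action reward/cost numbers, and the 0.3 cost budget are illustrative assumptions, not the method of any listed paper.

```python
# Toy CMDP sketch (illustrative assumptions only):
#   max_p  E[return](p)   subject to   E[cost](p) <= d
# The "policy" is a single probability p of taking a risky action that earns
# more reward but also incurs cost, so both expectations are in closed form.

def expected_return(p):   # risky action: reward 2.0; safe action: reward 1.0
    return 2.0 * p + 1.0 * (1.0 - p)

def expected_cost(p):     # risky action: cost 1.0; safe action: cost 0.0
    return 1.0 * p

def solve_lagrangian(budget_d=0.3, primal_lr=0.05, dual_lr=0.1, iters=5000):
    """Gradient ascent on p and projected dual ascent on the multiplier lam for
    the Lagrangian L(p, lam) = E[return](p) - lam * (E[cost](p) - budget_d)."""
    p, lam = 0.5, 0.0
    p_avg, lam_avg = 0.0, 0.0
    for _ in range(iters):
        grad_p = 1.0 - lam                      # d/dp of E[return] - lam * E[cost]
        p = min(1.0, max(0.0, p + primal_lr * grad_p))
        lam = max(0.0, lam + dual_lr * (expected_cost(p) - budget_d))
        p_avg += p / iters                      # report averaged iterates
        lam_avg += lam / iters
    return p_avg, lam_avg

if __name__ == "__main__":
    p, lam = solve_lagrangian()
    print(f"risky-action prob ~ {p:.2f}, return ~ {expected_return(p):.2f}, "
          f"cost ~ {expected_cost(p):.2f} (budget 0.3), multiplier ~ {lam:.2f}")
```

Averaged iterates are reported because the raw primal-dual pair oscillates around the saddle point of this bilinear Lagrangian; for the toy numbers above, the constrained optimum is a risky-action probability of about 0.3, where the cost budget is exactly met.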

1. Survey
2. Methodology
2.1 Model Based 2.2 Model Free
2.3 Policy Optimization 2.4 Bandit
2.5 Barrier Function 2.6 Actor Critic
2.7 Large Model 2.8 Dynamics Modeling
2.9 Evaluation 2.10 Offline Learning
2.11 Adversarial Reinforcement Learning 2.12 Inverse Reinforcement Learning
3. Mechanism
3.1 Analysis 3.2 Library
3.3 Theory 3.4 Reward
3.5 Cost Function 3.6 Primal Dual
3.7 Deployment 3.8 Domain Adaptation
3.9 Diffusion Model 3.10 Transformer
3.11 Generative Model 3.12 Simulation
3.13 Lagrangian 3.14 Causal Reasoning
3.15 Out of Distribution 3.16 Curriculum Learning
3.17 Continual Learning 3.18 Safe Set
3.19 Latent Space 3.20 Knowledge Distillation
3.21 Multi Agent 3.22 Multi Task
3.23 Markov Decision Process
4. Application
4.1 Autonomous Driving 4.2 Three Dimension
4.3 Cyber Attack 4.4 Robotics
4.5 Power System

1. Survey

  1. A comprehensive survey on safe reinforcement learning. JMLR, 2015. paper

    Javier García and Fernando Fernández.

  2. Policy learning with constraints in model-free reinforcement learning: A survey. IJCAI, 2021. paper

    Yongshuai Liu, Avishai Halev, and Xin Liu.

  3. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 2022. paper

    Lukas Brunke, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P. Schoellig.

  4. A review of safe reinforcement learning: Methods, theory and applications. arXiv, 2022. paper

    Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, and Alois Knoll.

  5. Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 2021. paper

    B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A. Al Sallab, Senthil Yogamani, and Patrick Pérez.

  6. State-wise safe reinforcement learning: A survey. arXiv, 2023. paper

    Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, and Changliu Liu.

  7. Modeling risk in reinforcement learning: A literature mapping. arXiv, 2023. paper

    Leonardo Villalobos-Arias, Derek Martin, Abhijeet Krishnan, Madeleine Gagné, Colin M. Potts, and Arnav Jhala.

  8. Safe and robust reinforcement-learning: Principles and practice. arXiv, 2024. paper

    Taku Yamagata and Raul Santos-Rodriguez.

2. Methodology

2.1 Model Based

  1. Provably efficient reinforcement learning with linear function approximation. ICML, 2020. paper

    Chi Jin, Zhuoran Yang, Zhaoran Wang, and Michael I Jordan.

  2. Model-based safe deep reinforcement learning via a constrained proximal policy optimization algorithm. NIPS, 2022. paper

    Ashish Kumar Jayant and Shalabh Bhatnagar.

  3. Conservative and adaptive penalty for model-based safe reinforcement learning. AAAI, 2022. paper

    Yecheng Jason Ma, Andrew Shen, Osbert Bastani, and Dinesh Jayaraman.

  4. DOPE: Doubly optimistic and pessimistic exploration for safe reinforcement learning. NIPS, 2022. paper

    Archana Bura, Aria Hasanzadezonuzy, Dileep Kalathil, Srinivas Shakkottai, and Jean-Francois Chamberland.

  5. Risk sensitive model-based reinforcement learning using uncertainty guided planning. NIPS, 2021. paper

    Garrett Thomas, Yuping Luo, and Tengyu Ma.

  6. Safe reinforcement learning by imagining the near future. NIPS, 2021. paper

    Stefan Radic Webster and Peter Flach.

  7. Approximate model-based shielding for safe reinforcement learning. ECAI, 2023. paper

    Alexander W. Goodall and Francesco Belardinelli.

2.2 Model Free

  1. Model-free safe control for zero-violation reinforcement learning. CoRL, 2022. paper

    Weiye Zhao, Tairan He, and Changliu Liu.

  2. More for less: Safe policy improvement with stronger performance guarantees. IJCAI, 2023. paper

    Patrick Wienhöft, Marnix Suilen, Thiago D. Simão, Clemens Dubslaff, Christel Baier, and Nils Jansen.

  3. Model-free safe reinforcement learning through neural barrier certificate. RAL, 2023. paper

    Yujie Yang, Yuxuan Jiang, Yichen Liu, Jianyu Chen, and Shengbo Eben Li.

  4. Provably efficient model-free constrained RL with linear function approximation. NIPS, 2022. paper

    Arnob Ghosh, Xingyu Zhou, and Ness Shroff.

  5. Provably efficient model-free algorithms for non-stationary CMDPs. arXiv, 2023. paper

    Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, and Xingyu Zhou.

  6. Anytime-competitive reinforcement learning with policy prior. NIPS, 2023. paper

    Jianyi Yang, Pengfei Li, Tongxin Li, Adam Wierman, and Shaolei Ren.

  7. Anytime-constrained reinforcement learning. arXiv, 2023. paper

    Jeremy McMahan and Xiaojin Zhu.

2.3 Policy Optimization

  1. Constrained policy optimization. ICML, 2017. paper

    Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel.

  2. Reward constrained policy optimization. ICLR, 2019. paper

    Chen Tessler, Daniel J. Mankowitz, and Shie Mannor.

  3. Projection-based constrained policy optimization. ICLR, 2020. paper

    Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, and Peter J. Ramadge.

  4. CRPO: A new approach for safe reinforcement learning with convergence guarantee. ICML, 2021. paper

    Tengyu Xu, Yingbin Liang, and Guanghui Lan.

  5. When to update your model: Constrained model-based reinforcement learning. NIPS, 2022. paper

    Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, and Wenbing Huang.

  6. Constraints penalized Q-learning for safe offline reinforcement learning. AAAI, 2022. paper

    Haoran Xu, Xianyuan Zhan, and Xiangyu Zhu.

  7. Exploring safer behaviors for deep reinforcement learning. AAAI, 2022. paper

    Enrico Marchesini, Davide Corsi, and Alessandro Farinelli.

  8. Constrained proximal policy optimization. arXiv, 2023. paper

    Chengbin Xuan, Feng Zhang, Faliang Yin, and Hak-Keung Lam.

  9. Safe policy improvement for POMDPs via finite-state controllers. AAAI, 2023. paper

    Thiago D. Simão, Marnix Suilen, and Nils Jansen.

  10. Policy regularization with dataset constraint for offline reinforcement learning. ICML, 2023. paper

    Yuhang Ran, Yichen Li, Fuxiang Zhang, Zongzhang Zhang, and Yang Yu.

  11. Constrained update projection approach to safe policy optimization. NIPS, 2022. paper

    Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, and Gang Pan.

  12. Constrained variational policy optimization for safe reinforcement learning. ICML, 2022. paper

    Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Steven Wu, Bo Li, and Ding Zhao.

  13. Towards safe reinforcement learning with a safety editor policy. NIPS, 2022. paper

    Haonan Yu, Wei Xu, and Haichao Zhang.

  14. CUP: A conservative update policy algorithm for safe reinforcement learning. arXiv, 2022. paper

    Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, and Gang Pan.

  15. State-wise constrained policy optimization. arXiv, 2023. paper

    Weiye Zhao, Rui Chen, Yifan Sun, Tianhao Wei, and Changliu Liu.

  16. Scalable safe policy improvement via Monte Carlo tree search. ICML, 2023. paper

    Alberto Castellini, Federico Bianchi, Edoardo Zorzi, Thiago D. Simão, Alessandro Farinelli, and Matthijs T. J. Spaan.

  17. Towards robust and safe reinforcement learning with benign off-policy data. ICML, 2023. paper

    Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Yihang Yao, Hanjiang Hu, and Ding Zhao.

  18. Constraint-conditioned policy optimization for versatile safe reinforcement learning. arXiv, 2023. paper

    Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, and Ding Zhao.

  19. Quantile constrained reinforcement learning: A reinforcement learning framework constraining outage probability. NIPS, 2022. paper

    Whiyoung Jung, Myungsik Cho, Jongeui Park, and Youngchul Sung.

  20. Reinforcement learning in a safety-embedded MDP with trajectory optimization. arXiv, 2023. paper

    Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, and David Held.

  21. Recursively-constrained partially observable Markov decision processes. arXiv, 2023. paper

    Qi Heng Ho, Tyler Becker, Ben Kraske, Zakariya Laouar, Martin Feather, Federico Rossi, Morteza Lahijanian, and Zachary N. Sunberg.

  22. TRC: Trust region conditional value at risk for safe reinforcement learning. arXiv, 2023. paper

    Dohyeong Kim and Songhwai Oh.

  23. Transition constrained Bayesian optimization via Markov decision processes. arXiv, 2024. paper

    Jose Pablo Folch, Calvin Tsay, Robert M Lee, Behrang Shafei, Weronika Ormaniec, Andreas Krause, Mark van der Wilk, Ruth Misener, and Mojmír Mutný.

  24. Safety optimized reinforcement learning via multi-objective policy optimization. ICRA, 2024. paper

    Homayoun Honari, Mehran Ghafarian Tamizi, and Homayoun Najjaran.

  25. Double duality: Variational primal-dual policy optimization for constrained reinforcement learning. arXiv, 2024. paper

    Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, and Mengdi Wang.

  26. ACPO: A policy optimization algorithm for average MDPs with constraints. ICML, 2024. paper

    Akhil Agnihotri, Rahul Jain, and Haipeng Luo.

  27. Spectral-risk safe reinforcement learning with convergence guarantees. arXiv, 2024. paper

    Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, and Songhwai Oh.

  28. Enhancing efficiency of safe reinforcement learning via sample manipulation. arXiv, 2024. paper

    Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas Spanos, Adam Wierman, and Ming Jin.

2.4 Bandit

  1. Probably anytime-safe stochastic combinatorial semi-bandits. ICML, 2023. paper

    Yunlong Hou, Vincent Tan, and Zixin Zhong.

2.5 Barrier Function

  1. Lyapunov-based safe policy optimization for continuous control. ICML, 2019. paper

    Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duéñez-Guzmán, and Mohammad Ghavamzadeh.

  2. Lyapunov design for safe reinforcement learning. JMLR, 2002. paper

    Theodore J. Perkins and Andrew G. Barto.

  3. Value functions are control barrier functions: Verification of safe policies using control theory. arXiv, 2023. paper

    Daniel C.H. Tan, Fernando Acero, Robert McCarthy, Dimitrios Kanoulas, and Zhibin Li.

  4. Safe exploration in model-based reinforcement learning using control barrier functions. Automatica, 2023. paper

    Max H. Cohen and Calin Belta.

  5. Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments. ICML, 2023. paper

    Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, and Qi Zhu.

  6. State-wise safe reinforcement learning with pixel observations. arXiv, 2023. paper

    Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, and Qi Zhu.

  7. Safe and efficient reinforcement learning using disturbance-observer-based control barrier functions. ICML, 2023. paper

    Yikun Cheng, Pan Zhao, and Naira Hovakimyan.

  8. NLBAC: A neural ordinary differential equations-based framework for stable and safe reinforcement learning. arXiv, 2024. paper

    Liqun Zhao, Keyan Miao, Konstantinos Gatsis, and Antonis Papachristodoulou.

2.6 Actor Critic

  1. WCSAC: Worst-case soft actor critic for safety-constrained reinforcement learning. AAAI, 2021. paper

    Qisong Yang, Thiago D. Simão, Simon H. Tindemans, and Matthijs T. J. Spaan.

  2. Finite time analysis of constrained actor critic and constrained natural actor critic algorithms. arXiv, 2023. paper

    Prashansa Panda and Shalabh Bhatnagar.

  3. DSAC-C: Constrained maximum entropy for robust discrete soft-actor critic. arXiv, 2023. paper

    Dexter Neo and Tsuhan Chen.

  4. SCPO: Safe reinforcement learning with safety critic policy optimization. arXiv, 2023. paper

    Jaafar Mhamed and Shangding Gu.

  5. Adversarially trained actor critic for offline CMDPs. arXiv, 2024. paper

    Honghao Wei, Xiyue Peng, Xin Liu, and Arnob Ghosh.

2.7 Large Model

  1. Parameter-efficient tuning helps language model alignment. arXiv, 2023. paper

    Tianci Xue, Ziqi Wang, and Heng Ji.

  2. Confronting reward model overoptimization with constrained RLHF. arXiv, 2023. paper

    Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, and Stephen McAleer.

  3. Safe RLHF: Safe reinforcement learning from human feedback. ICLR, 2024. paper

    Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, and Yaodong Yang.

  4. Safe reinforcement learning with free-form natural language constraints and pre-trained language models. arXiv, 2024. paper

    Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, and Yali Du.

2.8 Dynamics Modeling

  1. SaFormer: A conditional sequence modeling approach to offline safe reinforcement learning. arXiv, 2023. paper

    Qin Zhang, Linrui Zhang, Haoran Xu, Li Shen, Bowen Wang, Yongzhe Chang, Xueqian Wang, Bo Yuan, and Dacheng Tao.

  2. Model-free, regret-optimal best policy identification in online CMDPs. arXiv, 2023. paper

    Zihan Zhou, Honghao Wei, and Lei Ying.

  3. Scalable and efficient continual learning from demonstration via hypernetwork-generated stable dynamics model. arXiv, 2023. paper

    Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano, Antonio Rodríguez-Sánchez, and Justus Piater.

  4. SafeDreamer: Safe reinforcement learning with world models. ICLR, 2024. paper

    Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, and Yaodong Yang.

  5. Dynamic model predictive shielding for provably safe reinforcement learning. arXiv, 2024. paper

    Arko Banerjee, Kia Rahmani, Joydeep Biswas, and Isil Dillig.

  6. Verified safe reinforcement learning for neural network dynamic models. arXiv, 2024. paper

    Junlin Wu, Huan Zhang, and Yevgeniy Vorobeychik.

2.9 Evaluation

  1. Evaluating model-free reinforcement learning toward safety-critical tasks. AAAI, 2023. paper

    Linrui Zhang, Qin Zhang, Li Shen, Bo Yuan, Xueqian Wang, and Dacheng Tao.

2.10 Offline Learning

  1. Safe evaluation for offline learning: Are we ready to deploy? arXiv, 2022. paper

    Hager Radi, Josiah P. Hanna, Peter Stone, and Matthew E. Taylor.

  2. Safe offline reinforcement learning with real-time budget constraints. ICML, 2023. paper

    Qian Lin, Bo Tang, Zifan Wu, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, and Dong Wang.

  3. Efficient off-policy safe reinforcement learning using trust region conditional value at risk. arXiv, 2023. paper

    Dohyeong Kim and Songhwai Oh.

  4. Provable safe reinforcement learning with binary feedback. AISTATS, 2023. paper

    Andrew Bennett, Dipendra Misra, and Nathan Kallus.

  5. Long-term safe reinforcement learning with binary feedback. AAAI, 2024. paper

    Akifumi Wachi, Wataru Hashimoto, and Kazumune Hashimoto.

  6. Off-policy primal-dual safe reinforcement learning. ICLR, 2024. paper

    Zifan Wu, Bo Tang, Qian Lin, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, and Dong Wang.

2.11 Adversarial Reinforcement Learning

  1. Learning-aware safety for interactive autonomy. arXiv, 2023. paper

    Haimin Hu, Zixu Zhang, Kensuke Nakamura, Andrea Bajcsy, and Jaime F. Fisac.

  2. Robust safe reinforcement learning under adversarial disturbances. arXiv, 2023. paper

    Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, and Yunan Wang.

2.12 Inverse Reinforcement Learning

  1. Inverse constrained reinforcement learning. ICML, 2021. paper

    Shehryar Malik, Usman Anwar, Alireza Aghasi, and Ali Ahmed.

  2. Benchmarking constraint inference in inverse reinforcement learning. ICLR, 2023. paper

    Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, and Pascal Poupart.

  3. Maximum causal entropy inverse constrained reinforcement learning. arXiv, 2023. paper

    Mattijs Baert, Pietro Mazzaglia, Sam Leroux, and Pieter Simoens.

  4. Identifiability and generalizability in constrained inverse reinforcement learning. ICML, 2023. paper

    Andreas Schlaginhaufen and Maryam Kamgarpour.

  5. FP-IRL: Fokker-Planck-based inverse reinforcement learning -- A physics-constrained approach to Markov decision processes. ICML, 2023. paper

    Chengyang Huang, Siddhartha Srivastava, Xun Huan, and Krishna Garikipati.

3. Mechanism

3.1 Analysis

  1. On the robustness of safe reinforcement learning under observational perturbations. ICLR, 2023. paper

    Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, and Ding Zhao.

  2. Detecting adversarial directions in deep reinforcement learning to make robust decisions. ICML, 2023. paper

    Ezgi Korkmaz and Jonah Brown-Cohen.

  3. Efficient trust region-based safe reinforcement learning with low-bias distributional actor-critic. arXiv, 2023. paper

    Dohyeong Kim, Kyungjae Lee, and Songhwai Oh.

  4. Don't do it: Safer reinforcement learning with rule-based guidance. arXiv, 2022. paper

    Ekaterina Nikonova, Cheng Xue, and Jochen Renz.

  5. Saute RL: Almost surely safe reinforcement learning using state augmentation. ICML, 2022. paper

    Aivar Sootla, Alexander I Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David H Mguni, Jun Wang, and Haitham Ammar.

  6. Provably efficient exploration in constrained reinforcement learning: Posterior sampling is all you need. arXiv, 2023. paper

    Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, and Maurits Kaptein.

  7. Safe exploration in reinforcement learning: A generalized formulation and algorithms. NIPS, 2023. paper

    Akifumi Wachi, Wataru Hashimoto, Xun Shen, and Kazumune Hashimoto.

  8. Sample-efficient and safe deep reinforcement learning via reset deep ensemble agents. NIPS, 2023. paper

    Woojun Kim, Yongjae Shin, Jongeui Park, and Youngchul Sung.

  9. Imitate the good and avoid the bad: An incremental approach to safe reinforcement learning. AAAI, 2024. paper

    Huy Hoang, Tien Mai, and Pradeep Varakantham.

3.2 Library

  1. GUARD: A safe reinforcement learning benchmark. arXiv, 2023. paper

    Weiye Zhao, Rui Chen, Yifan Sun, Ruixuan Liu, Tianhao Wei, and Changliu Liu.

  2. OmniSafe: An infrastructure for accelerating safe reinforcement learning research. arXiv, 2023. paper

    Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, and Yaodong Yang.

  3. Datasets and benchmarks for offline safe reinforcement learning. arXiv, 2023. paper

    Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, and Ding Zhao.

  4. InterCode: Standardizing and benchmarking interactive coding with execution feedback. arXiv, 2023. paper

    John Yang, Akshara Prabhakar, Karthik Narasimhan, and Shunyu Yao.

  5. Safety-Gymnasium: A unified safe reinforcement learning benchmark. arXiv, 2023. paper

    Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, and Yaodong Yang.

  6. Controlgym: Large-scale safety-critical control environments for benchmarking reinforcement learning algorithms. arXiv, 2023. paper

    Xiangyuan Zhang, Weichao Mao, Saviz Mowlavi, Mouhacine Benosman, and Tamer Başar.

3.3 Theory

  1. Near-optimal conservative exploration in reinforcement learning under episode-wise constraints. ICML, 2023. paper

    Donghao Li, Ruiquan Huang, Cong Shen, and Jing Yang.

  2. Near-optimal sample complexity bounds for constrained MDPs. NIPS, 2022. paper

    Sharan Vaswani, Lin F. Yang, and Csaba Szepesvári.

  3. Learning policies with zero or bounded constraint violation for constrained MDPs. NIPS, 2021. paper

    Tao Liu, Ruida Zhou, Dileep Kalathil, Panganamala Kumar, and Chao Tian.

  4. Provably learning Nash policies in constrained Markov potential games. arXiv, 2023. paper

    Pragnya Alatur, Giorgia Ramponi, Niao He, and Andreas Krause.

  5. Provably safe reinforcement learning: A theoretical and experimental comparison. arXiv, 2022. paper

    Hanna Krasowski, Jakob Thumm, Marlon Müller, Lukas Schäfer, Xiao Wang, and Matthias Althoff.

  6. Shielded reinforcement learning for hybrid systems. CoRL, 2023. paper

    Asger Horn Brorholt, Peter Gjøl Jensen, Kim Guldstrand Larsen, Florian Lorber, and Christian Schilling.

  7. Safe reinforcement learning in tensor reproducing kernel Hilbert space. arXiv, 2023. paper

    Xiaoyuan Cheng, Boli Chen, Liz Varga, and Yukun Hu.

  8. Joint chance-constrained Markov decision processes. Annals of Operations Research, 2022. paper

    V Varagapriya, Vikas Vikram Singh, and Abdel Lisser.

  9. Approximate solutions to constrained risk-sensitive Markov decision processes. European Journal of Operational Research, 2023. paper

    Uday M Kumar, Sanjay P. Bhat, Veeraruna Kavitha, and Nandyala Hemachandra.

  10. Nearly minimax optimal reinforcement learning for linear Markov decision processes. ICML, 2023. paper

    Jiafan He, Heyang Zhao, Dongruo Zhou, and Quanquan Gu.

  11. Truly no-regret learning in constrained MDPs. arXiv, 2024. paper

    Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, and Niao He.

  12. Achieving Õ(1/ε) sample complexity for constrained Markov decision process. arXiv, 2024. paper

    Jiashuo Jiang and Yinyu Ye.

  13. Sampling-based safe reinforcement learning for nonlinear dynamical systems. arXiv, 2024. paper

    Wesley A. Suttle, Vipul K. Sharma, Krishna C. Kosaraju, S. Sivaranjani, Ji Liu, Vijay Gupta, and Brian M. Sadler.

  14. Learning adversarial MDPs with stochastic hard constraints. arXiv, 2024. paper

    Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, and Nicola Gatti.

  15. ConstrainedZero: Chance-constrained POMDP planning using learned probabilistic failure surrogates and adaptive safety constraints. IJCAI, 2024. paper

    Robert J. Moss, Arec Jamgochian, Johannes Fischer, Anthony Corso, and Mykel J. Kochenderfer.

  16. Efficient exploration in average-reward constrained reinforcement learning: Achieving near-optimal regret with posterior sampling. ICML, 2024. paper

    Danil Provodin, Maurits Kaptein, and Mykola Pechenizkiy.

  17. Achieving tractable minimax optimal regret in average reward MDPs. arXiv, 2024. paper

    Victor Boone and Zihan Zhang.

3.4 Reward

  1. Redeeming intrinsic rewards via constrained optimization. AAAI, 2023. paper

    Tairan He, Weiye Zhao, and Changliu Liu.

  2. Anchor-changing regularized natural policy gradient for multi-objective reinforcement learning. NIPS, 2022. paper

    Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, and Chao Tian.

  3. ROSARL: Reward-only safe reinforcement learning. arXiv, 2023. paper

    Geraud Nangue Tasse, Tamlin Love, Mark Nemecek, Steven James, and Benjamin Rosman.

  4. Solving richly constrained reinforcement learning through state augmentation and reward penalties. arXiv, 2023. paper

    Hao Jiang, Tien Mai, Pradeep Varakantham, and Minh Huy Hoang.

  5. State augmented constrained reinforcement learning: Overcoming the limitations of learning with rewards. TAC, 2023. paper

    Miguel Calvo-Fullana, Santiago Paternain, Luiz F. O. Chamon, and Alejandro Ribeiro.

3.5 Cost Function

  1. AutoCost: Evolving intrinsic cost for zero-violation reinforcement learning. AAAI, 2023. paper

    Tairan He, Weiye Zhao, and Changliu Liu.

3.6 Primal Dual

  1. Semi-infinitely constrained Markov decision processes and efficient reinforcement learning. NIPS, 2022. paper

    Liangyu Zhang, Yang Peng, Wenhao Yang, and Zhihua Zhang.

  2. Policy-based primal-dual methods for convex constrained Markov decision processes. AAAI, 2023. paper

    Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, and Zuo-Jun Max Shen.

  3. Provably efficient primal-dual reinforcement learning for CMDPs with non-stationary objectives and constraints. AAAI, 2023. paper

    Yuhao Ding and Javad Lavaei.

  4. Achieving zero constraint violation for constrained reinforcement learning via primal-dual approach. AAAI, 2022. paper

    Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, and Vaneet Aggarwal.

  5. Probabilistic constraint for safety-critical reinforcement learning. arXiv, 2023. paper

    Weiqin Chen, Dharmashankar Subramanian, and Santiago Paternain.

  6. Last-iterate convergent policy gradient primal-dual methods for constrained MDPs. arXiv, 2023. paper

    Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, and Alejandro Ribeiro.

  7. Learning-aware safety for interactive autonomy. arXiv, 2023. paper

    Haimin Hu, Zixu Zhang, Kensuke Nakamura, Andrea Bajcsy, and Jaime Fernandez Fisac.

  8. Safe reinforcement learning with dual robustness. arXiv, 2023. paper

    Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, and Shengbo Eben Li.

  9. Achieving zero constraint violation for constrained reinforcement learning via conservative natural policy gradient primal-dual algorithm. AAAI, 2022. paper

    Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, and Vaneet Aggarwal.

  10. Distributionally safe reinforcement learning under model uncertainty: A single-level approach by differentiable convex programming. arXiv, 2023. paper

    Alaa Eddine Chriat and Chuangchuang Sun.

  11. A policy gradient primal-dual algorithm for constrained MDPs with uniform PAC guarantees. arXiv, 2024. paper

    Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, and Yutaka Matsuo.

  12. Adaptive primal-dual method for safe reinforcement learning. arXiv, 2024. paper

    Weiqin Chen, James Onyejizu, Long Vu, Lan Hoang, Dharmashankar Subramanian, Koushik Kar, Sandipan Mishra, and Santiago Paternain.

  13. Learning general parameterized policies for infinite horizon average reward constrained MDPs via primal-dual policy gradient algorithm. arXiv, 2024. paper

    Qinbo Bai, Washim Uddin Mondal, and Vaneet Aggarwal.

3.7 Deployment

  1. Towards deployment-efficient reinforcement learning: Lower bound and optimality. ICLR, 2022. paper

    Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, and Tie-Yan Liu.

  2. Benchmarking constraint inference in inverse reinforcement learning. ICLR, 2023. paper

    Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, and Pascal Poupart.

3.8 Domain Adaptation

  1. A CMDP-within-online framework for meta-safe reinforcement learning. ICLR, 2023. paper

    Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, and Ming Jin.

  2. Reinforcement learning by guided safe exploration. ECAI, 2023. paper

    Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, and Matthijs T. J. Spaan.

3.9 Diffusion Model

  1. Trajectory generation, control, and safety with denoising diffusion probabilistic models. arXiv, 2023. paper

    Nicolò Botteghi, Federico Califano, Mannes Poel, and Christoph Brune.

  2. DiffCPS: Diffusion model based constrained policy search for offline reinforcement learning. arXiv, 2023. paper

    Longxiang He, Linrui Zhang, Junbo Tan, and Xueqian Wang.

  3. Feasibility-guided safe offline reinforcement learning. ICLR, 2024. paper

    Longxiang He, Linrui Zhang, Junbo Tan, and Xueqian Wang.

3.10 Transformer

  1. Constrained decision Transformer for offline safe reinforcement learning. ICML, 2023. paper

    Zuxin Liu, Zijian Guo, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, and Ding Zhao.

  2. TransDreamer: Reinforcement learning with Transformer world models. arXiv, 2022. paper

    Chang Chen, Yi-Fu Wu, Jaesik Yoon, and Sungjin Ahn.

  3. Temporal logic specification-conditioned decision Transformer for offline safe reinforcement learning. arXiv, 2024. paper

    Zijian Guo, Weichao Zhou, and Wenchao Li.

3.11 Generative Model

  1. Policy learning for robust Markov decision process with a mismatched generative model. AAAI, 2022. paper

    Jialian Li, Tongzheng Ren, Dong Yan, Hang Su, and Jun Zhu.

3.12 Simulation

  1. Sim-to-Lab-to-Real: Safe reinforcement learning with shielding and generalization guarantees. Artificial Intelligence, 2023. paper

    Kai-Chieh Hsu, Allen Z. Ren, Duy P. Nguyen, Anirudha Majumdar, and Jaime F. Fisac.

3.13 Lagrangian

  1. Responsive safety in reinforcement learning by PID Lagrangian methods. ICML, 2020. paper

    Adam Stooke, Joshua Achiam, and Pieter Abbeel.

  2. SafeDreamer: Safe reinforcement learning with world models. arXiv, 2023. paper

    Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, and Yaodong Yang.

  3. Robust Lagrangian and adversarial policy gradient for robust constrained Markov decision processes. arXiv, 2023. paper

    David M. Bossens.

  4. Safe reinforcement learning as Wasserstein variational inference: Formal methods for interpretability. arXiv, 2023. paper

    Yanran Wang and David Boyle.

  5. Gradient shaping for multi-constraint safe reinforcement learning. arXiv, 2023. paper

    Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, and Ding Zhao.

3.14 Causal Reasoning

  1. Causal temporal reasoning for Markov decision processes. arXiv, 2022. paper

    Milad Kazemi and Nicola Paoletti.

3.15 Out of Distribution

  1. Can agents run relay race with strangers? Generalization of RL to out-of-distribution trajectories. ICLR, 2023. paper

    Li-Cheng Lan, Huan Zhang, and Cho-Jui Hsieh.

3.16 Curriculum Learning

  1. Safe reinforcement learning via curriculum induction. NIPS, 2020. paper

    Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, and Alekh Agarwal.

  2. Concurrent learning of policy and unknown safety constraints in reinforcement learning. arXiv, 2024. paper

    Lunet Yifru and Ali Baheri.

3.17 Continual Learning

  1. Experience replay for continual learning. NIPS, 2019. paper

    David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne.

  2. Safe model-based multi-agent mean-field reinforcement learning. arXiv, 2023. paper

    Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco Corman, Andreas Krause, and Ilija Bogunovic.

  3. Continual learning as computationally constrained reinforcement learning. arXiv, 2023. paper

    Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, and Benjamin Van Roy.

3.18 Safe Set

  1. Safe reinforcement learning in constrained Markov decision processes. ICML, 2020. paper

    Akifumi Wachi and Yanan Sui.

  2. Reachability constrained reinforcement learning. ICML, 2022. paper

    Dongjie Yu, Haitong Ma, Shengbo Eben Li, and Jianyu Chen.

  3. A near-optimal algorithm for safe reinforcement learning under instantaneous hard constraints. ICML, 2023. paper

    Ming Shi, Yingbin Liang, and Ness Shroff.

  4. Iterative reachability estimation for safe reinforcement learning. NIPS, 2023. paper

    Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, and Sicun Gao.

  5. Risk-sensitive inhibitory control for safe reinforcement learning. ACC, 2023. paper

    Armin Lederer, Erfaun Noorani, John S. Baras, and Sandra Hirche.

  6. Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv, 2023. paper

    Zhaorun Chen, Binhao Chen, Tairan He, Liang Gong, and Chengliang Liu.

  7. Learn with imagination: Safe set guided state-wise constrained policy optimization. AAAI, 2024. paper

    Weiye Zhao, Yifan Sun, Feihan Li, Rui Chen, Tianhao Wei, and Changliu Liu.

  8. Safe reinforcement learning via shielding under partial observability. AAAI, 2023. paper

    Steven Carr, Nils Jansen, Sebastian Junges, and Ufuk Topcu.

  9. Safe reinforcement learning with learned non-Markovian safety constraints. arXiv, 2024. paper

    Siow Meng Low and Akshat Kumar.

  10. Feasibility consistent representation learning for safe reinforcement learning. ICML, 2024. paper

    Zhepeng Cen, Yihang Yao, Zuxin Liu, and Ding Zhao.

  11. Safe reinforcement learning in black-box environments via adaptive shielding. arXiv, 2024. paper

    Daniel Bethell, Simos Gerasimou, Radu Calinescu, and Calum Imrie.

3.19 Latent Space

  1. Safe reinforcement learning from pixels using a stochastic latent representation. ICLR, 2023. paper

    Yannick Hogewind, Thiago D. Simão, Tal Kachman, and Nils Jansen.

3.20 Knowledge Distillation

  1. Coaching a teachable student. CVPR, 2023. paper

    Jimuyang Zhang, Zanming Huang, and Eshed Ohn-Bar.

3.21 Multi Agent

  1. Provably efficient generalized Lagrangian policy optimization for safe multi-agent reinforcement learning. JMLR, 2023. paper

    Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, and Mihailo R. Jovanovic.

  2. Learning adaptive safety for multi-agent systems. arXiv, 2023. paper

    Luigi Berducci, Shuo Yang, Rahul Mangharam, and Radu Grosu.

  3. Safe multi-agent reinforcement learning with natural language constraints. arXiv, 2024. paper

    Ziyan Wang, Meng Fang, Tristan Tomilin, Fei Fang, and Yali Du.

3.22 Multi Task

  1. Learning shared safety constraints from multi-task demonstrations. arXiv, 2023. paper

    Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, and Zhiwei Steven Wu.

  2. Safe and balanced: A framework for constrained multi-objective reinforcement learning. arXiv, 2024. paper

    Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Alois Knoll, and Ming Jin.

3.23 Markov Decision Process

  1. GenSafe: A generalizable safety enhancer for safe reinforcement learning algorithms based on reduced order Markov decision process model. arXiv, 2024. paper

    Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, and Lei Ma.

4. Application

4.1 Autonomous Driving

  1. Dense reinforcement learning for safety validation of autonomous vehicles. Nature, 2023. paper

    Shuo Feng, Haowei Sun, Xintao Yan, Haojie Zhu, Zhengxia Zou, Shengyin Shen, and Henry X. Liu.

  2. Enhancing system-level safety in mixed-autonomy platoon via safe reinforcement learning. arXiv, 2024. paper

    Jingyuan Zhou, Longhao Yan, and Kaidi Yang.

  3. Multi-constraint safe RL with objective suppression for safety-critical applications. arXiv, 2024. paper

    Zihan Zhou, Jonathan Booher, Wei Liu, Aleksandr Petiushko, and Animesh Garg.

  4. Do no harm: A counterfactual approach to safe reinforcement learning. arXiv, 2024. paper

    Sean Vaskov, Wilko Schwarting, and Chris L. Baker.

  5. Safe multi-agent reinforcement learning with bilevel optimization in autonomous driving. arXiv, 2024. paper

    Zhi Zheng and Shangding Gu.

4.2 Three Dimension

  1. Online 3D bin packing with constrained deep reinforcement learning. AAAI, 2021. paper

    Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang, and Kai Xu.

4.3 Cyber Attack

  1. Spatiotemporally constrained action space attacks on deep reinforcement learning agents. AAAI, 2020. paper

    Xian Yeow Lee, Sambit Ghadai, Kai Liang Tan, Chinmay Hegde, and Soumik Sarkar.

4.4 Robotics

  1. Evaluation of constrained reinforcement learning algorithms for legged locomotion. arXiv, 2023. paper

    Joonho Lee, Lukas Schroth, Victor Klemm, Marko Bjelonic, Alexander Reske, and Marco Hutter.

  2. Learning safe control for multi-robot systems: Methods, verification, and open challenges. arXiv, 2023. paper

    Kunal Garg, Songyuan Zhang, Oswin So, Charles Dawson, and Chuchu Fan.

  3. Constrained reinforcement learning for dexterous manipulation. IJCAI, 2022. paper

    Abhineet Jain, Jack Kolb, and Harish Ravichandar.

  4. Safe multi-agent reinforcement learning for formation control without individual reference targets. arXiv, 2023. paper

    Murad Dawood, Sicong Pan, Nils Dengler, Siqi Zhou, Angela P. Schoellig, and Maren Bennewitz.

  5. Safe reinforcement learning in uncertain contexts. TRO, 2024. paper

    Dominik Baumann and Thomas B. Schön.

  6. Offline goal-conditioned reinforcement learning for safety-critical tasks with recovery policy. ICRA, 2024. paper

    Chenyang Cao, Zichen Yan, Renhao Lu, Junbo Tan, and Xueqian Wang.

  7. Safe reinforcement learning on the constraint manifold: Theory and applications. arXiv, 2024. paper

    Puze Liu, Haitham Bou-Ammar, Jan Peters, and Davide Tateo.

  8. SRL-VIC: A variable stiffness-based safe reinforcement learning for contact-rich robotic tasks. RA-L, 2024. paper

    Heng Zhang, Gokhan Solak, Gustavo J. G. Lahr, and Arash Ajoudani.

4.5 Power System

  1. District cooling system control for providing operating reserve based on safe deep reinforcement learning. TPS, 2023. paper

    Peipei Yu, Hongcai Zhang, Yonghua Song, Hongxun Hui, and Ge Chen.

  2. Safe reinforcement learning for power system control: A review. arXiv, 2024. paper

    Peipei Yu, Zhenyi Wang, Hongcai Zhang, and Yonghua Song.

  3. A review of safe reinforcement learning methods for modern power systems. arXiv, 2024. paper

    Tong Su, Tong Wu, Junbo Zhao, Anna Scaglione, and Le Xie.