
Multi-Agent Reinforcement Learning for Connected and Automated Vehicles Control: Recent Advancements and Future Prospects


A curated list of awesome Multi-Agent Reinforcement Learning in Connected and Automated Vehicles Control papers 🔥🔥🔥.

Work is still in progress 🚀; we appreciate any suggestions and contributions ❤️.


How to contribute?

If you have any suggestions or find any missing papers, feel free to reach out or submit a pull request:

  1. Use the following markdown format:

*Author 1, Author 2, and Author 3.* **Paper Title.**  <ins>Conference/Journal/Preprint</ins> Year. [[pdf](link)]; [[other resources](link)].

  2. If a preprint has multiple versions, use the earliest submission year.

  3. Display papers in descending order by year (the latest first).

Citation

Find this repository helpful? 😊

Please consider citing our paper. 👇👇👇

(Note that the current version of our survey is only a draft, and we are still working on it.) 🚀

@article{hua2023multi,
 title={Multi-Agent Reinforcement Learning for Connected and Automated Vehicles Control: Recent Advancements and Future Prospects},
 author={Hua, Min and Chen, Dong and Qi, Xinda and Jiang, Kun and Liu, Zemin Eitan and Zhou, Quan and Xu, Hongming},
 journal={arXiv preprint arXiv:2312.11084},
 year={2023}
}

🔍 Table of Contents


1. 💁🏽‍♀️ Introduction

1.1 Multi-agent System for CAV Control


1.2 Contributions of this review

Why multi-agent reinforcement learning across the control dimensions of connected and automated vehicles (CAVs)?

  • 👉 Joint Policy Learning. Unlike traditional control methods or single-agent reinforcement learning, MARL learns policies for multiple decision-makers (agents) simultaneously. This joint policy learning lets CAVs develop the cooperative strategies needed for tasks such as synchronized lane changing, platooning, and intersection management.
  • 👉 Partial Observability and Information Sharing. A CAV may not be able to observe the entire traffic situation due to line-of-sight limitations, sensor range, or communication constraints. MARL algorithms can be designed so that agents act on partial information and learn to infer what is missing through interaction, including strategies for selective information sharing.
  • 👉 Scalable Learning and Decentralized Execution. MARL provides a framework for scalable learning in which each agent learns from its local observations while still accounting for the global outcome. This enables decentralized execution, where each CAV operates on its own policy without a central controller, making the system robust to single points of failure (a minimal sketch follows this list).
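
The sketch below illustrates the decentralized-execution idea: at run time, each CAV queries only its own policy on its own local observation. All names here (`LocalPolicy`, the dimensions) are illustrative assumptions, not taken from any specific paper in this list.

```python
# Minimal sketch: each CAV acts from its local observation with its own
# policy; no central controller is queried at execution time.
import torch
import torch.nn as nn

class LocalPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits

# One policy per CAV; each acts on partial (local) observations only.
policies = [LocalPolicy(obs_dim=16, act_dim=5) for _ in range(4)]
local_obs = [torch.randn(16) for _ in range(4)]
actions = [p(o).argmax().item() for p, o in zip(policies, local_obs)]
```
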
  1. Wang, Wenshuo, et al. "Social interactions for autonomous driving: A review and perspectives." Foundations and Trends® in Robotics 10.3-4 (2022): 198-376. [Google Scholar] [Paper]

  2. Wang, Fei-Yue. "Artificial intelligence and intelligent transportation: Driving into the 3rd axial age with ITS." IEEE Intelligent Transportation Systems Magazine 9.4 (2017): 6-9. [Google Scholar] [Paper]

  3. Liu, Wei, et al. "A systematic survey of control techniques and applications in connected and automated vehicles." IEEE Internet of Things Journal (2023). [Google Scholar] [Paper]

  4. Schmidt, Lukas M., et al. "An introduction to multi-agent reinforcement learning and review of its application to autonomous mobility." 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022. [Google Scholar] [Paper]

  5. Dinneweth, Joris, et al. "Multi-agent reinforcement learning for autonomous vehicles: A survey." Autonomous Intelligent Systems 2.1 (2022): 27. [Google Scholar] [Paper]

2. 🎓 Background

2.1 Preliminaries of Reinforcement Learning (RL)

  1. Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of artificial intelligence research 4 (1996): 237-285. [Google Scholar] [Paper]

  2. Arulkumaran, Kai, et al. "Deep reinforcement learning: A brief survey." IEEE Signal Processing Magazine 34.6 (2017): 26-38. [Google Scholar] [Paper]

  3. Wang, Hao-nan, et al. "Deep reinforcement learning: a survey." Frontiers of Information Technology & Electronic Engineering 21.12 (2020): 1726-1744. [Google Scholar] [Paper]

  4. Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. [Code]

  5. Li, Yuxi. "Deep reinforcement learning: An overview." arXiv preprint arXiv:1701.07274 (2017). [Google Scholar] [Paper]

  6. Jayasiri, Varuna, and Nipun Wijerathne. labml.ai Annotated Paper Implementations. [Code]
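
As a concrete anchor for the surveys above, here is the tabular Q-learning update that the deep methods in the following subsections build on. The state/action sizes and the sample transition are placeholders.

```python
# Tabular Q-learning update (the foundation the deep variants extend).
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # step size, discount factor

def q_update(s, a, r, s_next, done):
    # Bootstrap from the greedy value of the next state unless terminal.
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2, done=False)
```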

2.1.1 Deep Q-Learning

  1. Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013). [Google Scholar] [Paper] [Code]

  2. Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." 2015 AAAI Fall Symposium Series. 2015. [Google Scholar] [Paper] [Code]

  3. Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." International conference on machine learning. PMLR, 2016. [Google Scholar] [Paper] [Code]

  4. Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double q-learning." Proceedings of the AAAI conference on artificial intelligence. Vol. 30. No. 1. 2016. [Google Scholar] [Paper] [Code]

  5. Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015). [Google Scholar] [Paper] [Code]

  6. Hessel, Matteo, et al. "Rainbow: Combining improvements in deep reinforcement learning." Proceedings of the AAAI conference on artificial intelligence. Vol. 32. No. 1. 2018. [Google Scholar] [Paper] [Code]
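
A hedged sketch of the DQN-style TD loss discussed in refs. 1-6: a separate target network computes the bootstrap value, and transitions are assumed to come from a replay buffer. The network objects and batch layout are assumptions for illustration.

```python
# DQN loss (in the spirit of Mnih et al.): replay + target network.
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch  # tensors from a replay buffer
    q_sa = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Vanilla DQN: max over the *target* network's next-state values.
        # (Double DQN, ref. 4, would select the argmax with q_net instead.)
        q_next = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    return F.smooth_l1_loss(q_sa, target)
```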

2.1.2 Policy Gradient

  1. Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017). [Google Scholar] [Paper] [Code]

  2. Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International conference on machine learning. PMLR, 2016. [Google Scholar] [Paper] [Code]

  3. Schulman, John, et al. "Trust region policy optimization." International conference on machine learning. PMLR, 2015. [Google Scholar] [Paper] [Code]

  4. Schulman, John, et al. "High-dimensional continuous control using generalized advantage estimation." arXiv preprint arXiv:1506.02438 (2015). [Google Scholar] [Paper] [Code]

  5. Heess, Nicolas, et al. "Emergence of locomotion behaviours in rich environments." arXiv preprint arXiv:1707.02286 (2017). [Google Scholar] [Paper] [Code]

  6. Wu, Yuhuai, et al. "Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation." Advances in neural information processing systems 30 (2017). [Google Scholar] [Paper] [Code]
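
For reference, a minimal rendering of PPO's clipped surrogate objective (ref. 1); the inputs are assumed to be precomputed from a rollout, e.g. advantages via GAE (ref. 4).

```python
# PPO clipped surrogate loss: clip the importance ratio to keep updates small.
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the surrogate by minimizing its negative.
    return -torch.min(unclipped, clipped).mean()
```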

2.1.3 Actor-critic Network

  1. Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." International conference on machine learning. PMLR, 2018. [Google Scholar] [Paper] [Code]

  2. Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). [Google Scholar] [Paper] [Code]

  3. Fujimoto, Scott, Herke Hoof, and David Meger. "Addressing function approximation error in actor-critic methods." International conference on machine learning. PMLR, 2018. [Google Scholar] [Paper] [Code]

  4. Wang, Ziyu, et al. "Sample efficient actor-critic with experience replay." arXiv preprint arXiv:1611.01224 (2016). [Google Scholar] [Paper] [Code]

  5. Silver, David, et al. "Deterministic policy gradient algorithms." International conference on machine learning. PMLR, 2014. [Google Scholar] [Paper] [Code]
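
A compact sketch of a deterministic actor-critic update in the spirit of DDPG (ref. 2): the critic regresses a TD target built from target networks, and the actor ascends the critic's value of its own actions. All module and optimizer arguments are assumed to be provided by the caller.

```python
import torch
import torch.nn.functional as F

def ddpg_step(actor, critic, target_actor, target_critic,
              actor_opt, critic_opt, batch, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch
    with torch.no_grad():
        next_q = target_critic(next_obs, target_actor(next_obs)).squeeze(1)
        target = rewards + gamma * (1.0 - dones) * next_q
    critic_loss = F.mse_loss(critic(obs, actions).squeeze(1), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(obs, actor(obs)).mean()  # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```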

2.2 Multi-Agent Reinforcement Learning (MARL)

  1. Nguyen, Thanh Thi, Ngoc Duy Nguyen, and Saeid Nahavandi. "Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications." IEEE transactions on cybernetics 50.9 (2020): 3826-3839. [Google Scholar] [Paper]

  2. Gronauer, Sven, and Klaus Diepold. "Multi-agent deep reinforcement learning: a survey." Artificial Intelligence Review (2022): 1-49. [Google Scholar] [Paper]

  3. Zhang, Kaiqing, Zhuoran Yang, and Tamer Başar. "Multi-agent reinforcement learning: A selective overview of theories and algorithms." Handbook of reinforcement learning and control (2021): 321-384. [Google Scholar] [Paper]

  4. Tan, Ming. "Multi-agent reinforcement learning: Independent vs. cooperative agents." Proceedings of the tenth international conference on machine learning. 1993. [Google Scholar] [Paper]

  5. Chen, Hao. "Multi-Agent Reinforcement Learning Papers with Code." [Code]
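
The simplest MARL baseline in this list is independent learning (Tan 1993, ref. 4), sketched below: each agent runs its own single-agent update and treats the other agents as part of the environment, at the cost of non-stationarity from each agent's point of view. Sizes are placeholders.

```python
# Independent Q-learning (IQL): one Q-table per agent, no coordination.
import numpy as np

n_agents, n_states, n_actions = 3, 10, 4
Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]

def iql_update(i, s, a, r, s_next, alpha=0.1, gamma=0.99):
    Q[i][s, a] += alpha * (r + gamma * Q[i][s_next].max() - Q[i][s, a])
```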

2.3 Training and Execution Strategies in MARL

  1. Zhao, Jian, et al. "Ctds: Centralized teacher with decentralized student for multi-agent reinforcement learning." IEEE Transactions on Games (2022). [Google Scholar] [Paper]

  2. Yao, Xinghu, et al. "SMIX(λ): Enhancing centralized value functions for cooperative multiagent reinforcement learning." IEEE Transactions on Neural Networks and Learning Systems (2021). [Google Scholar] [Paper]

  3. Zhang, Kaiqing, et al. "Fully decentralized multi-agent reinforcement learning with networked agents." International Conference on Machine Learning. PMLR, 2018. [Google Scholar] [Paper]

  4. Sharma, Piyush K., et al. "Survey of recent multi-agent reinforcement learning algorithms utilizing centralized training." Artificial intelligence and machine learning for multi-domain operations applications III. Vol. 11746. SPIE, 2021. [Google Scholar] [Paper]

  5. Zhang, Kaiqing, Zhuoran Yang, and Tamer Başar. "Decentralized multi-agent reinforcement learning with networked agents: Recent advances." Frontiers of Information Technology & Electronic Engineering 22.6 (2021): 802-814. [Google Scholar] [Paper]

  6. Kraemer, Landon, and Bikramjit Banerjee. "Multi-agent reinforcement learning as a rehearsal for decentralized planning." Neurocomputing 190 (2016): 82-94. [Google Scholar] [Paper]
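
A minimal sketch of centralized training with decentralized execution (CTDE), the pattern most of the works above analyze: a critic that sees the joint state and all actions is used only during training, while the deployed actors condition on local observations alone. Shapes and modules are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim, state_dim = 3, 16, 5, 48
actors = [nn.Linear(obs_dim, act_dim) for _ in range(n_agents)]  # decentralized
central_critic = nn.Linear(state_dim + n_agents * act_dim, 1)    # training-only

obs = [torch.randn(obs_dim) for _ in range(n_agents)]
acts = [torch.softmax(actor(o), dim=-1) for actor, o in zip(actors, obs)]
joint_state = torch.cat(obs)  # stand-in for the global training state
q_joint = central_critic(torch.cat([joint_state, *acts]))  # centralized value
```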

2.4 MARL Algorithm Variants

2.4.1 Value function decomposition

  1. Son, Kyunghwan, et al. "Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning." International conference on machine learning. PMLR, 2019. [Google Scholar] [Paper] [Code]

  2. Rashid, Tabish, et al. "Monotonic value function factorisation for deep multi-agent reinforcement learning." The Journal of Machine Learning Research 21.1 (2020): 7234-7284. [Google Scholar] [Paper] [Code]

  3. Sunehag, Peter, et al. "Value-decomposition networks for cooperative multi-agent learning." arXiv preprint arXiv:1706.05296 (2017). [Google Scholar] [Paper] [Code]

  4. Rashid, Tabish, et al. "Weighted qmix: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning." Advances in neural information processing systems 33 (2020): 10199-10210. [Google Scholar] [Paper] [Code]

  5. Liu, Shanqi, et al. "Learning Multi-Agent Cooperation via Considering Actions of Teammates." IEEE Transactions on Neural Networks and Learning Systems (2023). [Google Scholar] [Paper]

  6. Zhang, Yuanxin, Huimin Ma, and Yu Wang. "Avd-net: Attention value decomposition network for deep multi-agent reinforcement learning." 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021. [Google Scholar] [Paper]
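
The additive decomposition at the heart of VDN (ref. 3) is small enough to show directly; QMIX (refs. 2 and 4) replaces the sum with a monotonic, state-conditioned mixing network. Tensor shapes here are assumptions.

```python
# VDN: Q_tot is the sum of per-agent utilities, so the joint argmax
# decomposes into independent per-agent argmaxes at execution time.
import torch

def vdn_q_tot(per_agent_q: list[torch.Tensor], actions: torch.Tensor) -> torch.Tensor:
    # per_agent_q[i]: (batch, n_actions) utilities for agent i
    # actions:        (batch, n_agents) chosen action indices
    chosen = [q.gather(1, actions[:, i:i + 1]) for i, q in enumerate(per_agent_q)]
    return torch.stack(chosen, dim=0).sum(dim=0)  # Q_tot = sum_i Q_i
```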

2.4.2 Learning to Communicate

  1. Sukhbaatar, Sainbayar, and Rob Fergus. "Learning multiagent communication with backpropagation." Advances in neural information processing systems 29 (2016). [Google Scholar] [Paper] [Code]

  2. Foerster, Jakob, et al. "Learning to communicate with deep multi-agent reinforcement learning." Advances in neural information processing systems 29 (2016). [Google Scholar] [Paper] [Code]

  3. Peng, Peng, et al. "Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play starcraft combat games." arXiv preprint arXiv:1703.10069 (2017). [Google Scholar] [Paper] [Code]

  4. Singh, Amanpreet, Tushar Jain, and Sainbayar Sukhbaatar. "Learning when to communicate at scale in multiagent cooperative and competitive tasks." arXiv preprint arXiv:1812.09755 (2018). [Google Scholar] [Paper] [Code]

  5. Chu, Tianshu, Sandeep Chinchali, and Sachin Katti. "Multi-agent reinforcement learning for networked system control." arXiv preprint arXiv:2004.01339 (2020). [Google Scholar] [Paper] [Code]
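
A sketch of one CommNet-style communication step (ref. 1): each agent's next hidden state combines its own state with the mean of the other agents' hidden states, and the whole step is differentiable end to end. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

n_agents, hidden = 4, 32
f_self, f_comm = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)

h = torch.randn(n_agents, hidden)           # per-agent hidden states
totals = h.sum(dim=0, keepdim=True)         # sum over all agents
c = (totals - h) / (n_agents - 1)           # mean over the *other* agents
h_next = torch.tanh(f_self(h) + f_comm(c))  # one communication step
```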

2.4.3 Hierarchical structure

  1. Pateria, Shubham, et al. "Hierarchical reinforcement learning: A comprehensive survey." ACM Computing Surveys (CSUR) 54.5 (2021): 1-35. [Google Scholar] [Paper]

  2. Ahilan, Sanjeevan, and Peter Dayan. "Feudal multi-agent hierarchies for cooperative reinforcement learning." arXiv preprint arXiv:1901.08492 (2019). [Google Scholar] [Paper]

  3. Tang, Hongyao, et al. "Hierarchical deep multiagent reinforcement learning with temporal abstraction." arXiv preprint arXiv:1809.09332 (2018). [Google Scholar] [Paper]

  4. Xu, Zhiwei, et al. "HAVEN: hierarchical cooperative multi-agent reinforcement learning with dual coordination mechanism." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 10. 2023. [Google Scholar] [Paper]

  5. Yang, Jiachen, Igor Borovikov, and Hongyuan Zha. "Hierarchical cooperative multi-agent reinforcement learning with skill discovery." arXiv preprint arXiv:1912.03558 (2019). [Google Scholar] [Paper]
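
A minimal two-level, feudal-style sketch (in the spirit of refs. 2-3): a manager picks a sub-goal at a coarser timescale, and a worker acts on its observation plus that goal. The modules and the re-decision period `k` are assumptions.

```python
import torch
import torch.nn as nn

obs_dim, n_goals, act_dim, k = 16, 4, 5, 10
manager = nn.Linear(obs_dim, n_goals)            # picks a sub-goal
worker = nn.Linear(obs_dim + n_goals, act_dim)   # acts toward the sub-goal

def act(obs: torch.Tensor, t: int, goal_cache: dict) -> int:
    if t % k == 0:  # the manager re-decides every k steps
        goal_cache["g"] = torch.eye(n_goals)[manager(obs).argmax()]
    return worker(torch.cat([obs, goal_cache["g"]])).argmax().item()

action = act(torch.randn(obs_dim), t=0, goal_cache={})
```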

2.4.4 Causal inference

  1. Grimbly, St John, Jonathan Shock, and Arnu Pretorius. "Causal multi-agent reinforcement learning: Review and open problems." arXiv preprint arXiv:2111.06721 (2021). [Google Scholar] [Paper]

  2. Jaques, Natasha, et al. "Intrinsic social motivation via causal influence in multi-agent RL." (2018). [Google Scholar] [Paper] [Code]

  3. Wang, Han, Yang Yu, and Yuan Jiang. "Fully Decentralized Multiagent Communication via Causal Inference." IEEE Transactions on Neural Networks and Learning Systems (2022). [Google Scholar] [Paper]

  4. Liu, Boyin, et al. "Lazy Agents: A New Perspective on Solving Sparse Reward Problem in Multi-agent Reinforcement Learning." (2023). [Google Scholar] [Paper]

  5. Pina, Rafael, Varuna De Silva, and Corentin Artaud. "Discovering Causality for Efficient Cooperation in Multi-Agent Environments." arXiv preprint arXiv:2306.11846 (2023). [Google Scholar] [Paper]

  6. Pearl, Judea. "Theoretical impediments to machine learning with seven sparks from the causal revolution." arXiv preprint arXiv:1801.04016 (2018). [Google Scholar] [Paper]
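
The counterfactual "social influence" signal of ref. 2 reduces to a KL divergence and can be sketched with plain arrays: agent i's influence on agent j compares j's policy given i's actual action against j's marginal policy with i's action counterfactually resampled. The tabular policies below are an assumption for illustration (strictly positive probabilities assumed).

```python
import numpy as np

def influence_reward(p_j_given_ai, p_ai, ai_taken):
    # p_j_given_ai: (n_actions_i, n_actions_j) agent j's policy given a_i
    # p_ai:         (n_actions_i,) agent i's own action distribution
    conditional = p_j_given_ai[ai_taken]   # p(a_j | a_i, s)
    marginal = p_ai @ p_j_given_ai         # sum_{a'_i} p(a'_i) p(a_j | a'_i, s)
    return float(np.sum(conditional * np.log(conditional / marginal)))  # KL
```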

3. 🤖 Towards applications in connected and autonomous vehicles

This section surveys recent applications of MARL to CAV control. The review is organized by dimension of cooperation, where each dimension corresponds to the number of control components being coordinated.

One-dimensional cooperation corresponds to scenarios involving control along a single control direction, such as either longitudinal control or lateral control.

Two-dimensional cooperation extends the scope to include both longitudinal and lateral control components, reflecting the increased complexity in the coordination and decision-making of CAVs.

Three-dimensional cooperation adds further constraints, such as timing, bringing in tasks like traffic signal control and on-ramp merging (see the sketch below).
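
To make the taxonomy concrete, the sketch below writes plausible action spaces for each cooperation dimension using gymnasium's space types; the specific bounds and encodings are assumptions, since the surveyed papers differ in their formulations.

```python
from gymnasium import spaces

one_d = spaces.Box(low=-3.0, high=3.0, shape=(1,))  # longitudinal accel only
two_d = spaces.Tuple((
    spaces.Box(low=-3.0, high=3.0, shape=(1,)),     # longitudinal accel
    spaces.Discrete(3),                             # lane change: left/keep/right
))
# 3-D settings layer timing constraints (signal phases, merge slots) on top,
# typically as extra observation or constraint terms rather than new actuators.
```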

3.1 One-dimensional Cooperation

  1. Parvini, Mohammad, et al. "AoI-aware resource allocation for platoon-based C-V2X networks via multi-agent multi-task reinforcement learning." IEEE Transactions on Vehicular Technology (2023). [Google Scholar] [Paper]

  2. He, Lv. "Multi-vehicle Platoon Overtaking Using NoisyNet Multi-Agent Deep Q-Learning Network." arXiv preprint arXiv:2303.02583 (2023). [Google Scholar] [Paper]

  3. Xu, Yuanyuan, et al. "Deep Reinforcement Learning for Multi-Objective Resource Allocation in Multi-Platoon Cooperative Vehicular Networks." IEEE Transactions on Wireless Communications (2023). [Google Scholar] [Paper]

  4. Beaver, Logan E. "Constraint-driven optimal control of multiagent systems: A highway platooning case study." IEEE Control Systems Letters 6 (2021): 1754-1759. [Google Scholar] [Paper]

  5. Li, Yongfu, et al. "Consensus-based cooperative control for multi-platoon under the connected vehicles environment." IEEE Transactions on Intelligent Transportation Systems 20.6 (2018): 2220-2229. [Google Scholar] [Paper]

  6. Guo, Xiang-Gui, et al. Multi-agent systems: platoon control and non-fragile quantized consensus. CRC Press, 2019. [Google Scholar] [Paper]

  7. Li, Yongfu, et al. "Platoon control of connected multi-vehicle systems under V2X communications: Design and experiments." IEEE Transactions on Intelligent Transportation Systems 21.5 (2019): 1891-1902. [Google Scholar] [Paper]
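
As a hedged illustration of the one-dimensional (longitudinal) objective common to the platooning papers above, the reward below penalizes deviation from a constant-time-headway gap to the predecessor; the functional form and gains are assumptions, not taken from any one paper.

```python
def platoon_reward(gap, ego_speed, rel_speed, t_headway=1.5, w_gap=1.0, w_rel=0.1):
    # Constant time-headway policy: desired gap grows with ego speed.
    desired_gap = t_headway * ego_speed
    gap_error = gap - desired_gap
    # Penalize spacing error and relative speed to the predecessor.
    return -(w_gap * gap_error ** 2 + w_rel * rel_speed ** 2)
```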

3.2 Two-dimensional Cooperation

  1. Zhou, Wei, et al. "Multi-agent reinforcement learning for cooperative lane changing of connected and autonomous vehicles in mixed traffic." Autonomous Intelligent Systems 2.1 (2022): 5. [Google Scholar] [Paper]

  2. Chen, Sikai, et al. "Graph neural network and reinforcement learning for multi-agent cooperative control of connected autonomous vehicles." Computer-Aided Civil and Infrastructure Engineering 36.7 (2021): 838-857. [Google Scholar] [Paper]

  3. Candela, Eduardo, et al. "Transferring multi-agent reinforcement learning policies for autonomous driving using sim-to-real." 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022. [Google Scholar] [Paper]

  4. Le, Nguyen-Tuan-Thanh. "Multi-agent reinforcement learning for traffic congestion on one-way multi-lane highways." Journal of Information and Telecommunication (2023): 1-15. [Google Scholar] [Paper]

  5. Chen, Siyuan, et al. "Multi-Agent Reinforcement Learning-Based Decision Making for Twin-Vehicles Cooperative Driving in Stochastic Dynamic Highway Environments." IEEE Transactions on Vehicular Technology (2023). [Google Scholar] [Paper]

  6. Zong, Fang, et al. "Dynamic lane changing trajectory planning for CAV: A multi-agent model with path preplanning." Transportmetrica B: transport dynamics 10.1 (2022): 266-292. [Google Scholar] [Paper]

  7. Zhang, Jiawei, et al. "Multi-agent DRL-based lane change with right-of-way collaboration awareness." IEEE Transactions on Intelligent Transportation Systems 24.1 (2022): 854-869. [Google Scholar] [Paper]

3.3 Three-dimensional Cooperation

3.3.1 Traffic signal control

  1. Guillen-Perez, Antonio, and Maria-Dolores Cano. "Multi-agent deep reinforcement learning to manage connected autonomous vehicles at tomorrow's intersections." IEEE Transactions on Vehicular Technology 71.7 (2022): 7033-7043. [Google Scholar] [Paper]

  2. Liu, Junjia, et al. "Learning scalable multi-agent coordination by spatial differentiation for traffic signal control." Engineering Applications of Artificial Intelligence 100 (2021): 104165. [Google Scholar] [Paper]

  3. Liu, Dongjiang, and Leixiao Li. "A traffic light control method based on multi-agent deep reinforcement learning algorithm." Scientific Reports 13.1 (2023): 9396. [Google Scholar] [Paper]

  4. Wang, Yanan, et al. "STMARL: A spatio-temporal multi-agent reinforcement learning approach for cooperative traffic light control." IEEE Transactions on Mobile Computing 21.6 (2020): 2228-2242. [Google Scholar] [Paper]

  5. Ma, Jinming, and Feng Wu. "Feudal multi-agent deep reinforcement learning for traffic signal control." Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 2020. [Google Scholar] [Paper]

  6. Wang, Tong, Jiahua Cao, and Azhar Hussain. "Adaptive Traffic Signal Control for large-scale scenario with Cooperative Group-based Multi-agent reinforcement learning." Transportation research part C: emerging technologies 125 (2021): 103046. [Google Scholar] [Paper]

  7. Yang, Shantian, et al. "IHG-MA: Inductive heterogeneous graph multi-agent reinforcement learning for multi-intersection traffic signal control." Neural networks 139 (2021): 265-277. [Google Scholar] [Paper]

3.3.2 On-ramps merging

  1. Chen, Dong, et al. "Deep multi-agent reinforcement learning for highway on-ramp merging in mixed traffic." IEEE Transactions on Intelligent Transportation Systems (2023). [Google Scholar] [Paper]

  2. Chandra, Rohan, and Dinesh Manocha. "Gameplan: Game-theoretic multi-agent planning with human drivers at intersections, roundabouts, and merging." IEEE Robotics and Automation Letters 7.2 (2022): 2676-2683. [Google Scholar] [Paper]

  3. Hu, Yeping, et al. "Interaction-aware decision making with adaptive strategies under merging scenarios." 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019. [Google Scholar] [Paper]

  4. Schester, Larry, and Luis E. Ortiz. "Longitudinal position control for highway on-ramp merging: A multi-agent approach to automated driving." 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019. [Google Scholar] [Paper]

  5. Li, Lin, et al. "Nash double Q-based multi-agent deep reinforcement learning for interactive merging strategy in mixed traffic." Expert Systems with Applications 237 (2024): 121458. [Google Scholar] [Paper]

  6. Zhang, Xinfeng, et al. "High-Speed Ramp Merging Behavior Decision for Autonomous Vehicles Based on Multi-Agent Reinforcement Learning." IEEE Internet of Things Journal (2023). [Google Scholar] [Paper]

  7. Kherroubi, Zine el Abidine, Samir Aknine, and Rebiha Bacha. "Novel decision-making strategy for connected and autonomous vehicles in highway on-ramp merging." IEEE Transactions on Intelligent Transportation Systems 23.8 (2021): 12490-12502. [Google Scholar] [Paper]

3.3.3 Unsignalized intersections

  1. Spatharis, Christos, and Konstantinos Blekas. "Multiagent reinforcement learning for autonomous driving in traffic zones with unsignalized intersections." Journal of Intelligent Transportation Systems (2022): 1-17. [Google Scholar] [Paper]

  2. Bautista-Montesano, Rolando, et al. "Autonomous navigation at unsignalized intersections: A coupled reinforcement learning and model predictive control approach." Transportation research part C: emerging technologies 139 (2022): 103662. [Google Scholar] [Paper]

  3. Shu, Hong, et al. "Driving tasks transfer using deep reinforcement learning for decision-making of autonomous vehicles in unsignalized intersection." IEEE Transactions on Vehicular Technology 71.1 (2021): 41-52. [Google Scholar] [Paper]

  4. Peng, Bile, et al. "Connected autonomous vehicles for improving mixed traffic efficiency in unsignalized intersections with deep reinforcement learning." Communications in Transportation Research 1 (2021): 100017. [Google Scholar] [Paper]

  5. Guo, Zihan, et al. "Coordination for connected and automated vehicles at non-signalized intersections: A value decomposition-based multiagent deep reinforcement learning approach." IEEE Transactions on Vehicular Technology 72.3 (2022): 3025-3034. [Google Scholar] [Paper]

  6. Geng, Maosi, et al. "Multimodal Vehicular Trajectory Prediction With Inverse Reinforcement Learning and Risk Aversion at Urban Unsignalized Intersections." IEEE Transactions on Intelligent Transportation Systems (2023). [Google Scholar] [Paper]

  7. Xu, Yunting, et al. "Leveraging multiagent learning for automated vehicles scheduling at nonsignalized intersections." IEEE Internet of Things Journal 8.14 (2021): 11427-11439. [Google Scholar] [Paper]

3.3.4 Simulation Platforms

  1. Dosovitskiy, Alexey, et al. "CARLA: An open urban driving simulator." Conference on robot learning. PMLR, 2017. [Google Scholar] [Paper] [Code]

  2. Zhang, Huichu, et al. "Cityflow: A multi-agent reinforcement learning environment for large scale city traffic scenario." The world wide web conference. 2019. [Google Scholar] [Paper] [Code]

  3. Zhou, Ming, et al. "Smarts: Scalable multi-agent reinforcement learning training school for autonomous driving." arXiv preprint arXiv:2010.09776 (2020). [Google Scholar] [Paper] [Code]

  4. Li, Quanyi, et al. "Metadrive: Composing diverse driving scenarios for generalizable reinforcement learning." IEEE transactions on pattern analysis and machine intelligence 45.3 (2022): 3461-3475. [Google Scholar] [Paper] [Code]

  5. Lopez, Pablo Alvarez, et al. "Microscopic traffic simulation using sumo." 2018 21st international conference on intelligent transportation systems (ITSC). IEEE, 2018. [Google Scholar] [Paper] [Code]

  6. Leurent, Edouard. "An environment for autonomous driving decision-making." (2018). [Google Scholar] [Code]