Reinforcement Learning for Dialogue Systems: a summary of papers and open-source applications
1、End-to-End Task-Completion Neural Dialogue Systems https://arxiv.org/pdf/1703.01008
2、2016-A User Simulator for Task-Completion Dialogues https://arxiv.org/pdf/1612.05688
3、Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning ICASSP 2018 https://arxiv.org/pdf/1710.11277
4、Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning https://arxiv.org/pdf/1704.03084
5、Subgoal Discovery for Hierarchical Dialogue Policy Learning EMNLP 2018 https://arxiv.org/pdf/1804.07855
6、Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning ACL 2018 https://arxiv.org/pdf/1801.06176
7、Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning EMNLP 2018 https://arxiv.org/pdf/1808.09442
8、Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning AAAI 2019 https://arxiv.org/pdf/1811.07550
9、Budgeted Policy Learning for Task-Oriented Dialogue Systems ACL 2019 https://arxiv.org/pdf/1906.00499
10、2019-EMNLP-Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
11、Su P H, Budzianowski P, Ultes S, et al. Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management[J]. arXiv preprint arXiv:1707.00130, 2017.
12、Weisz G, Budzianowski P, Su P H, et al. Sample efficient deep reinforcement learning for dialogue systems with large action spaces
13、He J, Chen J, He X, et al. Deep reinforcement learning with a natural language action space[J]. arXiv preprint arXiv:1511.04636, 2015.
14、Casanueva I, Budzianowski P, Su P H, et al. Feudal reinforcement learning for dialogue management in large domains[J]. arXiv preprint arXiv:1803.03232, 2018.
15、Abel D, Salvatier J, Stuhlmüller A, et al. Agent-agnostic human-in-the-loop reinforcement learning[J]. arXiv preprint arXiv:1701.04079, 2017.
16、Ross S, Gordon G, Bagnell D. A reduction of imitation learning and structured prediction to no-regret online learning[C]//Proceedings of the fourteenth international conference on artificial intelligence and statistics. 2011: 627-635.
17、Chen L, Zhou X, Chang C, et al. Agent-aware dropout dqn for safe and efficient on-line dialogue policy learning[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017: 2454-2464.
20、Dialogue Environments are Different from Games: Investigating Variants of Deep Q-Networks for Dialogue Policy
Microsoft's open-source end-to-end dialogue system framework ConvLab: https://github.com/ConvLab/ConvLab
DQN: 2013-Playing Atari with deep reinforcement learning
REINFORCE: 1992-Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning
PPO: 2017-Proximal policy optimization algorithms https://arxiv.org/pdf/1707.06347.pdf
PPO's self-imitation variant: 2018-Self-imitation learning https://arxiv.org/abs/1806.05635
HRL: 2017-Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning
A2C (on-policy): Asynchronous Methods for Deep Reinforcement Learning
A2C with an extra SIL loss function: Self-Imitation Learning https://arxiv.org/abs/1806.05635
SARSA
RL algorithms used for policy learning in tatk, Tsinghua University's dialogue system toolkit (https://github.com/thu-coai/tatk), include:
Policy Gradient: Simple statistical gradient-following algorithms for connectionist reinforcement learning
PPO: Proximal policy optimization algorithms