Disagreement can make conversation difficult. In today’s world, important issues often become incredibly polarizing. Given the pandemic, certain conversations (with anti-maskers, for example) are all the more necessary. Taking the proper precautions could prevent tens of thousands of deaths nationwide. We seek to develop a chatbot that learns to have these difficult conversations, actively learning the best policy for convincing a COVID-19 skeptic to take proper safety precautions during the pandemic.
First, we created a Facebook chat using the Facebook API. Then we applied three reinforcement learning algorithms, Q-learning, Sarsa, and Value Iteration, to synthetic data to train the chatbot.
To generate the synthetic data, we designed an algorithm that randomly assigns a multinomial distribution over the set of possible replies to each message we send:
- Given a state and an action, sample a reply from this distribution
- The sampled reply from the user becomes the next state (state prime)
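The sampling step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the reply labels ("skeptical", "neutral", "agrees"), and the action name are all hypothetical, and the weights are drawn uniformly and normalized only because the write-up does not say how the random distributions were chosen.

```python
import random

def make_reply_model(states, actions, replies, seed=0):
    """Assign a random categorical distribution over replies
    to every (state, action) pair. Names are hypothetical."""
    rng = random.Random(seed)
    model = {}
    for s in states:
        for a in actions:
            # Assumption: draw positive weights and normalize them to sum to 1.
            weights = [rng.random() + 1e-9 for _ in replies]
            total = sum(weights)
            model[(s, a)] = [w / total for w in weights]
    return model

def sample_reply(model, replies, state, action, rng):
    """Sample the user's reply, which becomes the next state (state prime)."""
    return rng.choices(replies, weights=model[(state, action)], k=1)[0]

# Hypothetical reply and action labels, for illustration only.
replies = ["skeptical", "neutral", "agrees"]
actions = ["share_statistics"]
model = make_reply_model(replies, actions, replies, seed=0)
next_state = sample_reply(model, replies, "skeptical", "share_statistics", random.Random(1))
```

Fixing the seed keeps the synthetic data reproducible across training runs.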
State: User response
Actions: list of possible next questions to ask
Reward: 100 if user agrees to take precautions, else -100
Next state: User response to the question picked from Actions
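With the problem framed this way, the Q-learning and Sarsa updates take their standard textbook form. The sketch below is illustrative only: the learning rate, discount factor, state label "agrees", and action names are assumptions that do not come from the project.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

def reward(next_state):
    # Reward from the write-up: 100 if the user agrees, otherwise -100.
    return 100.0 if next_state == "agrees" else -100.0

def q_learning_update(Q, s, a, s_next, actions):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (reward(s_next) + GAMMA * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, s_next, a_next):
    """On-policy update: bootstrap from the action actually taken next."""
    Q[(s, a)] += ALPHA * (reward(s_next) + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

# Hypothetical action names, for illustration only.
actions = ["share_statistics", "ask_about_family"]
Q = defaultdict(float)
q_learning_update(Q, "skeptical", "share_statistics", "agrees", actions)
sarsa_update(Q, "skeptical", "ask_about_family", "skeptical", "share_statistics")
```

Both updates only touch the (state, action) pairs that actually appear in the sampled conversations, so pairs that are never visited keep their initial value of zero.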
After training, we found that Q-learning and Sarsa underperformed Value Iteration. We attribute this to the small size of the data set: with so few samples, learning does not adequately cover the state space, and Q-learning and Sarsa produce policies that contain unrewarding cycles.
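For contrast, Value Iteration sweeps every state on each pass using a transition model, so no state is left unvisited. The following is a minimal sketch under stated assumptions, not the project's implementation: the two-state example, the transition probabilities, the action names, and the discount factor are all made up for illustration.

```python
def reward(next_state):
    # Reward from the write-up: 100 if the user agrees, otherwise -100.
    return 100.0 if next_state == "agrees" else -100.0

def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s under model P."""
    return sum(p * (reward(s2) + gamma * V[s2]) for s2, p in P[(s, a)].items())

def value_iteration(states, actions, P, terminal, gamma=0.9, tol=1e-6):
    """Sweep all non-terminal states until the values stop changing."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in terminal:
                continue  # the conversation ends here, so the value stays 0
            best = max(q_value(P, V, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(P, V, s, a, gamma))
        for s in states if s not in terminal
    }
    return V, policy

# Hypothetical two-state model: P[(state, action)] maps next states to probabilities.
states = ["skeptical", "agrees"]
actions = ["share_statistics", "ask_about_family"]
P = {
    ("skeptical", "share_statistics"): {"agrees": 0.8, "skeptical": 0.2},
    ("skeptical", "ask_about_family"): {"agrees": 0.3, "skeptical": 0.7},
}
V, policy = value_iteration(states, actions, P, terminal={"agrees"})
```

Because every state is updated from the model rather than from sampled episodes, the greedy policy here cannot get stuck on a state-action pair whose value was never learned.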
https://www.youtube.com/watch?v=W9qSnNPnw2M&ab_channel=MeileeZhou
- cd into the repository directory.
- Start the server by running python3 application.py in the terminal.
- Open a new terminal window and run ./ngrok http 5000 in the terminal. This exposes the local server on port 5000 through a public ngrok URL.
- Go to the Facebook app.
- On Facebook: generate a new token, paste it into messenger_webhook.py, and edit the URL for the webhook.