
Disagreement can make conversation difficult. In today’s world, important issues often become intensely polarizing. Given the pandemic, certain conversations, with anti-maskers for example, are all the more necessary. Taking the proper precautions could prevent tens of thousands of deaths nationwide. We seek to develop a chatbot that learns to have these difficult conversations, actively learning the best policy for convincing a COVID-19 skeptic to take proper safety precautions.


First, we created a Facebook chat using the Facebook API. Then we took three reinforcement learning algorithms (Q-learning, Sarsa, and Value Iteration) and applied them to synthetic data to train the chatbot.
To generate the data, we designed an algorithm that randomly assigns a multinomial distribution over the set of possible replies to each message we send. Producing one training sample then takes two steps:

  1. Given a state and an action, sample a reply from this distribution.
  2. The sampled user reply becomes the next state (s').
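The sampling procedure above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the reply labels, question labels, and function names are all hypothetical, and the random distribution is built by normalizing uniform weights, which is only one of several reasonable choices.

```python
import random

# Hypothetical labels; the project's real reply and question sets are not listed here.
REPLIES = ["skeptical", "neutral", "agrees"]
QUESTIONS = ["ask_about_family", "share_statistic"]

def random_distribution(n, rng):
    """Draw n random weights and normalize them so they sum to 1."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def build_simulator(rng):
    """Assign a random distribution over replies to every (state, action) pair."""
    return {
        (state, action): random_distribution(len(REPLIES), rng)
        for state in REPLIES
        for action in QUESTIONS
    }

def sample_reply(simulator, state, action, rng):
    """Sample the simulated user's reply, which becomes the next state s'."""
    probs = simulator[(state, action)]
    return rng.choices(REPLIES, weights=probs, k=1)[0]

rng = random.Random(0)  # fixed seed so runs are repeatable
simulator = build_simulator(rng)
next_state = sample_reply(simulator, "skeptical", "share_statistic", rng)
```

Each call to `sample_reply` yields one synthetic transition (state, action, next state) that a learning algorithm can consume.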

State: User response
Actions: list of possible next questions to ask
Reward: 100 if user agrees to take precautions, else -100
Next state: User response to the question picked from Actions
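With the problem framed this way, the tabular update rules for Q-learning and Sarsa can be sketched as below. The state and action labels and the hyperparameters `ALPHA` and `GAMMA` are assumptions chosen for illustration; only the reward of 100 or -100 comes from the definition above.

```python
from collections import defaultdict

# Hypothetical labels and hyperparameters, chosen only for illustration.
STATES = ["skeptical", "neutral", "agrees"]
ACTIONS = ["ask_about_family", "share_statistic"]
ALPHA = 0.1  # learning rate (assumed)
GAMMA = 0.9  # discount factor (assumed)

def reward(next_state):
    """100 if the user agrees to take precautions, else -100."""
    return 100 if next_state == "agrees" else -100

def q_learning_update(q, s, a, r, s_next):
    """Off-policy update: bootstrap from the best action available in s_next."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

def sarsa_update(q, s, a, r, s_next, a_next):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    q[(s, a)] += ALPHA * (r + GAMMA * q[(s_next, a_next)] - q[(s, a)])

# One illustrative update per algorithm, starting from all-zero tables.
q_table = defaultdict(float)
q_learning_update(q_table, "skeptical", "share_statistic", reward("agrees"), "agrees")

sarsa_table = defaultdict(float)
sarsa_update(sarsa_table, "skeptical", "ask_about_family",
             reward("neutral"), "neutral", "share_statistic")
```

The two rules differ only in the bootstrap term: Q-learning uses the maximum over next actions, while Sarsa uses the value of the next action the policy actually takes.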


We found that, after training, Q-learning and Sarsa underperformed Value Iteration. We attribute this to the small size of the data set: learning does not adequately cover the state space, so Q-learning and Sarsa produce policies that contain unrewarding cycles.
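For comparison, Value Iteration works from the full transition model rather than from sampled experience, so every state is updated on every sweep. The sketch below assumes a small hand-written transition table `P` with made-up probabilities and an assumed discount factor; it is not the project's data or implementation.

```python
# Hypothetical labels, probabilities, and discount factor, for illustration only.
STATES = ["skeptical", "neutral", "agrees"]
ACTIONS = ["ask_about_family", "share_statistic"]
TERMINAL = "agrees"
GAMMA = 0.9

# P[(state, action)] lists the probability of each next state, in STATES order.
P = {
    ("skeptical", "ask_about_family"): [0.6, 0.3, 0.1],
    ("skeptical", "share_statistic"): [0.8, 0.1, 0.1],
    ("neutral", "ask_about_family"): [0.2, 0.4, 0.4],
    ("neutral", "share_statistic"): [0.1, 0.3, 0.6],
}

def reward(next_state):
    """100 if the user agrees to take precautions, else -100."""
    return 100 if next_state == TERMINAL else -100

def q_value(v, s, a):
    """Expected return of taking action a in state s under value estimates v."""
    return sum(p * (reward(s2) + GAMMA * v[s2]) for p, s2 in zip(P[(s, a)], STATES))

def value_iteration(tol=1e-6):
    """Sweep all non-terminal states until the largest change falls below tol."""
    v = {s: 0.0 for s in STATES}  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in STATES:
            if s == TERMINAL:
                continue
            best = max(q_value(v, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(v, s, a))
        for s in STATES if s != TERMINAL
    }
    return v, policy

values, policy = value_iteration()
```

Because each sweep covers the whole state space, the resulting policy does not depend on which transitions happened to be sampled during training.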


Host server locally

  1. cd into the repository directory
  2. Start the server with python3 in the terminal.
  3. Open a new terminal window and run ./ngrok http 5000. This exposes the local server on port 5000 through a public ngrok URL.
  4. Go to the Facebook app.
  5. On Facebook: Generate new token. Paste in and edit the URL for the webhook.
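When the webhook URL is saved, Facebook sends a GET request carrying the `hub.mode`, `hub.verify_token`, and `hub.challenge` query parameters, and expects the challenge to be echoed back if the token matches. The repository's server code is not shown here, so the following is only a framework-independent sketch of that check; `VERIFY_TOKEN` and `handle_verification` are hypothetical names.

```python
from urllib.parse import parse_qs

# Hypothetical token; it must match the verify token entered in the Facebook app settings.
VERIFY_TOKEN = "my-verify-token"

def handle_verification(query_string, verify_token=VERIFY_TOKEN):
    """Return (status_code, body) for a webhook verification GET request."""
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    token = params.get("hub.verify_token", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode == "subscribe" and token == verify_token:
        return 200, challenge  # echo the challenge back to confirm the webhook
    return 403, "Verification failed"

status, body = handle_verification(
    "hub.mode=subscribe&hub.verify_token=my-verify-token&hub.challenge=12345"
)
```

In a real server this logic would sit inside the route that handles GET requests on the webhook path.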