Carl is a proposed project to create a deep learning chatbot for counseling. Counselors chat with clients using reflective listening, generating dialog training data. This training data, combined with modern machine learning techniques, trains a chatbot capable of counseling clients or assisting counselors with suggested replies. Carl is an acronym for Computer Assisted Reflective Listener and an homage to Carl Rogers, the pioneer of reflective listening.
Here is a list of related research and commercial apps:
- Turing Test (Alan Turing, 1950) - proposes the imitation game, which tests whether users can distinguish a live person from a machine.
- In 1966, Joseph Weizenbaum created Eliza, a Rogerian therapist simulation based on keyword matching.
- Ellie, a virtual therapist, was used in a study to diagnose PTSD for the US military and DARPA.
- In 2017, Stanford psychologist Alison Darcy launched Woebot, a chatbot programmed to deliver cognitive behavioral therapy, based on this research. Also see Machine Learning and the Profession of Medicine.
- Other commercial apps: Replika (open source CakeChat), X2AI's Tess, and Xiaoice, Microsoft's conversational Chinese chatbot. Microsoft also produced, but later shut down, the English chatbots Tay and Zo.
- Neural Conversation Model (2015)
- GPT-2 - OpenAI's large transformer-based language model, which generates realistic and coherent synthetic text. See GPT-2 code / GPT-2 blog / GPT-2 model.
- ALBERT (previously BERT) - Google's transformer-based language model.
- Adversarial Learning for Neural Dialogue Generation
- Building Chatbot with Emotion - Towards a Human-like Open-Domain Chatbot (blog, 2020) - Google Brain presents Meena, a 2.6B-parameter neural network trained on 40B words.
- How to help someone feel better: NLP for mental health - NLP research and analysis of effective counseling messages and techniques; includes a data source from Crisis Text Line with over 80,000 counseling sessions. The data set can be obtained through external partnership.
- DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset (2017)
This proposal explains how reflective listening solves the inconsistent voice problem that afflicts many chatbots: conflicting responses, often to personal or situational questions. Reflective listening techniques avoid the problem largely by deflecting personal questions. This document also serves as a primer on state-of-the-art deep learning algorithms for dialog generation, and it concludes with proposals for addressing other potential issues, including ethics, privacy, and data collection.
Carl Rogers was a popular American psychologist who pioneered reflective listening. He asserted that if he could provide a relationship with empathy, non-judgement, positive regard, and authenticity, it would enable self-actualization in the client. Self-actualization is living up to one's potential: thinking less rigidly and seeing the world in less black-and-white terms; being more realistic, integrated, and effective; having higher frustration tolerance and greater maturity; being less defensive and more adaptive; and being less repressed in feeling and more accepting. Rogers was very optimistic about human nature and believed people have "a basically positive direction." He said, "The curious paradox is that when I accept myself just as I am, then I can change."
(For more information, see this video where he describes and demonstrates his techniques, or Carl Rogers' book On Becoming a Person.)
Because machine learning requires "big data," creating enough data will require many different counselors. Many AI dialog research papers, including Neural Conversation Model, use social media and media subtitle sources as training data. Some of these sources exceed one billion words but suffer from an inconsistent voice. It may be possible to train many counselors in reflective listening so that they respond consistently. Reflective listening facilitates a consistent voice by not disclosing personal information.
Here is an example of the inconsistent voice problem from Neural Conversation Model:
Human: what is your job ?
Machine: i ’m a lawyer .
Human: what do you do ?
Machine: i ’m a doctor .
The techniques of reflective listening include asking open-ended questions, which are questions that can’t be answered with a yes or no response. They serve to move the conversation forward. Here are some examples, followed by a sketch of a simple heuristic for detecting them:
- “What is on your mind?”
- “How did you feel about that?”
- "What was it like the first time you felt that way?"
Reflective listening is the process of expressing back feelings and thoughts. The counselor may say “You seem frustrated” or “You seemed encouraged by that news.” This is an emotion classification problem. Counselors also summarize thoughts, often at higher levels of abstraction. For instance, if a client complains about an incompetent male teacher and his father, the counselor might respond, "You are seeking a strong male role model."
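To illustrate the emotion classification step, here is a minimal sketch using the Hugging Face `transformers` pipeline; the specific pretrained model name is an assumption chosen for illustration:

```python
# Minimal sketch: classify the emotion in a client's message so it can
# be reflected back. Assumes the `transformers` library is installed;
# the model name is an illustrative assumption.
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="j-hartmann/emotion-english-distilroberta-base")

message = "I studied all month and still failed the exam."
result = classifier(message)[0]   # e.g. {'label': 'sadness', 'score': ...}
print(f"You seem to be feeling {result['label']}.")
```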
Reflective listening solves the inconsistent voice problem by being non-directive, meaning counselors don’t lead the conversation. As an example, counselors don’t give advice. If a client asks for direct advice, counselors will deflect with a reflection or an open-ended question.
Client: “What should I do?”
Counselor: “What do you think your options are?”
Client: “What should I do?”
Counselor: "You wish I could give you some advice about this."
To bring this all together, here is an example of potential responses that maintain a consistent voice, based on the examples from Neural Conversation Model.
Human: what is your job?
Machine: You're curious about my qualifications.
Human: what do you do?
Machine: What is important to you about knowing what I do?
Counselors using reflective listening don’t disclose personal information. If a client asks, “Do you have kids?”, a counselor might respond with an open-ended question such as “What’s the reason you ask?” or summarize the underlying emotion: “You are concerned that someone who doesn't have kids won't be able to relate to you.” Because counselors don't disclose personal information, one counselor’s response shouldn’t conflict with another’s. If a counselor disclosed personal information, such as how many children they have, it would conflict with other counselors' responses and not serve as consistent training data. This is a key concept that mitigates the inconsistent voice problem posed by many large dialog corpora.
Deep learning has enabled many advancements, including speech recognition, object recognition, and machine translation. Companies use deep learning for many language-related tasks, such as improving search, captioning images, and building question-answering agents and assistants such as Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant. Many of these deep learning techniques are applicable to dialog systems.
A neural conversation model uses the seq2seq framework, which was originally designed for machine translation. It takes a single sentence like "What is the purpose of life?" and "translates" it into a single-sentence response like "to serve the greater good." This research powers the Smart Reply feature of the Google Inbox app, which presents users with several possible replies to emails. The seq2seq framework has also been used to successfully rewrite Google's production translation services (see the Google blog post, the New York Times article, and Jeff Dean's high-level explanation). In Google's translation system, the seq2seq model is combined with a beam search: the model predicts one word at a time, while the beam search buffers the most probable phrases and chooses the most probable overall phrase.
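Here is a minimal sketch of that decoding idea, independent of any particular model. The `step_fn` interface and the toy vocabulary are illustrative assumptions, not Google's implementation:

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Generic beam search decoder. `step_fn(prefix)` returns a list of
    (token, probability) pairs for the next token given the prefix."""
    beams = [([start_token], 0.0)]          # (token sequence, log probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:        # this phrase already finished
                completed.append((seq, score))
                continue
            for token, prob in step_fn(seq):
                candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Buffer only the `beam_width` most probable partial phrases.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    completed.extend(b for b in beams if b[0][-1] != end_token)
    return max(completed, key=lambda c: c[1])[0]

# Toy next-token distribution over a three-word vocabulary.
def toy_step(prefix):
    return [("good", 0.6), ("bad", 0.3), ("<eos>", 0.1)]

print(beam_search(toy_step, "<bos>", "<eos>", beam_width=2, max_len=4))
```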
Update: As of 2019, state-of-the-art translation systems, such as Google's Transformer and GPT-2, use attention to consider relationships between all words in a sentence rather than processing each word in sequence.
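The core of that mechanism, scaled dot-product attention from the Transformer paper, can be sketched in a few lines of NumPy; the toy shapes below are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. Each output row
    is a weighted mix of all positions, so every word can relate to every
    other word in the sentence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # word-to-word affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ V

# Toy self-attention: 3 words, 4-dimensional representations.
x = np.random.rand(3, 4)
print(scaled_dot_product_attention(x, x, x).shape)    # (3, 4)
```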
A common problem for generative models is repeating generic phrases such as "I don't know." Diversity in responses is the topic of Deep Reinforcement Learning for Dialogue Generation and A Diversity-Promoting Objective Function for Neural Conversation Models.
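As a sketch of the idea behind the diversity-promoting objective, the Maximum Mutual Information formulation reranks candidates by log p(response|query) - λ log p(response), penalizing responses that are probable in any context. The candidate tuples and log probabilities below are made-up illustrations:

```python
def mmi_rerank(candidates, lam=0.5):
    """Rerank candidate responses by Maximum Mutual Information:
    score = log p(response | query) - lam * log p(response).
    Subtracting the prior penalizes generic phrases such as
    "I don't know" that are likely regardless of the query."""
    best_response, _ = max(
        ((resp, logp_given_query - lam * logp_prior)
         for resp, logp_given_query, logp_prior in candidates),
        key=lambda pair: pair[1])
    return best_response

# Toy log probabilities: the generic reply has a high prior (it fits any
# context), so the more specific reply wins after reranking.
print(mmi_rerank([("i don't know", -2.0, -1.0),
                  ("you sound frustrated with work", -2.5, -6.0)]))
```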
Reflective listening sometimes requires responding with the user's original statements. This is the topic of the paper Incorporating Copying Mechanism in Sequence-to-Sequence Learning, which gives the following examples (a sketch of the copy idea follows them):
Statement: Hello Jack, my name is Chandralekha.
Response: Nice to meet you, Chandralekha.
Statement: This new guy doesn’t perform exactly as we expected.
Response: What do you mean by "doesn’t perform exactly as we expected"?
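The copy idea can be sketched as mixing two distributions: with probability p_gen the model generates from its vocabulary, and otherwise it copies a source word in proportion to the decoder's attention. This follows the spirit of pointer/copy models rather than CopyNet's exact equations, and all numbers below are toy values:

```python
import numpy as np

def copy_mix(p_vocab, attention, source_ids, p_gen):
    """Pointer/copy sketch: the final word distribution is
    p(w) = p_gen * p_vocab(w) + (1 - p_gen) * attention mass on w,
    letting the model reproduce source words (like names) verbatim."""
    p_final = p_gen * p_vocab
    for pos, word_id in enumerate(source_ids):
        p_final[word_id] += (1 - p_gen) * attention[pos]
    return p_final

# Toy example: vocabulary of 6 word ids; the source sentence is ids [4, 5, 2]
# and the decoder currently attends mostly to id 4 (e.g. a name to copy).
p_vocab = np.array([0.3, 0.3, 0.2, 0.1, 0.05, 0.05])
attention = np.array([0.7, 0.2, 0.1])
print(copy_mix(p_vocab, attention, [4, 5, 2], p_gen=0.4))
```

Here id 4 ends up with most of the probability mass even though the generator assigns it almost none, which is exactly the behavior needed to echo "Chandralekha" back.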
In reflective listening, you typically track the client's mood and emotion, so the requirements for memory are limited. However, one shortcoming of seq2seq LSTM RNNs is that they are typically programmed to have memory for only one dialog exchange (statement/response), and they are limited to around 79 words in length. A dialog system will likely require a more sophisticated ability to focus attention on very long-term memory. Memory and attention mechanisms are an active research topic, with several competing deep learning models such as Memory Networks, Neural Turing Machines, and Stack RNNs. Within the context of dialog, research attempting to create vectors for the entire conversation includes Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models and Attention with Intention for a Neural Network Conversation Model.
(For more technical information on the state of the art, see Deep learning for NLP.)
It is uncertain whether automated counseling will be effective, and it may seem unnatural to automate a personal process such as counseling. An important factor in counseling is empathy and human connection. However, in a sense, machine learning is a tool to connect with the original contributors of the training data. There are also potential benefits; one study asserts that people have an increased willingness to disclose information to a computer.
Privacy is another complicated concern. It is crucial that personal data from one user is not reflected back to other users. Maintaining client confidentiality is a standard ethical practice in counseling. Anonymity may be an alternative way to lessen privacy concerns; Bitcoin is an example of private data (individual financial transactions) being protected by anonymity rather than the legacy model of confidentiality used in banking.
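As one concrete data-collection safeguard, transcripts could be scrubbed of obvious identifiers before entering the training corpus. The sketch below uses a few regular expressions; the patterns and placeholder tokens are illustrative assumptions, and real anonymization would require far more care:

```python
import re

# Illustrative patterns only; production anonymization would need a much
# more thorough approach than regexes.
PATTERNS = {
    r"\b[\w.+-]+@[\w-]+\.[\w.]+\b": "<EMAIL>",
    r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b": "<PHONE>",
}

def scrub(text: str) -> str:
    """Replace obvious personal identifiers before a transcript is
    added to the training corpus."""
    for pattern, token in PATTERNS.items():
        text = re.sub(pattern, token, text, flags=re.IGNORECASE)
    return text

print(scrub("Call me at 555-123-4567 or jane@example.com"))
# -> "Call me at <PHONE> or <EMAIL>"
```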
It may be possible to crowdsource the task of creating this data. Carl Rogers' work was popular and accessible to non-professionals, and he thought these techniques should be used in interpersonal relationships generally. Many lay people have been trained in reflective listening techniques as volunteers at crisis centers, the National Suicide Prevention Lifeline, 7 Cups, Crisis Text Line, and others. These counselors are not required to be licensed by the state and are instead called crisis counselors or listeners. Although licensed counselors are available for support, many of the volunteers at these organizations are unlicensed lay people. As illustrated by Microsoft's Tay bot, screening, training, and some review of training data are necessary for crowdsourcing.
Carl would likely not replace therapists but supplement them. It could also serve as a training, quality assurance, or efficiency tool. In the short term, Carl could make suggested replies, like Google's Smart Reply feature, and improve over time as more data is collected and more advanced AI algorithms are developed. The potential benefits of a fully autonomous counselor are enormous: what if everyone in the world, speaking any language, at any time, had a place where they could be understood and accepted without judgment, and was more capable of living up to their potential?