/carl_voice

A proposal for creating a reflective listening chatbot

Counseling and Machine Learning

Carl is a proposed project to create a deep learning chatbot for counseling. Counselors chat with clients using reflective listening, generating dialog training data. This training data, combined with advanced machine learning techniques, trains a chatbot capable of counseling clients or assisting counselors with suggestions. Carl is an acronym for Computer Assisted Reflective Listener and an homage to Carl Rogers, the pioneer of reflective listening.

Here is a list of related research and commercial apps:

Research and commercial apps

Potential Data Sources

This proposal explains how reflective listening solves the inconsistent voice problem that occurs in many chatbots: conflicting responses, often to personal or situational questions. Reflective listening techniques avoid the inconsistent voice problem largely by deflecting personal questions. This document is also a primer on state-of-the-art deep learning algorithms for dialog generation. It concludes with proposals for other potential issues, including ethics, privacy, and data collection.

Purpose of reflective listening

Carl Rogers was a popular American psychologist who pioneered reflective listening. Rogers asserted that if he could provide a relationship with empathy, non-judgment, positive regard, and authenticity, it would enable self-actualization in the client. Self-actualization is living up to your potential. It means thinking less rigidly and seeing the world in less black-and-white terms. It means being more realistic, more integrated, and more effective. It means having higher frustration tolerance and being more mature. It means being less defensive and more adaptive. It means being less repressed in feeling and more accepting. Rogers was very optimistic about human nature and believed people have "a basically positive direction." Rogers said, "The curious paradox is that when I accept myself just as I am, then I can change."
(For more information, see this video where he describes and demonstrates his techniques, or Carl Rogers' book On Becoming a Person.)

Consistent Voice

Because machine learning requires "big data," many different counselors will be needed to create enough data. Many AI dialog research papers, including A Neural Conversational Model, use social media and media subtitle sources as training data. Some of these sources exceed one billion words but suffer from an inconsistent voice. It may be possible to train many counselors in reflective listening so they respond consistently. Reflective listening facilitates a consistent voice by not disclosing personal information.

Here is an example of the inconsistent voice problem from A Neural Conversational Model:
Human: what is your job ?
Machine: i ’m a lawyer .
Human: what do you do ?
Machine: i ’m a doctor .

Open ended questions

The techniques of reflective listening include asking open ended questions. Open ended questions are questions that can’t be answered with a yes or no response. They serve to move the conversation forward. Examples of open ended questions are:

  • “What is on your mind?”
  • “How did you feel about that?”
  • "What was it like the first time you felt that way?"

Reflecting

Reflective listening is the process of expressing back feelings and thoughts. The counselor may say “You seem frustrated” or “You seemed encouraged by that news.” This is an emotional classification problem. Counselors also summarize thoughts, often at higher levels of abstraction. For instance, if a client complains about an incompetent male teacher and his father, the counselor might respond "You are seeking a strong male role model."
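To make the emotional classification framing concrete, here is a minimal keyword-based sketch. The lexicon and phrasing are illustrative assumptions; the proposal envisions a trained deep learning model, not a word list:

```python
# Hypothetical emotion lexicon mapping feeling words to an emotion label.
EMOTION_LEXICON = {
    "frustrated": "frustration",
    "stuck": "frustration",
    "excited": "encouragement",
    "hopeful": "encouragement",
    "sad": "sadness",
    "lonely": "sadness",
}

def reflect_feeling(utterance: str) -> str:
    # Classify the utterance's emotion, then express it back to the client.
    words = utterance.lower().replace(".", "").replace(",", "").split()
    for word in words:
        if word in EMOTION_LEXICON:
            return f"You seem to be feeling {EMOTION_LEXICON[word]}."
    # Fall back to an open ended prompt when no emotion is detected.
    return "Tell me more about how that felt."

print(reflect_feeling("I'm so frustrated with my job."))
```

Summarizing thoughts at a higher level of abstraction, as in the role-model example, is much harder: it requires inference over multiple statements rather than matching surface words.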

Non-directive

Reflective listening solves the inconsistent voice problem by being non-directive, meaning counselors don't lead the conversation. For example, counselors don't give advice. If a client asks for direct advice, counselors deflect with reflection or open ended questions.

Client: “What should I do?”
Counselor: “What do you think your options are?”

Client: “What should I do?”
Counselor: "You wish I could give you some advice about this."

To bring this all together, here is an example of potential responses that maintain a consistent voice, using the examples from A Neural Conversational Model:
Human: what is your job?
Machine: You're curious about my qualifications.
Human: what do you do?
Machine: What is important to you about knowing what I do?

Non-disclosure

Counselors using reflective listening don’t disclose personal information. If a client asks, “Do you have kids?,” a counselor might respond with an open ended question such as “What’s the reason you ask?” or summarize the underlying emotion: “You are concerned that someone who doesn't have kids won't be able to relate to you.” Because counselors don't disclose personal information, one counselor’s response shouldn’t conflict with another’s. If a counselor disclosed personal information, such as how many children they have, it would conflict with other counselors' responses and not serve as consistent training data. This is the key concept that mitigates the inconsistent voice problem posed by many large dialog corpora.
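As a sketch of how non-disclosure keeps the training signal consistent, a rule-based baseline might detect questions about the counselor and deflect rather than answer. The marker phrases and canned deflection below are my illustrative assumptions:

```python
# Questions addressed to the counselor typically open with "you"/"your" forms.
PERSONAL_MARKERS = ("do you", "are you", "have you",
                    "what is your", "what's your", "what do you")

def respond(client_text: str) -> str:
    text = client_text.lower().strip()
    if any(text.startswith(marker) for marker in PERSONAL_MARKERS):
        # Deflect with an open ended question instead of disclosing.
        return "What's the reason you ask?"
    return "Tell me more about that."

print(respond("Do you have kids?"))  # deflects rather than answering
```

Because every counselor (and the bot) deflects the same class of question, no two responses can contradict each other the way "i'm a lawyer" and "i'm a doctor" do above.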

Examples of reflective listening

reflective listening

Deep Learning

Deep learning has made many advancements, including speech recognition, object recognition, and machine translation. Companies are using deep learning for many language-related tasks such as improving search, image captioning, and creating question-answering agents and assistants such as Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant. Many of these deep learning techniques are applicable to dialog systems.

Sequence to Sequence

A Neural Conversational Model uses the seq2seq framework, which was originally designed for machine translation. It takes a single sentence like "What is the purpose of life?" and "translates" it into a single-sentence response like "to serve the greater good." This research powers the Smart Reply feature of the Google Inbox app, which presents users with several possible replies to emails. The seq2seq framework has also been used to rewrite Google's production translation services. (See the Google blog post, the New York Times article, and Jeff Dean's high-level explanation.) In Google's translation system, the seq2seq model is combined with a beam search: the model predicts one word at a time, while the beam search buffers the most probable phrases and chooses the most probable overall phrase.
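The beam search step can be sketched independently of the neural model. Below, `next_word_probs` is a hypothetical stand-in for the decoder's per-step word distribution (a real seq2seq decoder would condition on the encoder state); the beam keeps the top-scoring prefixes rather than greedily taking the best word at each step:

```python
import math

def next_word_probs(prefix):
    # Hypothetical hard-coded distribution, standing in for the decoder.
    table = {
        (): {"to": 0.6, "i": 0.4},
        ("to",): {"serve": 0.7, "be": 0.3},
        ("i",): {"don't": 0.9, "am": 0.1},
        ("to", "serve"): {"<eos>": 1.0},
        ("to", "be"): {"<eos>": 1.0},
        ("i", "don't"): {"know": 1.0},
        ("i", "don't", "know"): {"<eos>": 1.0},
        ("i", "am"): {"<eos>": 1.0},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=4):
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((prefix, score))  # finished phrase
                continue
            for word, p in next_word_probs(prefix).items():
                candidates.append((prefix + (word,), score + math.log(p)))
        # Keep only the beam_width most probable prefixes.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return list(beams[0][0])

print(beam_search())  # the most probable overall phrase, not just greedy steps
```

With these toy probabilities the search returns "to serve", since its overall probability (0.6 × 0.7) beats the greedy-looking alternative "i don't know" (0.4 × 0.9 × 1.0) once scores are accumulated across the whole phrase.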

seq2seq dialog generation

Update: As of 2019, state-of-the-art systems such as the Transformer and GPT-2 use attention to consider relationships between all words in a sentence rather than processing each word in sequence.
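The core of these attention-based models is scaled dot-product attention. Here is a minimal pure-Python sketch: each position's query scores every position's key, so all word pairs interact directly, unlike a step-by-step RNN. (Real implementations use learned projection matrices and batched tensor math; this only shows the mechanism.)

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output is the weight-averaged value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# One toy 2-d query attending over two key/value positions.
out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out)
```

Because the query aligns with the first key, the output is pulled toward the first value vector, illustrating how attention softly selects relevant positions.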

Diversity

A common problem for generative models is repeating generic phrases such as "I don't know." Diversity in responses is the topic of Deep Reinforcement Learning for Dialogue Generation and A Diversity-Promoting Objective Function for Neural Conversation Models.

Copying

Reflective listening sometimes requires responding with the user's original words. This is the topic of the paper Incorporating Copying Mechanism in Sequence-to-Sequence Learning, which gives the following examples:

Statement: Hello Jack, my name is Chandralekha.
Response: Nice to meet you, Chandralekha.

Statement: This new guy doesn’t perform exactly as we expected.
Response: What do you mean by "doesn’t perform exactly as we expected"?

Memory and Intention

In reflective listening, you typically track the client's mood and emotion, so the requirements for memory are limited. However, one shortcoming of seq2seq LSTM RNNs is that they are typically programmed to have memory for only one dialog exchange (statement/response). They are limited to around 79 words in length. A dialog system will likely require a more sophisticated ability to focus attention on very long term memory. Memory and attention mechanisms are an active topic of research, with several competing deep learning models such as Memory Networks, Neural Turing Machines, and Stack RNN. Within the context of dialog, research attempting to create vectors for the entire conversation includes Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models and Attention with Intention for a Neural Network Conversation Model.

(For more technical information on the state of the art see Deep learning for NLP)

Other issues

It is uncertain if automated counseling will be effective and it may seem unnatural to automate a personal process such as counseling. An important factor in counseling is empathy and human connection. However, in a sense, machine learning is a tool to connect with the original contributors of the training data. There are also potential benefits; one study asserts that people have an increased willingness to disclose information to a computer.

Privacy is another complicated concern. It is crucial that personal data from one user is not reflected to other users. Maintaining client confidentiality is a standard ethical practice in counseling. Anonymity may be an alternative way to lessen privacy concerns. Bitcoin is an example of private data (individual financial transactions) being protected by anonymity rather than the legacy model of confidentiality in banking.

It may be possible to crowdsource the task of creating this data. Carl Rogers' work was popular and accessible to non-professionals, and he thought these techniques should be used in interpersonal relationships in general. Many lay people have been trained in reflective listening techniques as volunteers at crisis centers and services such as the national suicide prevention hotline, 7 Cups, and Crisis Text Line. These volunteers are not required to be licensed by the state and are instead called crisis counselors or listeners, although licensed counselors are available to support them. As the Microsoft Tay bot illustrated, screening, training, and some review of training data are necessary for crowdsourcing.

Carl would not likely be used to replace therapists, but to supplement them. It could also be a training, quality assurance, or efficiency tool. In the short term, Carl could likely make suggested replies, like the Google Smart Reply feature, and could improve over time as more data is collected and more advanced AI algorithms are developed. The potential benefits of a fully autonomous counselor are enormous. What if everyone in the world, speaking any language, at any time, had a place where they could be understood and accepted without judgment, and become more capable of living up to their potential?

Please share this link and contact me at @andrewt3000