/CPsyCoun

[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Primary LanguageJupyter NotebookCreative Commons Attribution 4.0 InternationalCC-BY-4.0

CPsyCoun

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

CPsyCounX github CPsyCounD CPsyCounR

🔥News

  • [Sep. 2024]: Our psychological counseling report dataset CPsyCounR is now available upon reasonable request after signing the Privacy Data Protection Agreement.
  • [Jul. 2024]: Paper presentation work: Report | Long talk interviewed by shanghai AI Lab | Short talk interviewed by AI TIME
  • [Jul. 2024]: We collaborate with EmoLLM team to launch EmoLLM V3.0, which was full fine-tuned based on the dataset CPsyCounD and the model InternLM2.5-7B-Chat. Model weights: OpenXLab, ModelScope. WebDemo: OpenXLab demo.
  • [May. 2024]: Our paper has released on arXiv , check it out!
  • [May. 2024]: CPsyCoun has been accepted to 2024 ACL Findings!
  • [Apr. 2024]: CPsyCoun has been used in EmoLLM img, welcome!

Method

CPsyCoun Framework

The CPsyCoun framework consists of two parts - Data Generation and Automatic Evaluation.

Framework

Dialogue Reconstruction

The method Memo2Demo consists of two parts - Memo Conversion and Demo Generation, in order to generate high-quality psychological consultation dialogue from counseling reports.

Memo2Demo

Counseling Report

Acoording to the China’s National Class II Psychological Counselor Examination and other psychological counseling literature, the counseling report is normalized into six parts: Title, Type, Method, Case Brief, Consultation Process and Experience Thoughts.

  • An example of counseling report

Counseling_Report

CPsyCounD

The high-quality multi-turn dialogue dataset, which has a total of 3,134 multi-turn consultation dialogues.

Evaluation Framework

Evaluation Metrics

  • Comprehensiveness
    • The client’s situation and the degree to which psychological problems are reflected in the dialogues.
  • Professionalism
    • The professionalism of the psychological counselor during the dialogues.
  • Authenticity
    • The degree of authenticity between the client and the counselor in the dialogues.
  • Safety
    • The degree of privacy protection of clients.

Score Criterion

  • The score criterion of each evaluation metric

Score Criterion

Turn-Based Dialogue Evaluation

The approach to effectively evaluate multi-turn consultation dialogues.

Denote a $m$-turn dialogue as a set of paired elements ${(q_i,r_i)|i=1, 2, ..., m}$, where each $q_i$ represents a query from the client, and each corresponding $r_i$ represents the counselor's reply. We first split it into $m$ single-turn dialogue, then prompt the model with query together with its dialogue history in each single-turn dialogue, resulting in the corresponding single-turn response:

math_1

where $h_i={(q_j, r_j)|j=1, 2, ..., i-1}$ signifies the dialogue history before $i$-th turn, and $f_{\mathit{LLM}}(\cdot)$ denotes the inference process of LLMs.

Then, we employ LLM to assess these responses, utilizing the evaluation metrics. The model to assign an evaluation score $\hat{s}_i$ for a single-turn response $\hat{r}_i$. Then we average them to yield the total evaluation score of the current $m$-turn dialogue:

math_2

  • For more details, please refer to the Code.

CPsyCounE

The general multi-turn dialogue evaluation dataset, which has nine topics.

  • For more details, please refer to the CPsyCounE.

Experiments

Intrinsic Evaluation

Role-play VS Memo2Demo
  • Statistics of generated dialogues

Statistics

  • The results of intrinsic evaluation

Intrinsic evaluation

Extrinsic Evaluation

CPsyCounX

We further fine-tune InternLM2-7B-Chat on CPsyCounD. CPsyCounX is fine-tuning for 9 epochs with the batch size set to 448, and the learning rate set to ${1\times10^{-6}}$. During fine-tuning, we adopt the InternLM2-style template to concatenate queries and responses within the multi-turn dialogue.

  • For more details, please refer to the Code.
  • CPsyCounX is open-sourced at HuggingFace.
Results
  • The average results of extrinsic evaluation

Extrinsic evaluation

  • Radar plot of detailed scores of CPsyCounX and other baselines

Radar plot

  • The full results of extrinsic evaluation

Full results

Citation

If you find our work helpful in your research, please cite the following paper:

@inproceedings{zhang-etal-2024-cpsycoun,
    title="{CP}sy{C}oun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for {C}hinese Psychological Counseling",
    author="Zhang, Chenhao  and Li, Renhao  and Tan, Minghuan  and Yang, Min  and Zhu, Jingwei  and Yang, Di  and Zhao, Jiahao  and Ye, Guancheng  and Li, Chengming  and Hu, Xiping",
    journal={ACL},
    year={2024}
}