
Towards AI-Driven Healthcare: Systematic Optimization, Linguistic Analysis, and Clinicians’ Evaluation of Large Language Models for Smoking Cessation Interventions

Scripts for the paper accepted at the CHI 2024 conference.

Dataset

A dataset of smoking cessation intervention messages developed by the TSET Health Promotion Research Center was used. The dataset is private.

Stages of processing:

Prompt and Decoding Selection

  1. prepare_dataset_to_prompts/prepare_dataset_to_prompts.py: This script takes the input CSV file and splits it into two or three parts. It stores the parts as CSV files and additionally stores them in prompt format.

  2. generate_input_prompts/generate_input_prompts.py: This script takes the train or validation messages as input and generates the different kinds of prompts used in the paper.

  3. generate_lm_output/generate_lm_output.py: It loads the model and generates sentences for the different prompt versions.

  4. generate_lm_output/generate_lm_output_decoding.py: It loads the model and generates sentences for the different decoding versions.

  5. get_sentences_from_output/get_sentences_from_output_vB_v2.py: It splits the raw output string to obtain the generated messages for prompt selection.

  6. get_sentences_from_output/get_sentences_from_output_vC_v2.py: It splits the raw output string to obtain the generated messages for decoding selection.

  7. get_statistics_messages_generated/get_statistics_messages_generated.py: It produces two files:
     * statistics_summary.csv: average number of messages per model/version
     * statistics_index_prompt_level.csv: number of messages per prompt

  8. postprocess_on_selfBLEU/postprocess_on_selfBLEU.py: It computes the self-BLEU-4 value for each generated message.

  9. discard_on_selfBLEU/discard_on_selfBLEU.py: It discards messages whose self-BLEU exceeds a threshold (steps 8-10 are sketched after this list).

  10. discard_on_criteria/discard_on_criteria.py: It discards messages based on certain criteria; in our case, the presence of "app" or a similar word, the presence of an underscore (_), and a message length of less than 6.

  11. calculate_perplexity/calculate_perplexity.py: It calculates perplexity according to a specific language model.

  12. join_results_perplexity/join_results_perplexity.py: It joins the perplexity values from the different models.

  13. join_all_sentences/join_all_sentences_v2.py: It joins all final sentences across the different combinations.

  14. join_all_sentences/join_all_sentences_v2_vC.py: It joins all final sentences across the different combinations together with the original messages. The difference from the previous script is the number of outputs: it writes two CSV files, one with type original and the other with type train or validation, depending on the split the original message belongs to.

  15. process_LIWC_results/process_LIWC_results.py: It calculates the mean, standard deviation, and standard error for the selected LIWC metrics.

  16. summary tables repetition and criteria.ipynb: It calculates the mean, standard deviation, and standard error for the number of messages at the interaction level.

  17. Process statistics discard and BLEU.ipynb: It calculates the mean, standard deviation, and standard error for the reduction after applying the filtering.
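
Filtering sketch for steps 8-10. The exact thresholds, column names, and tokenization used by the repo's scripts are not documented here, so the following is only a minimal illustration of self-BLEU-4 scoring followed by threshold- and criteria-based discarding, assuming NLTK and pandas:

```python
# Illustrative sketch only; the threshold (0.8), the "app"-word check, and the
# length unit (tokens) are assumptions, not the repo's actual settings.
import pandas as pd
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu4(messages):
    """For each message, BLEU-4 computed against all other messages in the set."""
    smooth = SmoothingFunction().method1
    tokenized = [m.split() for m in messages]
    scores = []
    for i, hyp in enumerate(tokenized):
        refs = tokenized[:i] + tokenized[i + 1:]
        scores.append(sentence_bleu(refs, hyp,
                                    weights=(0.25, 0.25, 0.25, 0.25),
                                    smoothing_function=smooth))
    return scores

def passes_criteria(message, min_len=6):
    """Discard criteria from step 10: 'app'-like words, underscores, short messages."""
    tokens = message.lower().split()
    if any(tok.startswith("app") for tok in tokens):
        return False
    if "_" in message:
        return False
    return len(tokens) >= min_len

df = pd.DataFrame({"message": [
    "Quitting smoking improves your health every single day.",
    "Open the app to track your progress.",
    "Stay strong today.",
]})
df["self_bleu4"] = self_bleu4(df["message"].tolist())
kept = df[(df["self_bleu4"] < 0.8) & df["message"].apply(passes_criteria)]
print(kept)
```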

Automatic survey generation for Qualtrics

  1. Prepare sentences for surveys.ipynb: It randomly selects 100 messages per model from the joined messages.

  2. Prepare format txt for Qualtrics.ipynb: It creates an Advanced Format TXT file to be imported into Qualtrics (a hypothetical example is sketched below).
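
A minimal, hypothetical example of the Qualtrics Advanced Format TXT that the second notebook could emit; the actual question wording, rating scale, IDs, and block structure used in the study may differ:

```python
# Hypothetical sketch: write a Qualtrics Advanced Format TXT import file.
# The question text, ID, and 5-point scale below are placeholders.
survey_txt = """[[AdvancedFormat]]

[[Block:Generated messages]]

[[Question:MC:SingleAnswer]]
[[ID:msg_001]]
Please rate the following message: "Quitting smoking improves your health every day."
[[Choices]]
1 - Strongly disagree
2
3
4
5 - Strongly agree
"""

with open("survey_import.txt", "w", encoding="utf-8") as f:
    f.write(survey_txt)
```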

Processing of Survey results

  1. reshape_qualtrics_results.ipynb: It reshapes the survey results from wide (horizontal) to long (vertical) format (see the sketch after this list).

  2. prepare_qualtrics_for_analysis.ipynb: It prepares the survey results for GLMM analysis and for LIWC analysis of the generated messages.

  3. prepare_qualtrics_for_analysis_modified.ipynb: It analyzes the revisions from the survey results for statistics and BLEU-4.
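
A minimal sketch of the wide-to-long reshape in reshape_qualtrics_results.ipynb, assuming a pandas DataFrame with one column per rated message (column names are illustrative, not the actual Qualtrics export headers):

```python
import pandas as pd

# Toy Qualtrics export: one row per respondent, one column per rated message.
wide = pd.DataFrame({
    "ResponseId": ["R_1", "R_2"],
    "msg_001": [5, 4],
    "msg_002": [3, 2],
})

# Long format: one row per respondent-message pair, ready for GLMM-style analysis.
long = wide.melt(id_vars="ResponseId", var_name="message_id", value_name="rating")
print(long)
```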

Commands used:

python prepare_dataset_to_prompts.py -j prepare_dataset_to_prompts_v1.json
python generate_input_prompts.py -j generate_input_prompts_vB/generate_input_prompts_v1.json
python generate_input_prompts.py -j generate_input_prompts_vB/generate_input_prompts_v2.json
python generate_input_prompts.py -j generate_input_prompts_vB/generate_input_prompts_v3.json
python generate_input_prompts.py -j generate_input_prompts_vB/generate_input_prompts_v4.json
python generate_input_prompts.py -j generate_input_prompts_vB/generate_input_prompts_v5.json

vB: Prompt selection

# generation
python generate_lm_output.py -j ./generate_lm_output_vB/generate_lm_output_gpt-j-6b.json
python generate_lm_output.py -j ./generate_lm_output_vB/generate_lm_output_bloom-7b1.json
python generate_lm_output.py -j ./generate_lm_output_vB/generate_lm_output_opt-6.7b.json
python generate_lm_output.py -j ./generate_lm_output_vB/generate_lm_output_opt-13b.json
python generate_lm_output.py -j ./generate_lm_output_vB/generate_lm_output_opt-30b.json

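The model names above are Hugging Face checkpoints, so generate_lm_output.py presumably loads them with the transformers library; a minimal sketch under that assumption (the prompt and decoding parameters here are placeholders, not the settings selected in the paper):

```python
# Sketch only: how one of the listed checkpoints could be loaded and sampled.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"  # or "EleutherAI/gpt-j-6B", "bigscience/bloom-7b1", ...
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

prompt = "Write a short message that encourages someone to quit smoking:\n"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80,
                        do_sample=True, top_p=0.9, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
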
# filter
python discard_on_selfBLEU.py -j ./discard_on_selfBLEU_vB_v2/discard_on_selfBLEU4.json
python discard_on_criteria.py -j ./discard_on_criteria/discard_on_criteria_vB.json

# perplexity calculation
python calculate_perplexity.py -j ./calculate_perplexity_vB_v2/calculate_perplexity_gptj6b.json
python calculate_perplexity.py -j ./calculate_perplexity_vB_v2/calculate_perplexity_bloom-7b1.json
python calculate_perplexity.py -j ./calculate_perplexity_vB_v2/calculate_perplexity_opt-6.7b.json
python calculate_perplexity.py -j ./calculate_perplexity_vB_v2/calculate_perplexity_opt-13b.json
python calculate_perplexity.py -j ./calculate_perplexity_vB_v2/calculate_perplexity_opt-30b.json
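
For reference, per-message perplexity under a causal language model can be computed roughly as below; this is only a sketch, not the exact logic of calculate_perplexity.py (which may use batching, striding, or other details):

```python
# Sketch: exponentiated mean token cross-entropy as perplexity.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the commands above score with the larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

print(perplexity("Quitting smoking improves your health every day."))
```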

# join perplexity results
python join_results_perplexity.py -j ./join_results_perplexity_vB/join_results_perplexity.json

# join sentences to be analyzed with LIWC
python join_all_sentences_v2.py -j ./join_all_sentences_vB_v2/join_all_sentences_v2.json

# Process LIWC results 
python process_LIWC_results.py -j ./process_LIWC_results_vB/process_LIWC_results_v1_vB.json
python process_LIWC_results.py -j ./process_LIWC_results_vB/process_LIWC_results_v2_vB.json
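
process_LIWC_results.py is described above as computing the mean, standard deviation, and standard error of selected LIWC metrics; a minimal pandas sketch of that aggregation (the metric and column names are illustrative, not the repo's actual LIWC output format):

```python
import pandas as pd

# Toy LIWC output: one row per message, one column per LIWC metric.
liwc = pd.DataFrame({
    "model":    ["gpt-j-6b", "gpt-j-6b", "opt-13b", "opt-13b"],
    "Analytic": [45.2, 50.1, 60.3, 55.7],
    "Tone":     [80.5, 75.2, 70.1, 65.9],
})

# Mean, standard deviation, and standard error per model.
summary = liwc.groupby("model")[["Analytic", "Tone"]].agg(["mean", "std", "sem"])
print(summary)
```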

vC: Decoding strategy selection

# sentence extraction from the generated output
python get_sentences_from_output_vC.py ./get_sentences_from_output_lm_vC/get_sentences_from_output_lm_gpt-j-6b.json
python get_sentences_from_output_vC.py ./get_sentences_from_output_lm_vC/get_sentences_from_output_lm_bloom-7b1.json
python get_sentences_from_output_vC.py ./get_sentences_from_output_lm_vC/get_sentences_from_output_lm_opt-6.7b.json
python get_sentences_from_output_vC.py ./get_sentences_from_output_lm_vC/get_sentences_from_output_lm_opt-13b.json
python get_sentences_from_output_vC.py ./get_sentences_from_output_lm_vC/get_sentences_from_output_lm_opt-30b.json

python get_statistics_messages_generated.py -j ./get_statistics_messages_generated_vC/get_statistics_messages_generated.json

# BLEU-4
python postprocess_on_selfBLEU.py -j ./postprocess_on_selfBLEU4_vC/postprocess_on_selfBLEU4.json

# filter
python discard_on_selfBLEU.py -j ./discard_on_selfBLEU_vC/discard_on_selfBLEU4.json
python discard_on_criteria.py -j ./discard_on_criteria/discard_on_criteria_vC.json

# perplexity
python calculate_perplexity.py -j ./calculate_perplexity_vC/calculate_perplexity_gptj6b.json
python calculate_perplexity.py -j ./calculate_perplexity_vC/calculate_perplexity_bloom-7b1.json
python calculate_perplexity.py -j ./calculate_perplexity_vC/calculate_perplexity_opt-6.7b.json
python calculate_perplexity.py -j ./calculate_perplexity_vC/calculate_perplexity_opt-13b.json
python calculate_perplexity.py -j ./calculate_perplexity_vC/calculate_perplexity_opt-30b.json

# join perplexity results
python join_results_perplexity.py -j ./join_results_perplexity_vC/join_results_perplexity_vC.json

# join sentences to be analyzed with LIWC
python join_all_sentences_v2_vC.py -j ./join_all_sentences_vC/join_all_sentences_vC.json

# analysis of LIWC results
python process_LIWC_results.py -j ./process_LIWC_results_vC/process_LIWC_results_v1.json
python process_LIWC_results.py -j ./process_LIWC_results_vC/process_LIWC_results_v2.json

Process ChatGPT messages:

python get_sentences_from_output_ChatGPT.py -j get_sentences_from_output_ChatGPT/get_sentences_from_output_prompt_v4_ChatGPT.json
python postprocess_on_selfBLEU.py -j ./postprocess_on_selfBLEU_ChatGPT/postprocess_on_selfBLEU4.json
python discard_on_selfBLEU.py -j ./discard_on_selfBLEU_ChatGPT/discard_on_selfBLEU4.json
python discard_on_criteria.py -j discard_on_criteria_ChatGPT.json

# join sentences to be analyzed with LIWC
python join_all_sentences_v2_vC.py -j ./join_all_sentences/join_all_sentences_ChatGPT/join_all_sentences_ChatGPT.json

Citation

Cite as:

Calle, P., Shao, R., Liu, Y., Hebert, E., Kendzor, D., Neil, J., Businelle, M., & Pan, C. (2024, April). Towards AI-driven healthcare: Systematic optimization, linguistic analysis, and clinicians’ evaluation of large language models for smoking cessation interventions. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems.