Disaster Preparedness Determinants Clustering

This repository contains a Python script that analyzes determinants of disaster preparedness. The script uses the Llama 3.1 model, run locally through Ollama, to generate a concise title (theme) for each determinant, embeds these titles, and groups them with hierarchical clustering. The results are saved to an Excel file for further analysis.

Features

  1. Title Generation Using Llama 3.1

    • The script generates a concise, meaningful title for each determinant using the Llama 3.1 model served locally by Ollama.
  2. Embeddings and Clustering

    • The generated titles are transformed into numerical vectors using SentenceTransformer.
    • The title embeddings are grouped with hierarchical (agglomerative) clustering so that related determinants fall into the same cluster.
  3. Output

    • Saves the processed data, including generated titles and clusters, into an Excel file for convenient access and analysis.

Requirements

Python Packages

Install the required Python packages using pip (openpyxl is the engine pandas uses to write .xlsx files):

pip install pandas sentence-transformers scikit-learn openpyxl

Ollama

Ensure that Ollama is installed and that the llama3.1:latest model is available:

  • Install Ollama
  • Verify that ollama is available in your command-line environment:
ollama --version
  • Download (pull) the llama3.1:latest model:
ollama pull llama3.1:latest
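To check the same prerequisites from Python before a long run, a small helper like the one below may be useful. It is a sketch, not part of the repository: the helper names are hypothetical, and it assumes the ollama CLI is on your PATH and that the first column of `ollama list` output is the model name.

```python
import shutil
import subprocess

MODEL = "llama3.1:latest"


def model_listed(list_output: str, model: str = MODEL) -> bool:
    """Return True if `model` appears in the text printed by `ollama list`."""
    # Assumption: the first line is a header (NAME, ID, SIZE, MODIFIED) and
    # every following non-empty line starts with the model name.
    names = [line.split()[0] for line in list_output.splitlines()[1:] if line.strip()]
    return model in names


def ollama_ready(model: str = MODEL) -> bool:
    """Return True if the ollama CLI is installed and `model` has been pulled."""
    if shutil.which("ollama") is None:
        return False  # CLI not found on PATH
    # `ollama list` may also fail if the Ollama server is not running.
    result = subprocess.run(["ollama", "list"], capture_output=True, text=True)
    return result.returncode == 0 and model_listed(result.stdout, model)
```

Calling ollama_ready() at the top of a script lets it stop early with a clear message instead of filling every row with a placeholder title.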

Usage

  1. Clone this repository and enter it:
git clone https://github.com/cpscesar/intra_inter.git
cd intra_inter
  2. Run the script:
python title_generation.py
  3. The script will:
    • Process the determinants in the input dataset.
    • Generate a title for each determinant using Llama 3.1.
    • Cluster the titles by semantic similarity.
    • Save the results to an Excel file named disaster_preparedness_clusters.xlsx.

Script Overview

Input Data

The input data is a DataFrame containing determinants of disaster preparedness in the column Intra/Interpersonal. Example:

Intra/Interpersonal
Risk Perception
Trust in neighbors to provide help during disasters.
Confidence in own ability to create an emergency kit.
Willingness to evacuate when instructed.
Participation in community drills for disaster preparedness.
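The example above can be reproduced as a pandas DataFrame built from a `data` dictionary. This is a minimal sketch; the exact variable names in title_generation.py may differ.

```python
import pandas as pd

# Example determinants from the list above, stored in the
# "Intra/Interpersonal" column that the rest of the pipeline reads.
data = {
    "Intra/Interpersonal": [
        "Risk Perception",
        "Trust in neighbors to provide help during disasters.",
        "Confidence in own ability to create an emergency kit.",
        "Willingness to evacuate when instructed.",
        "Participation in community drills for disaster preparedness.",
    ]
}
df = pd.DataFrame(data)
print(df)
```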

Generated Titles

The script uses Llama 3.1 via Ollama to generate a concise title for each determinant and stores the titles in a new DataFrame column, titles1.
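One way to request a title from the local model is to call the CLI as `ollama run <model> <prompt>`, which prints the response and exits. The sketch below assumes that approach; the prompt wording and helper names are hypothetical, and the repository's script may call Ollama differently (for example through its HTTP API or Python client).

```python
import subprocess

MODEL = "llama3.1:latest"


def build_prompt(determinant: str) -> str:
    """Hypothetical prompt; the script's actual wording may differ."""
    return (
        "Write a concise title of at most five words for the following "
        "disaster preparedness determinant. Reply with the title only.\n\n"
        f"Determinant: {determinant}"
    )


def generate_title(determinant: str, model: str = MODEL, timeout: int = 120) -> str:
    """Ask a locally pulled Ollama model for a title via the CLI."""
    result = subprocess.run(
        ["ollama", "run", model, build_prompt(determinant)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # raise TimeoutExpired if the model does not answer
    )
    # Models sometimes wrap the title in quotes; strip them and whitespace.
    return result.stdout.strip().strip('"').strip()
```

Applied to the determinants DataFrame, this would be `df["titles1"] = df["Intra/Interpersonal"].apply(generate_title)`. Asking for "the title only" keeps the output short enough to embed directly.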

Clustering

  • Titles are embedded into vectors using SentenceTransformer.
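  • Agglomerative clustering (scikit-learn's AgglomerativeClustering) is applied with distance_threshold=15, so the number of clusters is set by the threshold rather than fixed in advance.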
  • Clustering results are stored in a new column Cluster_titles1.
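The embedding and clustering steps can be sketched with sentence-transformers and scikit-learn as follows. The embedding model name (all-MiniLM-L6-v2) is an assumption, as is scikit-learn's default ward linkage; the README only states that SentenceTransformer and a distance_threshold of 15 are used. scikit-learn requires n_clusters=None whenever distance_threshold is set.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def embed_titles(titles, model_name: str = "all-MiniLM-L6-v2") -> np.ndarray:
    """Encode titles as dense vectors (model name is an assumption)."""
    # Imported here so the clustering helper can be used without the model.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode(list(titles))


def cluster_embeddings(embeddings: np.ndarray, distance_threshold: float = 15.0) -> np.ndarray:
    """Hierarchically cluster vectors, cutting the tree at distance_threshold."""
    clustering = AgglomerativeClustering(
        n_clusters=None,  # must be None when distance_threshold is given
        distance_threshold=distance_threshold,
    )  # default linkage is "ward", which uses Euclidean distances
    return clustering.fit_predict(embeddings)
```

Under these assumptions, the cluster column would be filled with `df["Cluster_titles1"] = cluster_embeddings(embed_titles(df["titles1"]))`.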

Output

The final DataFrame is saved to disaster_preparedness_clusters.xlsx. Example output:

Intra/Interpersonal                                          | titles1                       | Cluster_titles1
------------------------------------------------------------ | ----------------------------- | ---------------
Risk Perception                                              | Risk Awareness                | 0
Trust in neighbors to provide help during disasters.         | Community Support Trust       | 1
Confidence in own ability to create an emergency kit.        | Self-Efficacy in Preparedness | 2
Willingness to evacuate when instructed.                     | Evacuation Readiness          | 2
Participation in community drills for disaster preparedness. | Community Drill Engagement    | 1
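Writing the results is a single pandas call. The sketch below keeps only the three columns shown in the table and drops the DataFrame index; it assumes openpyxl is installed, since pandas uses it to write .xlsx files. The helper name save_results is hypothetical.

```python
import pandas as pd

OUTPUT_COLUMNS = ["Intra/Interpersonal", "titles1", "Cluster_titles1"]


def save_results(df: pd.DataFrame, path: str = "disaster_preparedness_clusters.xlsx") -> None:
    """Write determinants, generated titles, and cluster labels to Excel."""
    # index=False keeps the pandas row index out of the spreadsheet.
    df[OUTPUT_COLUMNS].to_excel(path, index=False)
```

Sorting first with `df.sort_values("Cluster_titles1")` would place the members of each cluster on adjacent rows, which can make manual review easier.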

Customization

Adjusting the Clustering

You can modify the clustering behavior by adjusting the distance_threshold or using a fixed number of clusters:

from sklearn.cluster import AgglomerativeClustering

clustering = AgglomerativeClustering(n_clusters=5, linkage='average')
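scikit-learn's AgglomerativeClustering accepts exactly one of n_clusters or distance_threshold; the other must be None. The hypothetical helper below makes the switch explicit. A threshold tuned for one linkage (for example the default ward) will generally not carry over to another, such as average linkage, because each linkage measures merge distance differently.

```python
from typing import Optional

from sklearn.cluster import AgglomerativeClustering


def make_clusterer(
    n_clusters: Optional[int] = None,
    distance_threshold: float = 15.0,
    linkage: str = "ward",
) -> AgglomerativeClustering:
    """Build a clusterer with either a fixed cluster count or a distance cut."""
    if n_clusters is not None:
        # Fixed number of clusters: the tree is cut to give exactly n_clusters.
        return AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    # Threshold mode: groups whose merge distance exceeds the threshold stay apart.
    return AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold, linkage=linkage
    )
```

For example, `make_clusterer(n_clusters=5, linkage="average")` is equivalent to the configuration above, while `make_clusterer()` keeps the threshold of 15.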

Adding More Determinants

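Extend the data dictionary in the script, or load your own dataset from a CSV or Excel file. Keep the determinants in a column named Intra/Interpersonal so the rest of the pipeline can find them.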
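A minimal loader for your own file might look like this. It assumes the file has a column named Intra/Interpersonal and that openpyxl is available for .xlsx files; load_determinants is a hypothetical name.

```python
import pandas as pd

COLUMN = "Intra/Interpersonal"


def load_determinants(path: str) -> pd.DataFrame:
    """Load determinants from a CSV or .xlsx file and drop empty rows."""
    if path.lower().endswith(".xlsx"):
        df = pd.read_excel(path)  # needs openpyxl
    else:
        df = pd.read_csv(path)
    if COLUMN not in df.columns:
        raise ValueError(f"Expected a column named {COLUMN!r}, found {list(df.columns)}")
    # Blank cells would otherwise reach the title-generation step as NaN.
    return df.dropna(subset=[COLUMN]).reset_index(drop=True)
```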

Error Handling

If Ollama fails to generate a title, the script logs the error and uses "Error generating title" as a placeholder.
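This fallback can be sketched as a wrapper around whatever function produces the title. It is an illustration rather than the script's code: safe_title is a hypothetical name, and it takes the generator as a parameter so it works with any way of calling Ollama.

```python
import logging
from typing import Callable

PLACEHOLDER = "Error generating title"
logger = logging.getLogger(__name__)


def safe_title(determinant: str, generate: Callable[[str], str]) -> str:
    """Return the generated title, or PLACEHOLDER if generation fails."""
    try:
        return generate(determinant)
    except Exception:  # e.g. CLI missing, non-zero exit, or timeout
        # Log the traceback, then continue so one failure does not stop
        # the rest of the dataset from being processed.
        logger.exception("Title generation failed for %r", determinant)
        return PLACEHOLDER
```

Rows that come back as the placeholder are worth filtering out or re-running before clustering: identical placeholder titles produce identical embeddings and would otherwise form an artificial cluster of their own.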

Contributions

Feel free to submit issues or pull requests to improve this project. Contributions are welcome!

Acknowledgments

This script and repository were used in the following work:

Soares, C. P., Kolen, K. A., Spiteri, R., Aghaie, V., Dhawale, R., & Schuster-Wallace, C. (2024, March 15). Intrapersonal and interpersonal determinants of household and community disaster mitigation behaviour: A scoping review protocol. https://doi.org/10.17605/OSF.IO/ZW7UA