This repository contains a Python script designed to analyze disaster preparedness determinants. The script uses Ollama's Llama:3.1 model to generate concise themes for textual data and clusters these themes using machine learning techniques. The final results are saved in an Excel file for further analysis.
- Title Generation Using Llama:3.1
  - The script generates a concise, meaningful title for each determinant using the Ollama Llama:3.1 model.
- Embeddings and Clustering
  - The generated titles are transformed into numerical vectors using SentenceTransformer.
  - Themes are clustered using hierarchical clustering to group related determinants together.
- Output
  - Saves the processed data, including generated titles and clusters, into an Excel file for convenient access and analysis.
Install the required Python packages using pip:
pip install pandas sentence-transformers scikit-learn
Ensure that Ollama is installed and the llama3.1:latest model is available.
- Install Ollama
- Verify that ollama is available in your command-line environment:
ollama --version
- Add the llama3.1:latest model:
ollama pull llama3.1:latest
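The same two checks can be run from Python before starting the script. This is a minimal sketch, not code from the repository: the helper names model_in_listing and ollama_ready are hypothetical, and it assumes that ollama list prints one installed model per line with the model tag in the first column.

```python
import shutil
import subprocess

MODEL = "llama3.1:latest"  # model tag named in this README


def model_in_listing(listing: str, model: str = MODEL) -> bool:
    """Return True if `model` is the first field of any line in `ollama list` output."""
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0] == model:
            return True
    return False


def ollama_ready(model: str = MODEL) -> bool:
    """Hypothetical preflight check: is the CLI on PATH and has the model been pulled?"""
    if shutil.which("ollama") is None:
        return False
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=False
    )
    return result.returncode == 0 and model_in_listing(result.stdout, model)
```

Calling ollama_ready() at the top of a run gives an early, readable failure instead of an error on the first title request.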
- Clone this repository:
git clone https://github.com/cpscesar/intra_inter.git
- Run the script:
python title_generation.py
- The script will:
  - Process the determinants in the input dataset.
  - Generate titles using Llama:3.1.
  - Cluster the themes based on semantic similarity.
  - Save the results to an Excel file named disaster_preparedness_clusters.xlsx.
The input data is a DataFrame containing determinants of disaster preparedness in the column Intra/Interpersonal. Example:
Intra/Interpersonal |
---|
Risk Perception |
Trust in neighbors to provide help during disasters. |
Confidence in own ability to create an emergency kit. |
Willingness to evacuate when instructed. |
Participation in community drills for disaster preparedness. |
The script uses Ollama Llama:3.1 to generate a concise title for each determinant. Titles are added to a new column titles1 in the DataFrame.
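The title step could look roughly like the sketch below. It is an illustration under assumptions, not the script's actual code: it assumes the model is reached through the ollama run command, the prompt wording is invented, and the names run_ollama, clean_title, and generate_titles are hypothetical. The model call is passed in as a parameter so the logic can be tried without a running model.

```python
import subprocess
from typing import Callable, List

MODEL = "llama3.1:latest"  # model tag named in this README

# Hypothetical prompt; the wording used by the actual script is not documented here.
PROMPT_TEMPLATE = (
    "Give a concise title of at most five words for this determinant of "
    "disaster preparedness. Reply with the title only.\n\n{text}"
)


def run_ollama(prompt: str, model: str = MODEL) -> str:
    """Send one prompt to the local model through the ollama CLI and return its reply."""
    result = subprocess.run(
        ["ollama", "run", model, prompt],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


def clean_title(raw: str) -> str:
    """Keep the first non-empty line of the reply and strip surrounding quotes."""
    for line in raw.splitlines():
        line = line.strip().strip('"').strip()
        if line:
            return line
    return ""


def generate_titles(
    texts: List[str], ask: Callable[[str], str] = run_ollama
) -> List[str]:
    """Return one title per determinant, in input order."""
    return [clean_title(ask(PROMPT_TEMPLATE.format(text=t))) for t in texts]
```

With a DataFrame df, the result would be assigned as df["titles1"] = generate_titles(df["Intra/Interpersonal"].tolist()).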
- Titles are embedded into vectors using SentenceTransformer.
- Agglomerative clustering is applied with a distance_threshold of 15.
- Clustering results are stored in a new column Cluster_titles1.
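The embedding and clustering steps above can be sketched as follows. Several details are assumptions rather than facts about the script: the embedding model name all-MiniLM-L6-v2, the use of scikit-learn's default Ward linkage with the threshold, and the helper names embed_titles and cluster_vectors. The toy vectors stand in for real title embeddings so the clustering call can be tried without downloading a model.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_vectors(vectors: np.ndarray, distance_threshold: float = 15.0) -> np.ndarray:
    """Group row vectors with agglomerative clustering cut at a distance threshold."""
    clustering = AgglomerativeClustering(
        n_clusters=None,  # must be None when distance_threshold is set
        distance_threshold=distance_threshold,
    )  # linkage is left at the scikit-learn default (Ward); this is an assumption
    return clustering.fit_predict(vectors)


def embed_titles(titles, model_name: str = "all-MiniLM-L6-v2") -> np.ndarray:
    """Embed titles with sentence-transformers; the model name is an assumption."""
    # Imported lazily because the first use downloads the model weights.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode(list(titles))


# Toy vectors standing in for title embeddings: two tight, well-separated groups.
toy = np.array([[0.0, 0.0], [0.0, 1.0], [100.0, 100.0], [100.0, 101.0]])
toy_labels = cluster_vectors(toy)
```

On real data the call would be cluster_vectors(embed_titles(df["titles1"])), with the returned labels stored in Cluster_titles1.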
The final DataFrame is saved to an Excel file:
Intra/Interpersonal | titles1 | Cluster_titles1 |
---|---|---|
Risk Perception | Risk Awareness | 0 |
Trust in neighbors to provide help during disasters. | Community Support Trust | 1 |
Confidence in own ability to create an emergency kit. | Self-Efficacy in Preparedness | 2 |
Willingness to evacuate when instructed. | Evacuation Readiness | 2 |
Participation in community drills for disaster preparedness. | Community Drill Engagement | 1 |
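A table with this layout could be assembled and written as in the sketch below. The column names and output file name come from this README; the helper names build_results and save_results are hypothetical, and the script may build its DataFrame differently.

```python
import pandas as pd

OUTPUT_FILE = "disaster_preparedness_clusters.xlsx"  # file name given in this README


def build_results(determinants, titles, labels) -> pd.DataFrame:
    """Combine the inputs, generated titles, and cluster labels into one table."""
    if not (len(determinants) == len(titles) == len(labels)):
        raise ValueError("determinants, titles, and labels must have equal length")
    return pd.DataFrame(
        {
            "Intra/Interpersonal": list(determinants),
            "titles1": list(titles),
            "Cluster_titles1": list(labels),
        }
    )


def save_results(df: pd.DataFrame, path: str = OUTPUT_FILE) -> None:
    """Write the table to an .xlsx file without the index column."""
    # pandas needs an Excel writer engine such as openpyxl to be installed.
    df.to_excel(path, index=False)
```

Note that the pip command in this README does not list an Excel writer engine, so save_results may require installing one such as openpyxl separately.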
You can modify the clustering behavior by adjusting the distance_threshold or using a fixed number of clusters:
clustering = AgglomerativeClustering(n_clusters=5, linkage='average')
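scikit-learn requires exactly one of n_clusters and distance_threshold to be set, with the other left as None. The sketch below wraps that rule in a small factory; the name make_clusterer and the toy points are illustrative only, and the average linkage simply mirrors the line above.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def make_clusterer(n_clusters=None, distance_threshold=None, linkage="average"):
    """Build a clusterer from exactly one of n_clusters or distance_threshold."""
    if (n_clusters is None) == (distance_threshold is None):
        raise ValueError("set exactly one of n_clusters or distance_threshold")
    return AgglomerativeClustering(
        n_clusters=n_clusters,
        distance_threshold=distance_threshold,
        linkage=linkage,
    )


# Three tight pairs on a line, far apart from each other.
points = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])

# Fixed number of clusters: always returns exactly three groups.
fixed_labels = make_clusterer(n_clusters=3).fit_predict(points)

# Distance cut: the number of groups depends on the data and the threshold.
cut_labels = make_clusterer(distance_threshold=1.0).fit_predict(points)
```

A fixed n_clusters is easier to compare across runs, while a threshold lets the number of themes grow with the dataset.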
Extend the input data in the data dictionary or load your own dataset via CSV/Excel.
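Loading your own file might look like the following sketch. It assumes the file has a column named Intra/Interpersonal, as in the example input; the helper name load_determinants is hypothetical, and the in-memory CSV only stands in for a file on disk.

```python
import io

import pandas as pd

COLUMN = "Intra/Interpersonal"  # input column name used in this README


def load_determinants(source, column: str = COLUMN) -> pd.DataFrame:
    """Read a CSV (path or file-like object) and keep rows with a non-empty determinant."""
    df = pd.read_csv(source)
    if column not in df.columns:
        raise KeyError(f"expected a column named {column!r}, found {list(df.columns)}")
    df = df.dropna(subset=[column]).copy()
    df[column] = df[column].astype(str).str.strip()
    return df[df[column] != ""].reset_index(drop=True)


# In-memory CSV standing in for a file on disk.
example_csv = io.StringIO(
    "Intra/Interpersonal\n"
    "Risk Perception\n"
    "\n"
    "Willingness to evacuate when instructed.\n"
)
example_df = load_determinants(example_csv)
```

For an Excel workbook, pd.read_excel can replace pd.read_csv in the same function; it also needs a reader engine such as openpyxl.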
If Ollama fails to generate a title, the script logs the error and uses "Error generating title" as a placeholder.
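A fallback of this kind can be sketched as a thin wrapper around whatever function calls the model. The placeholder text is the one quoted above; the wrapper name safe_title is hypothetical, and treating an empty reply as a failure is an assumption rather than documented behavior.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

PLACEHOLDER = "Error generating title"  # placeholder text quoted in this README


def safe_title(text: str, generate: Callable[[str], str]) -> str:
    """Return generate(text), or the placeholder if the call raises or returns nothing."""
    try:
        title = generate(text).strip()
    except Exception as exc:  # broad on purpose: any failure falls back to the placeholder
        logger.error("Title generation failed for %r: %s", text, exc)
        return PLACEHOLDER
    if not title:
        # Assumption: an empty reply counts as a failed generation.
        logger.error("Empty title returned for %r", text)
        return PLACEHOLDER
    return title
```

Rows that carry the placeholder share identical text, so they will embed to the same vector and land in the same cluster; it may be worth filtering them out before clustering.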
Feel free to submit issues or pull requests to improve this project. Contributions are welcome!
This script and repository were utilized in the following work:
Soares, C. P., Kolen, K. A., Spiteri, R., Aghaie, V., Dhawale, R. & Schuster-Wallace, C. (2024, March 15). Intrapersonal and interpersonal determinants of household and community disaster mitigation behaviour: A scoping review protocol. https://doi.org/10.17605/OSF.IO/ZW7UA