NLP Development Banks Collaboration Analyzer 🌏

The project aims to explore the collaborative potential among development banks using Natural Language Processing (NLP) techniques. By analyzing textual data such as project declarations from various development banks, the goal is to uncover insights into potential areas of collaboration and synergy.

Features

  • Data Pipeline: Fetches and preprocesses development cooperation projects from the IATI Datastore (IATI)
  • Similarity Scores: Calculates text similarity between the fetched projects to find similar projects
  • Extended Similarity Scores: Calculates an extended similarity score that combines cosine text similarity with CRS3 codes, CRS5 codes, and SDGs (see the sketch below)
  • Application: Visualizes the results in a web application built with Streamlit (App)
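
The exact formula lives in the extended_similarities.ipynb notebook; as a rough, hypothetical sketch (the weights and the Jaccard-overlap idea below are illustrative assumptions, not taken from the repository), such a combined score could look like this:

```python
# Hypothetical sketch of an extended similarity score: blends cosine text
# similarity with the overlap of CRS3/CRS5 purpose codes and SDGs.
# The weights are illustrative assumptions, not values from this repository.

def extended_similarity(
    cosine_sim: float,
    crs3_a: set, crs3_b: set,
    crs5_a: set, crs5_b: set,
    sdgs_a: set, sdgs_b: set,
    weights=(0.5, 0.15, 0.2, 0.15),  # assumed weights for text, CRS3, CRS5, SDG
) -> float:
    """Weighted blend of text similarity and categorical (Jaccard) overlaps."""
    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if (a or b) else 0.0

    w_text, w_crs3, w_crs5, w_sdg = weights
    return (
        w_text * cosine_sim
        + w_crs3 * jaccard(crs3_a, crs3_b)
        + w_crs5 * jaccard(crs5_a, crs5_b)
        + w_sdg * jaccard(sdgs_a, sdgs_b)
    )

# Example: identical CRS codes, one shared SDG, cosine similarity of 0.72
score = extended_similarity(0.72, {"311"}, {"311"}, {"31120"}, {"31120"}, {2, 13}, {2})
print(round(score, 3))  # 0.785
```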

Tech Stack

Python · NumPy · Pandas · SciPy · Git · Streamlit · Hugging Face · Transformers · Jupyter

Models Used

| Model Name | Description | Link |
| --- | --- | --- |
| all-MiniLM-L6-v2 | A small, efficient transformer model for various NLP tasks, used here with Sentence Transformers to create the text embeddings from which cosine similarity is calculated. | MiniLMv2 on Hugging Face |
| jonas/bert-base-uncased-finetuned-sdg | A classifier model for SDG (Sustainable Development Goals) classification from Hugging Face. | SDG Classifier on Hugging Face |
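
As a rough illustration of how these two models are typically used (the notebooks under /data/models may differ in detail), assuming the sentence-transformers and transformers libraries are installed:

```python
# Minimal usage sketch for the two models listed above; not the exact code
# from the notebooks in /data/models.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

texts = [
    "Rural electrification through solar mini-grids.",
    "Off-grid solar power for remote villages.",
]

# 1) Text embeddings with all-MiniLM-L6-v2 for cosine similarity
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(texts, convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]))

# 2) SDG classification with the fine-tuned BERT classifier
sdg_classifier = pipeline("text-classification", model="jonas/bert-base-uncased-finetuned-sdg")
print(sdg_classifier(texts[0]))
```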

Installation

  1. `git clone https://github.com/JanMuehlnikel/NLP-Development-Banks-Collaboration-Analyzer`
  2. `cd synergy-app`
  3. `git clone https://huggingface.co/spaces/GIZ/eb-synergy-app`
  4. Install NLP-Development-Banks-Collaboration-Analyzer/requirements.txt in a virtual environment (e.g. conda)

Run Pipeline

  1. Navigate to /config/
  2. Create `KEYS.py`
  3. Add the line `IATI_KEY = "{Your_Iati_Datastore_Key}"`
  4. Create an IATI Datastore API key and replace the placeholder with it (Create Key; the Full Access subscription was used)
  5. Navigate to /data/pipeline
  6. Run `python pipeline.py` (see the sketch below for how such a key is typically used)
  7. Wait until the pipeline finishes
  8. Find the results in /src/merged_orgas.csv
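
For orientation, here is a hypothetical sketch of how a pipeline can query the IATI Datastore with such a key; the endpoint, query fields, example organisation reference, and import path are assumptions based on the public Datastore API and may not match what pipeline.py actually does.

```python
# Hypothetical sketch of fetching activities from the IATI Datastore API.
# Endpoint, Solr query fields, and header name are assumptions; adjust the
# import to wherever KEYS.py sits on your path.
import requests
from config.KEYS import IATI_KEY  # the key created in step 3

SEARCH_URL = "https://api.iatistandard.org/datastore/activity/select"

params = {
    "q": "reporting_org_ref:XM-DAC-5-52",  # example Solr query: filter by a reporting organisation
    "rows": 100,
    "fl": "iati_identifier,title_narrative,description_narrative,sector_code",
    "wt": "json",
}
headers = {"Ocp-Apim-Subscription-Key": IATI_KEY}

response = requests.get(SEARCH_URL, params=params, headers=headers, timeout=60)
response.raise_for_status()
activities = response.json()["response"]["docs"]
print(f"Fetched {len(activities)} activities")
```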

Calculate Similarities Between All Projects

  1. Navigate to /data/models
  2. Run the similarity_minilm.ipynb notebook
  3. The text-based cosine similarity scores are stored in /src/similarities.npz
  4. Navigate to /data/models
  5. Run the extended_similarities.ipynb notebook
  6. The extended similarity results are stored in synergy-app/src/ (a sketch of how such a .npz file can be written and read back follows below)
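
Below is a minimal sketch of how a file like similarities.npz can be produced and read back; the concrete storage format (dense vs. sparse) and the threshold are assumptions, not taken from the notebooks.

```python
# Minimal sketch of producing a similarities.npz file; the notebook may store
# the matrix differently (e.g. dense np.savez vs. a sparse matrix).
from scipy import sparse
from sentence_transformers import SentenceTransformer, util

descriptions = ["Project A text ...", "Project B text ...", "Project C text ..."]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(descriptions, convert_to_tensor=True)

# Full pairwise cosine similarity matrix (n_projects x n_projects)
sim_matrix = util.cos_sim(embeddings, embeddings).cpu().numpy()

# Keep only sufficiently similar pairs to save memory, then store as sparse .npz
sim_matrix[sim_matrix < 0.5] = 0.0  # assumed threshold, not taken from the repo
sparse.save_npz("similarities.npz", sparse.csr_matrix(sim_matrix))

# Reading the file back, e.g. for later analysis
loaded = sparse.load_npz("similarities.npz")
print(loaded.shape)
```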

App

Launch Locally (most likely not possible due to extremely high RAM usage!)

  1. `cd /synergy-app`
  2. `streamlit run app.py`

Visit HuggingFace Space

Due to the high RAM usage, the Streamlit app is hosted in a Hugging Face Space:

https://huggingface.co/spaces/GIZ/eb-synergy-app
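
For context, a toy sketch of how the stored results could be surfaced in a Streamlit app is shown below; the actual app.py lives in the eb-synergy-app Space and differs from this illustration, and the file paths here are assumptions.

```python
# Toy illustration of surfacing the pipeline and similarity results with
# Streamlit; not the actual app from the eb-synergy-app Space.
import pandas as pd
import streamlit as st
from scipy import sparse

projects = pd.read_csv("src/merged_orgas.csv")          # pipeline output (assumed path)
similarities = sparse.load_npz("src/similarities.npz")  # cosine similarity matrix (assumed path)

st.title("Development Bank Project Synergies")
threshold = st.slider("Minimum similarity", 0.0, 1.0, 0.8)

# Collect project pairs above the chosen similarity threshold
rows, cols = similarities.nonzero()
pairs = [
    (i, j, float(similarities[i, j]))
    for i, j in zip(rows, cols)
    if i < j and similarities[i, j] >= threshold
]
st.write(f"{len(pairs)} project pairs above the threshold")
st.dataframe(pd.DataFrame(pairs, columns=["project_a", "project_b", "similarity"]))
```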

Project Structure

├── config/               # configuration files, constants and keys
├── data/                 # pipeline, models and validation
├── src/                  # sources
├── synergy-app/          # Streamlit App to display results (separate repo: https://huggingface.co/spaces/GIZ/eb-synergy-app)
├── .gitignore            # ignored files (especially large memory files)
├── README.md             # project information
└── requirements.txt      # dependencies and libs that need to be installed