Geo Stories

This software accepts sentences in Russian and generates an image that is geo-localised for an Indian audience. The framework has four components:

Text Translation: Translates sentences from Russian to English. (The software uses OPUS-MT ru-en.)
Text Mining: Identifies the words that need to be localised. (The software uses NER, parsing and WordNet.)
Word Embeddings: Aligns words to a target geography by performing analogy. (The software uses Word2Vec/GloVe/BERT.)
Latent Diffusion Model: Generates the image. (The software uses a fine-tuned Stable Diffusion model.)
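
The analogy step of the Word Embeddings component can be illustrated with a minimal sketch. It is not the repository's code: the three-dimensional vectors are invented for illustration, and the function name localise and the choice of "england" as the source region are assumptions. A real run would use vectors from a trained Word2Vec, GloVe or BERT model.

```python
import math

# Toy 3-d vectors, invented for illustration only. Real embeddings come from
# a trained model and have hundreds of dimensions.
TOY_VECTORS = {
    "england": [1.0, 0.0, 0.0],
    "india": [0.0, 1.0, 0.0],
    "london": [1.0, 0.0, 1.0],
    "delhi": [0.0, 1.0, 1.0],
    "watermelon": [0.2, 0.2, -1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def localise(word, source_region, target_region, vectors):
    """Return the word closest to: word - source_region + target_region."""
    query = [
        w - s + t
        for w, s, t in zip(
            vectors[word], vectors[source_region], vectors[target_region]
        )
    ]
    # The three input words are excluded, as is usual for analogy queries.
    excluded = {word, source_region, target_region}
    candidates = [w for w in vectors if w not in excluded]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


print(localise("london", "england", "india", TOY_VECTORS))  # prints "delhi"
```

With real embeddings the nearest neighbour is not guaranteed to be a sensible localisation, so the result would normally be checked against the words flagged by the Text Mining component.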

Below is the framework of this software:

Installation

Clone the repository and install the requirements in a Python>=3.8.0 environment:

git clone https://github.com/achyutk/geo-aligned-2.0.git # clone
cd geo-aligned-2.0
pip install -r requirements.txt # install
python -m spacy download en_core_web_sm # download the spaCy English model
pip install -U git+https://github.com/openai/CLIP.git # needed for evaluation
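
After installation, a short check such as the following can confirm that the environment matches the steps above. This is a sketch, not part of the repository: the helper names are hypothetical, and the import names spacy, en_core_web_sm and clip are assumed to be what the commands above install.

```python
import importlib.util
import sys


def python_ok(minimum=(3, 8)):
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    required = ["spacy", "en_core_web_sm", "clip"]
    print("Python >= 3.8:", python_ok())
    print("Missing packages:", missing_packages(required))
```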

Model Weights

Download the following model weights from the hyperlinks provided and place them in the corresponding folders.

Download the Word2Vec embeddings and place them in the /word2vec/model folder.
Download the Stable Diffusion weights (achyut_sd) and place them in the /diffusion_model folder.
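
Before running the notebook, it may help to confirm that both folders are populated. The sketch below assumes the two paths above are relative to the repository root; the helper name missing_weight_dirs is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Folder names taken from the instructions above, assumed to be relative
# to the repository root.
EXPECTED_DIRS = ("word2vec/model", "diffusion_model")


def missing_weight_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected folders that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in expected:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the repository root.
    print("Folders still needing weights:", missing_weight_dirs("."))
```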

Datasets

The following datasets are used in this project:

India Corpus: Used for training a Word2Vec model
Wikipedia Corpus: Used for training another Word2Vec model
English Book Dataset: Used for evaluating the framework
Russian Book Dataset: Used for evaluating the framework

Scripts

main.ipynb

This notebook executes the framework. Make any necessary changes to the model_download and utils files, then run the Jupyter notebook.

Set the "sentence" variable to the text for which the image is to be generated, then run the remaining cells.

Voila! The geo-aligned image is generated.

Examples

Below are the results of different combinations of components for the following input:

Input in Russian: "Шерлок ест арбуз в Лондоне. Он одет в темную шляпу, накидку, четкую рубашку с высоким воротником, хорошо подогнанные брюки, начищенные туфли и жилет. Погода на улице дождливая"

Input in English: "Sherlock eating watermelon in London. He is wearing a dark hat, cape, a crisp shirt with a high collar, well-fitted trousers, polished shoe and a waistcoat. Weather outside is rainy"
