Exploring Bible datasets, mainly from Kaggle
This repository contains a simple search and display for the Kaggle Bible Corpus.
My intention with this experiment is to study ways of exploring a text dataset.
- Clone the repo
git clone git@github.com:leomrocha/bible-explore.git
- Install the dependencies
pip3 install -r requirements.txt
- Download the Kaggle Bible Corpus Dataset
From here
And select the language you want (this demo is built with the English one, but it can be changed)
- Encode the dataset and compute similarities
This description is not complete, but the notebook notebooks/bible-explore-one.ipynb lets you encode and explore everything.
You should end up with three pickled Python files as output in a db directory:
db/bible-db.pkl
db/bible-embeddings.pkl
db/graph-db.pkl
- Launch the development server
uvicorn src.server:app --reload
- Develop. You can create a Pull Request if you make something :)
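Before launching the server, it can help to confirm that the encoding step produced the three pickled files listed above. The following is only a minimal sketch: the `load_db` helper and the `EXPECTED_FILES` constant are hypothetical names that are not part of this repository, and the sketch assumes nothing about what each pickle contains.

```python
import pickle
from pathlib import Path

# File names as listed in the encoding step. Their contents are not documented
# here, so this sketch treats each one as an opaque Python object.
EXPECTED_FILES = ("bible-db.pkl", "bible-embeddings.pkl", "graph-db.pkl")


def load_db(db_dir="db"):
    """Hypothetical helper: load every expected pickle from db_dir, keyed by file name."""
    db_path = Path(db_dir)
    # Report all missing files at once instead of failing on the first one.
    missing = [name for name in EXPECTED_FILES if not (db_path / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {db_path}: {', '.join(missing)}")
    loaded = {}
    for name in EXPECTED_FILES:
        # Only unpickle files you generated yourself: pickle can execute arbitrary code.
        with open(db_path / name, "rb") as fh:
            loaded[name] = pickle.load(fh)
    return loaded
```

Calling `load_db()` from the repository root returns a dict with one entry per file, or raises `FileNotFoundError` naming the files that still need to be generated by the notebook.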