Based on Pandas and Gensim, with a Hebrew twist. Originally developed for OpenBudget's need for quick entity comparison.
Notice: Currently, the comparison considers ONLY textual properties. The roadmap includes a planned step to consider quantitative variables as well.
- `pip install -r requirements.txt`
- verify the settings in the sources.json file (you may prefer to create a new sources.json file; if so, make sure settings.py points to it)
- define in settings.py which kind of entity you would like to explore. Note that it must be one of the entities defined in the sources.json file.
- `python3 run.py <entity_id>`
- settings.py: defines the sources file path, which entity you would like to compare, and how many results (similar entities) you would like to get.
- data_loader: load data packages and transform them into pandas DataFrames
- tokenization: narrow and unify textual elements
- tfidf: similarity calculation based on term frequencies and cosine similarity
- Upcoming steps are listed in roadmap.md
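
To illustrate the tokenization and tfidf steps described above, here is a minimal sketch that uses only the Python standard library. It is not the project's Gensim-based implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`), the smoothed IDF formula, and the sample texts are all assumptions made for illustration.

```python
import math
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Lowercase, drop nonspacing marks (e.g. Hebrew niqqud), split into word tokens."""
    stripped = "".join(
        ch for ch in text.lower() if unicodedata.category(ch) != "Mn"
    )
    return re.findall(r"\w+", stripped)


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (hypothetical helper)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1, never zero.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs, index, top_n):
    """Rank the other documents by similarity to docs[index], best first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (i, cosine(vectors[index], vec))
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up entity descriptions, for demonstration only.
sample_docs = [
    "משרד החינוך תקציב בתי ספר",
    "תקציב בתי ספר יסודיים",
    "סלילת כבישים ותחבורה",
]
print(most_similar(sample_docs, index=0, top_n=2))
```

Stripping nonspacing marks is one simple way to unify Hebrew text written with and without vowel points; the project's own tokenization may apply different or additional rules.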