FoodMine II

FoodMine is a food database project from the Barabasi Lab of the Network Science Institute (Northeastern University). The objective of this project is to mine research litereature to uncover chemical contents in food. The code in this repository was used to generate the food pilots, plots, and other information in FoodMine: Exploring Food Contents in Scientific Literature (Hooton, Menichetti, Barabási). Additionally, we have included the data from out initial experiments.

FoodMine II is an extension of the existing FoodMine infrastructure with better search options and improved paper ranking algorithm.

Files

Paper_Screening.ipynb

Notebook to search through the PubMed database and filter out search results.

Data_Statistics.ipynb

Notebook to compare clean raw data, compare databases, and vizualize comparisons.

Molecule_Embedding.ipynb

Notebook to retrive molecule smiles, embed compounds, and vizualize embeddings.

Paper_Citations.ipynb

Notebook to analyze the citation overlap between CTD and the papers gathered in FoodMine.

src

Miscellaneous functions and classes to serve as helpers in notebooks.

pubmed_util.py Holds functions to interact with PubMed API for the purposes of our research.
filter.py Contains class Filter to filter out PubMed search results.
collected_data_handling.py Helper functions to clean and aggregate information into FoodMine database.
tools/chemidr Suite of tools developed to interact with the PubChem API.

data

Folder that holds raw data from paper data collection. Note that several files used in our analysis are too large to be uploaded to github

Create the environment to run Foodmine II

Run the following command after installing Anaconda on your machine

$ conda create -n <environment-name> --file requirements.txt

ChatterjeeAyan/FoodMineII