FoodMine is a food database project from the Barabasi Lab of the Network Science Institute (Northeastern University). The objective of this project is to mine research literature to uncover the chemical contents of food. The code in this repository was used to generate the food pilots, plots, and other information in FoodMine: Exploring Food Contents in Scientific Literature (Hooton, Menichetti, Barabási). Additionally, we have included the data from our initial experiments.
FoodMine II is an extension of the existing FoodMine infrastructure with better search options and an improved paper-ranking algorithm.
Notebook to search through the PubMed database and filter out search results.
Notebook to clean raw data, compare databases, and visualize comparisons.
Notebook to retrieve molecule SMILES, embed compounds, and visualize embeddings.
Notebook to analyze the citation overlap between CTD and the papers gathered in FoodMine.
Miscellaneous functions and classes to serve as helpers in notebooks.
- pubmed_util.py: Holds functions to interact with PubMed API for the purposes of our research.
- filter.py: Contains class Filter to filter out PubMed search results.
- collected_data_handling.py: Helper functions to clean and aggregate information into FoodMine database.
- tools/chemidr: Suite of tools developed to interact with the PubChem API.
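
To give a sense of what interacting with the PubMed API involves, here is a minimal sketch that queries the public NCBI E-utilities `esearch` endpoint using only the Python standard library. It is not the code in pubmed_util.py: the names `build_esearch_url` and `fetch_pmids` are hypothetical and chosen only for illustration, and the search term is an arbitrary example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an esearch URL for PubMed (hypothetical helper, not from this repo)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text query, e.g. a food name plus keywords
        "retmax": retmax,    # maximum number of PubMed IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


def fetch_pmids(term, retmax=20):
    """Return the list of PubMed IDs matching `term` (requires network access)."""
    with urlopen(build_esearch_url(term, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_pmids(...) to run the query.
    print(build_esearch_url("garlic AND compounds", retmax=5))
```

Building the URL separately from the network call keeps the query easy to inspect and log before any request is sent. For heavier use, NCBI asks clients to respect its rate limits.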
Folder that holds raw data from paper data collection. Note that several files used in our analysis are too large to be uploaded to GitHub.
Run the following command after installing Anaconda on your machine:
$ conda create -n <environment-name> --file requirements.txt