MINE 2.0: enhanced biochemical coverage for peak identification in untargeted metabolomics

MINEs (Metabolic In silico Network Expansions) are databases of predicted biological compounds. These compounds are predicted by applying biochemical reaction rules to known metabolites. The set of rules used to build MINE 2.0 is much more comprehensive than that used for MINE 1.0 (see the published ruleset here).

This repo contains all code used to generate results reported in the MINE 2.0 paper. If you use this code or MINE 2.0 itself, please cite the MINE 2.0 paper. If you don't have access to Oxford Bioinformatics, e-mail me at jonstrutz11@gmail.com, and I will be happy to send you a copy!

Finally, the MINE 2.0 website can be accessed here.

Notebooks

All notebooks are located in ./02 Analyzed data/Notebooks_Final.

Table 1

  • No notebooks. All calculations for Table 1 were made by directly querying the MINE MongoDB.

Table 2

  • Mass_Bank_Coverage_KEGG.ipynb
  • Mass_Bank_Coverage_KEGG1.0.ipynb
  • Mass_Bank_Coverage_KEGG2.0.ipynb
  • Mass_Bank_Coverage_PubChem.ipynb
  • PubChem_Unique_Formulas.ipynb

Table 3

  • Sauer_Coverage_KEGG.ipynb
  • Sauer_Coverage_KEGG1.0.ipynb
  • Sauer_Coverage_KEGG2.0.ipynb
  • Sauer_Coverage_EcoCyc1.0.ipynb
  • Sauer_Coverage_EcoCyc2.0.ipynb
  • Sauer_Coverage_PubChem.ipynb
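
As a rough illustration of what a coverage number can mean, the sketch below computes the fraction of query masses that have at least one database mass within a ppm tolerance. This is an assumption about the general idea only, not the method used in the coverage notebooks: the function names, the 10 ppm default, and the choice to match on mass (rather than on structure or formula) are all hypothetical.

```python
from bisect import bisect_left


def has_match(sorted_masses, query, ppm=10.0):
    """Return True if any mass in sorted_masses lies within +/- ppm of query.

    sorted_masses must be sorted in ascending order (hypothetical helper).
    """
    tol = query * ppm / 1e6
    # Index of the first database mass that is >= the lower bound.
    i = bisect_left(sorted_masses, query - tol)
    return i < len(sorted_masses) and sorted_masses[i] <= query + tol


def coverage(query_masses, database_masses, ppm=10.0):
    """Fraction of query masses with at least one database match (assumed definition)."""
    if not query_masses:
        return 0.0
    db = sorted(database_masses)
    hits = sum(has_match(db, q, ppm) for q in query_masses)
    return hits / len(query_masses)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the paper.
    print(coverage([180.0634, 100.0], [180.06339, 250.0]))
```

Sorting the database masses once and using a binary search keeps each lookup cheap, which matters when the database side is as large as a MINE or PubChem.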

Other Notebooks

  • KEGG_Dataset_SMILES_to_Mass.ipynb (calculate mass for all KEGG compounds)
  • MassBank_MS2_Search_KEGG2.0 (for MS2-based search results in SI)
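
For readers who want a feel for the mass calculation without opening the notebook, here is a minimal, dependency-free sketch. Note the assumptions: the notebook works from SMILES, whereas this sketch starts from a molecular formula so that it needs only the standard library, and the `monoisotopic_mass` function and its small element table are hypothetical, not code from this repo.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Only a handful of common elements are included in this sketch.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic element masses for a simple formula such as 'C6H12O6'.

    Hypothetical helper: no charges, isotope labels, or parentheses are handled.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"Unsupported formula: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC_MASS:
            raise ValueError(f"Element not in table: {element}")
        # A missing count means a single atom of that element.
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    print(round(monoisotopic_mass("C6H12O6"), 4))  # glucose, about 180.0634
```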

Final Notes

Note that I ran most of the notebooks with a direct connection to our MongoDB database (rather than going through the MINE API), using credentials stored in a credentials.txt file. To replicate this work, you will need to use the MINE API instead. You will also need to install minedatabase.
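
If you do have your own MongoDB instance, the credentials step might look something like the sketch below. Everything here is an assumption for illustration: the layout of credentials.txt (username on the first line, password on the second), the helper names, and the host and port are hypothetical and are not taken from the notebooks. The sketch only builds a connection URI and does not connect to anything.

```python
from pathlib import Path
from urllib.parse import quote_plus


def read_credentials(path):
    """Read a username and password from a two-line text file (assumed layout)."""
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("Expected a username line and a password line")
    return lines[0], lines[1]


def build_mongo_uri(username, password, host="localhost", port=27017):
    """Build a MongoDB connection URI, percent-escaping the credentials.

    host and port are placeholders, not the address of the MINE database.
    """
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"


if __name__ == "__main__":
    # Placeholder values; in practice these would come from read_credentials.
    print(build_mongo_uri("user", "p@ss/word"))
```

Keeping credentials in a separate file that is excluded from version control means the notebooks themselves can be shared without exposing them.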

Also note that some large files are not present in this repo (see .gitignore) but are available upon request.