
Primary language: Jupyter Notebook. License: GNU General Public License v2.0 (GPL-2.0).

Exploring the known chemical space of the plant kingdom: Insights into taxonomic patterns, knowledge gaps, and bioactive regions

This repository contains code and data described in detail in our paper, "Exploring the known chemical space of the plant kingdom: Insights into taxonomic patterns, knowledge gaps, and bioactive regions" (Domingo-Fernández et al., 2023).


Citation

If you have found our manuscript useful in your work, please consider citing:

Domingo-Fernández, D.†, Gadiya, Y.†, Mubeen, S., Healey, D., Norman, B., Colluru, V. (2023). Exploring the known chemical space of the plant kingdom: Insights into taxonomic patterns, knowledge gaps, and bioactive regions. Journal of Cheminformatics. doi:10.1186/s13321-023-00778-w

Reproducibility

Set up the environment (Python)

Install requirements

python -m venv .venv && source ./.venv/bin/activate
pip install -r requirements.txt
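Before rerunning anything, it can help to confirm that the virtual environment is active and matches the pinned requirements. The following is a minimal sketch, not part of the repository. It assumes requirements.txt pins packages as name==version and compares versions as plain strings, so an equivalent spelling such as 1.0 versus 1.0.0 would be reported as a mismatch.

```python
"""Check the active environment against requirements.txt (a sketch, not part of the repo).

Assumes requirements.txt pins packages as "name==version"; lines with other
specifiers (>=, ~=, ...) are only checked for presence, not version.
"""
import re
import sys
from importlib import metadata
from pathlib import Path
from typing import Optional

# Package name, optional [extras], then an optional "==version" pin.
REQUIREMENT = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(==\s*([^;\s]+))?")


def in_virtualenv() -> bool:
    """True when running inside a venv such as the .venv created above."""
    return sys.prefix != sys.base_prefix


def parse_requirement(line: str) -> Optional[tuple[str, Optional[str]]]:
    """Split 'name==1.2.3' into ('name', '1.2.3'); return None for non-package lines."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or pip option such as -r / -e
    match = REQUIREMENT.match(line)
    if not match:
        return None
    return match.group(1), match.group(3)


def mismatches(requirements: list[str]) -> list[str]:
    """Describe every requirement that is missing or installed at a different version."""
    problems = []
    for raw in requirements:
        parsed = parse_requirement(raw)
        if parsed is None:
            continue
        name, wanted = parsed
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Plain string comparison: no version normalization is attempted.
        if wanted is not None and installed != wanted:
            problems.append(f"{name}: want {wanted}, have {installed}")
    return problems


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no virtual environment is active")
    lines = Path("requirements.txt").read_text().splitlines()
    for problem in mismatches(lines):
        print(problem)
```

Run it from the repository root after the pip install step; no output beyond the optional warning means every pinned package was found at the expected version.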

Rerun the notebooks

Run the notebooks in the notebooks directory; each one corresponds to a single analysis. The filename prefix of each notebook indicates the order in which it should be run and also corresponds to the Results sections of the manuscript. For details about each notebook, see the README inside the notebooks directory.
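The notebooks can also be executed non-interactively in order. The sketch below is not part of the repository. It assumes the filename prefix is a leading integer (for example 1_ or 02-) and uses the standard jupyter nbconvert --execute command to run each notebook in place. Sorting on the integer rather than the raw string keeps a hypothetical 10_ notebook after 2_.

```python
"""Run the analysis notebooks in prefix order (a sketch, not part of the repo).

Assumes the notebooks live in ./notebooks and that each filename starts with
a numeric prefix (e.g. "1_" or "02-") giving its run order.
"""
import re
import subprocess
import sys
from pathlib import Path


def prefix_key(path: Path) -> tuple[int, str]:
    """Sort key: the leading integer of the filename, then the full name."""
    match = re.match(r"(\d+)", path.name)
    # Notebooks without a numeric prefix are placed after all prefixed ones.
    number = int(match.group(1)) if match else sys.maxsize
    return (number, path.name)


def ordered_notebooks(directory: Path) -> list[Path]:
    """Return the .ipynb files directly inside `directory`, sorted by numeric prefix."""
    return sorted(directory.glob("*.ipynb"), key=prefix_key)


def execute_command(notebook: Path) -> list[str]:
    """Build a jupyter nbconvert call that executes a notebook and saves it in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(notebook)]


if __name__ == "__main__":
    for nb in ordered_notebooks(Path("notebooks")):
        print(f"Running {nb.name}")
        # check=True stops the loop at the first notebook that fails.
        subprocess.run(execute_command(nb), check=True)
```

Run it from the repository root with the virtual environment active. Because --inplace overwrites each notebook with its executed version, commit or back up any local changes first.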

Set up the environment (R)

To re-create the circular plot with the heat map, make sure R v4.2.2 is installed, navigate to notebooks/taxonomic_tree_viz, and run the R scripts. Before running them, install the libraries listed at the top of each script with install.packages("package_name").
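Rather than copying the package names by hand, the R scripts can be scanned for the packages they load. This is a hypothetical helper, not part of the repository. It assumes the scripts have a .R extension and load packages with library() or require() calls, and it only prints an install.packages() command for you to run in R.

```python
"""List the R packages loaded by the tree-plot scripts (a sketch, not part of the repo).

Assumes the scripts live in notebooks/taxonomic_tree_viz, use the .R extension,
and load packages with library(pkg) or require(pkg) calls.
"""
import re
from pathlib import Path

# Matches library(pkg), library("pkg"), require('pkg'), ... and captures the name.
# requireNamespace(...) is not matched because "(" must follow "require" directly.
LOAD_CALL = re.compile(r"""\b(?:library|require)\(\s*["']?([A-Za-z][\w.]*)["']?""")


def packages_in(source: str) -> list[str]:
    """Return the unique package names loaded in an R source string, in first-seen order."""
    return list(dict.fromkeys(LOAD_CALL.findall(source)))


def install_command(packages: list[str]) -> str:
    """Format a single install.packages() call for the given packages."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    return f"install.packages(c({quoted}))"


if __name__ == "__main__":
    found: list[str] = []
    for script in sorted(Path("notebooks/taxonomic_tree_viz").glob("*.R")):
        for pkg in packages_in(script.read_text()):
            if pkg not in found:
                found.append(pkg)
    print(install_command(found))
```

Packages distributed through Bioconductor rather than CRAN are not found by install.packages(); those need BiocManager::install() instead.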

Data

The manuscript is based on publicly available data from the following resources:

  - LOTUS (1)
  - COCONUT (2)

The datasets are publicly available and can be downloaded directly from DOI

In addition, the data directory contains all the figures of the manuscript (generated by the notebooks) as well as the raw and intermediate files (also generated by the notebooks).

References

  1. Rutz, A., Sorokina, M., Galgonek, J., Mietchen, D., Willighagen, E., Gaudry, A., ... & Allard, P. M. (2022). The LOTUS initiative for open knowledge management in natural products research. eLife, 11, e70780.
  2. Sorokina, M., Merseburger, P., Rajan, K., Yirik, M. A., & Steinbeck, C. (2021). COCONUT online: collection of open natural products database. Journal of Cheminformatics, 13(1), 1-13.