Wang and Li et al., PhyloFunc: Phylogeny-informed Functional Distance as a New Ecological Metric for Metaproteomic Data Analysis
- Generate PhyloFunc to incorporate microbiome phylogeny to inform on metaproteomic functional distance.
- preprocess data,compute PhyloFunc ditance matrix, evaluate the performance by machine learning-based classification algorithms.
- The original dataset is given in "./1_datasets_search_results/". Decompress any .zip files.
- Example outputs are given in "./2_data_processing/".
- Python (version >= 3.11.0) and packages pandas (version >= 2.0.3) and numpy(version >= 1.24.3) were required. R (version >= 4.2.0) and Rstudio were required.
- The codes have been tested on Python (version = 3.11.5) with packages pandas (version = 2.0.3) and numpy(version = 1.24.3) ran by Spyder 5.4.3 on a Windows computer.
- The time of computing PhyloFunc distance for the human gut microbiome dataset is the most time-consuming, approximately 20-30 min on a Windows System (version = 11) for 48 sample pairs. The calculation of PhyloFunc for the toy dataset and mouse gut microbiome dataset only takes a few minutes because the sample size is relatively small.
- Download data, example outputs and codes in this repository.
- Open each .py code file: PhyloFunc_toy_dataset.py, PhyloFunc_mouse_gut_dataset.py, or PhyloFunc_human_gut_dataset.py using Spyder IDE.
- Follow the steps to run codes through preprocessing raw data, generating branches of tree, creating taxon-function table, and calculating the PhyloFunc distance. The output files will be created automatically during running the code to verify the calculation process of the program.
- Besides, R codes for computing other three distances (./1_PhyloFunc calculation/3_human gut microbiome and 1_PhyloFunc calculation/2_mouse gut microbiome/four cases/original dataset) and Python codes for evaluating the performance of distances (./2_data_processing/2_Evaluation/) are provided.