exstream-metagenomics-totalrnaseq-ml

Scripts for data processing and machine learning-based analysis of metagenomics and total RNA-Seq data from the ExStream project

Primary language: Python. License: MIT.

Predicting environmental stressor levels with machine learning: a comparison between amplicon sequencing, metagenomics, and total RNA sequencing based on taxonomically assigned data

This project represents the third chapter of my PhD, in which we compare how well amplicon sequencing, metagenomics, and total RNA-Seq predict aquatic stressor levels. Stressors were predicted using multiple machine learning algorithms, and the sequencing data were processed with a variety of data-processing methods. In total, we evaluated 1,536 combinations of taxonomic datasets and data-processing methods. The repo contains scripts to process sequencing data, analyze stressor prediction performance (including training and testing the machine learning models), and visualize the data.
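If each taxonomic dataset is crossed with each data-processing option (a full factorial design), the number of evaluated combinations is simply the product of the number of options per factor. The sketch below illustrates that idea with Python's `itertools.product`. Only the three sequencing methods come from this README; the option lists `TAXONOMIC_LEVELS` and `TRANSFORMATIONS` and the helper names are hypothetical placeholders, and the sketch does not reproduce how the 1,536 combinations were actually composed in the repository scripts.

```python
from itertools import product
from math import prod

# Sequencing methods named in this README.
SEQUENCING_METHODS = ["amplicon", "metagenomics", "total_rnaseq"]
# Hypothetical placeholder options, for illustration only.
TAXONOMIC_LEVELS = ["phylum", "family", "genus"]
TRANSFORMATIONS = ["none", "relative_abundance", "clr"]


def enumerate_combinations(*factors):
    """Return every combination (full factorial grid) of the given factor options."""
    return list(product(*factors))


def expected_grid_size(*factors):
    """A full factorial grid has as many cells as the product of the option counts."""
    return prod(len(options) for options in factors)


if __name__ == "__main__":
    factors = (SEQUENCING_METHODS, TAXONOMIC_LEVELS, TRANSFORMATIONS)
    combos = enumerate_combinations(*factors)
    # Each tuple is one dataset/processing combination that would be
    # passed on to model training and testing.
    print(len(combos), expected_grid_size(*factors))  # 27 27 (3 * 3 * 3)
```

Enumerating the grid explicitly, rather than nesting loops by hand, keeps the count tied to the option lists: adding one option to any factor changes the total automatically.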

The project is part of the following publication:
Hempel C. A., Buchner D., Mack L., Brasseur M. V., Tulpan D., Leese F., and Steinke D. (2023): Predicting environmental stressor levels with machine learning: a comparison between amplicon sequencing, metagenomics, and total RNA sequencing based on taxonomically assigned data. Frontiers in Microbiology 14:1217750. https://doi.org/10.3389/fmicb.2023.1217750

The entire workflow is shown in the following figure:

[Figure: workflow]