Towards the biogeography of prokaryotic genes
This is the Supplemental Software package to the manuscript "Towards the biogeography of prokaryotic genes" by Coelho et al. (forthcoming).
The purpose of this repository is to archive the code that generated both the resource and the analyses in the manuscript.
The Global Microbial Gene Catalogue is available at https://gmgc.embl.de.
Dependencies
The initial processing of the metagenomes was performed with NGLess, using MEGAHIT for assembly. ORF calling was performed using MetaGeneMark.
The catalog was built with a mixture of custom Haskell/C++ code and MMseqs2.
The subsequent analyses were performed in Python, using NumPy, Pandas, and Jug.
License: MIT
Data Availability
Sequence data: The full raw data (metagenomes) is available from ENA (see Supplemental Table 1 in the manuscript for a comprehensive list of all accession numbers).
Gene catalog & annotations: The gene catalog and its annotations are available at https://gmgc.embl.de.
Preprocessed data: For convenience, preprocessed derived data is also available under the preprocessed/ directory. These were computed with the code