Repository for the article:
Props, R., Monsieurs, P., Vandamme, P., Leys, N., Denef, V.J. and Boon, N., Gene Expansion and Positive Selection as Bacterial Adaptations to Oligotrophic Conditions. mSphere 4:e00011-19. https://doi.org/10.1128/mSphereDirect.00011-19.
Sickle was used for removing erroneous and low-quality reads from the raw data. Scythe was used for removing adapter sequences.
idba_ud -kmin 41 -kmax 101 -s 10 -i dt_int.fasta
Reads were mapped to the co-assembly using bwa-mem on default settings. samtools, bedtools2 and anvi'o were used to format and profile the mapping files for anvi'o (conversion to bam format and sorting).
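The mapping and formatting steps above could be scripted roughly as follows. This is a sketch only: the file names are hypothetical, paired-end reads in two FASTQ files are assumed, and the helper `mapping_commands` (a name introduced here) only builds the command strings; it does not reproduce the exact commands used for the article.

```python
import shlex


def mapping_commands(assembly, reads_1, reads_2, prefix):
    """Build shell commands to map reads and produce a sorted, indexed BAM.

    All file names are supplied by the caller; nothing is executed here.
    """
    q = shlex.quote
    sam = f"{prefix}.sam"
    raw_bam = f"{prefix}-raw.bam"
    bam = f"{prefix}.bam"
    return [
        f"bwa index {q(assembly)}",                                     # index the co-assembly
        f"bwa mem {q(assembly)} {q(reads_1)} {q(reads_2)} > {q(sam)}",  # map with default settings
        f"samtools view -b -o {q(raw_bam)} {q(sam)}",                   # convert SAM to BAM
        f"samtools sort -o {q(bam)} {q(raw_bam)}",                      # coordinate-sort
        f"samtools index {q(bam)}",                                     # index the sorted BAM
    ]


# Hypothetical file names, for illustration only.
commands = mapping_commands(
    "contigs.fa", "sample_R1.fastq", "sample_R2.fastq", "sample"
)
for command in commands:
    print(command)
```

The printed commands can be pasted into a shell or passed to a workflow manager; the sorted and indexed bam file is the input that anvi'o profiling expects.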
Initial binning was performed on the contig splits generated by anvi'o by means of a supervised and manual binning strategy in Vizbin on default settings. The bins were inspected and further manually refined in anvi'o. The contig database with the associated bins is available here.
Completeness and redundancy of all MAGs (and of all reference Ramlibacter genomes) were estimated with the marker gene databases in anvi'o as well as the lineage-specific marker sets in CheckM.
Relative abundances were calculated by mapping all reads to the co-assembly using bwa-mem on default settings and applying the following normalization to bin size (no %GC correction was applied to the data):
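As an illustration of what normalizing mapped-read counts to bin size can look like, the sketch below divides each bin's mapped reads by its size in base pairs and expresses the result as a percentage of the total. This is an assumed, generic form and not necessarily the formula used in the article; the function name, bin names and all numbers are made up.

```python
def size_normalized_abundance(mapped_reads, bin_sizes_bp):
    """Return each bin's share (%) of size-normalized mapped reads.

    mapped_reads: dict of bin name -> number of reads mapped to the bin
    bin_sizes_bp: dict of bin name -> total bin length in base pairs
    """
    # Reads per base pair, so that larger bins are not favored.
    per_bp = {name: mapped_reads[name] / bin_sizes_bp[name] for name in mapped_reads}
    total = sum(per_bp.values())
    return {name: 100.0 * value / total for name, value in per_bp.items()}


# Made-up example: bin_A recruits three times as many reads as bin_B,
# but is also three times as large, so both end up equally abundant.
example = size_normalized_abundance(
    {"bin_A": 9000, "bin_B": 3000},
    {"bin_A": 3_000_000, "bin_B": 1_000_000},
)
print(example)
```

Without the division by bin size, bin_A would appear three times as abundant as bin_B simply because it is a larger target for read mapping.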
EMIRGE was run on the merged QC'd fastq files with an insert size of 500 and a standard deviation of 500 in order to achieve maximum mapping of reads. Further, EMIRGE was run both with the full NR database clustered at 97% using UCLUST and with the small manually curated freshwater database (FWDB) available from here. In addition, EMIRGE was run separately with -j set to 0.97 and 1.0, allowing both 97% consensus sequences and unique sequences to be reconstructed. From all these runs a merged fasta file was constructed. Finally, all the reconstructed sequences were clustered at 97% identity using UCLUST. EMIRGE reconstructed sequences with a normalized prior abundance of less than 5% were removed, and sequences were ordered from high to low abundance before clustering because UCLUST is dependent on the order of the sequences (see here). Classification of the sequences was performed using the TaxAss pipeline as described here. Full-length sequences were classified using the FWDB database if their top matches to the FWDB were higher than 95%.
Emirge reconstructed sequences
NR_1.0
15|HQ222271.1.1558_Prior=0.791329_Length=1536_NormPrior=0.784664 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
16|EF516083.1.1452_Prior=0.020773_Length=1440_NormPrior=0.021971 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
0|FR853751.1.1492_Prior=0.187898_Length=1480_NormPrior=0.193365 Bacteria(100);Bacteroidetes(100);Sphingobacteriia(100);Sphingobacteriales(100);Chitinophagaceae(100);Sediminibacterium(100);
NR_0.97
15|HQ222271.1.1558_Prior=0.812102_Length=1536_NormPrior=0.806369 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
0|FR853751.1.1492_Prior=0.187898_Length=1480_NormPrior=0.193631 Bacteria(100);Bacteroidetes(100);Sphingobacteriia(100);Sphingobacteriales(100);Chitinophagaceae(100);Sediminibacterium(100);
FWDB_0.97
5_Bctrm474_Prior=1.000000_Length=1481_NormPrior=1.000000 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Ramlibacter(91);
FWDB_1.0
5|Bctrm474_Prior=0.097627_Length=1481_NormPrior=0.097627 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
22|LimCurv2_Prior=0.015439_Length=1481_NormPrior=0.015439 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
18|LimSpe14_Prior=0.886934_Length=1481_NormPrior=0.886934 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Comamonadaceae_unclassified(100);
Clustering of merged sequence file
usearch -cluster_fast merged.emirged.cons.fasta -id 0.97 -centroids merged.emirged.cons.0.97.fasta -uc merged.emirged.cons.0.97.clusters.uc
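To check which reconstructed sequences were collapsed into which centroid, the `.uc` file written by the command above can be parsed. The sketch below assumes the standard tab-separated UC layout, in which the first field is the record type (`S` for a centroid, `H` for a hit), the ninth field is the query label and the tenth is the target (centroid) label. The example lines and labels are made up.

```python
def clusters_from_uc(lines):
    """Map each centroid label to the list of sequences assigned to it."""
    clusters = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "S":
            # Seed record: the query starts a new cluster as its centroid.
            clusters.setdefault(query, [])
        elif record_type == "H":
            # Hit record: the query was assigned to an existing centroid.
            clusters.setdefault(target, []).append(query)
        # "C" records summarize clusters and are skipped here.
    return clusters


# Made-up UC lines, for illustration only.
uc_lines = [
    "S\t0\t1481\t*\t*\t*\t*\t*\tseqA\t*",
    "H\t0\t1481\t98.5\t+\t0\t0\t1481M\tseqB\tseqA",
    "S\t1\t1480\t*\t*\t*\t*\t*\tseqC\t*",
    "C\t0\t2\t*\t*\t*\t*\t*\tseqA\t*",
    "C\t1\t1\t*\t*\t*\t*\t*\tseqC\t*",
]
clusters = clusters_from_uc(uc_lines)
print(clusters)
```

For the real data, the lines would be read from merged.emirged.cons.0.97.clusters.uc instead of the hard-coded list.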
Final dereplicated consensus EMIRGE sequences
5_Bctrm474_Prior=1.000000_Length=1481_NormPrior=1.000000 Bacteria(100);Proteobacteria(100);Betaproteobacteria(100);Burkholderiales(100);Comamonadaceae(100);Ramlibacter(91);
0|FR853751.1.1492_Prior=0.187898_Length=1480_NormPrior=0.193365 Bacteria(100);Bacteroidetes(100);Sphingobacteriia(100);Sphingobacteriales(100);Chitinophagaceae(100);Sediminibacterium(100);