The iMAP pipeline has two tiers:
- Bioinformatics analysis of 16S rRNA gene reads
- Exploratory visualization.
Intended users:
- Microbiologists
- Ecologists
- Any interested user
The first step is to gather all materials needed for implementing the iMAP pipeline smoothly (Table S1).
Table S1: List of required materials for running the iMAP pipeline
Required | Description | Folder | Remarks |
---|---|---|---|
iMAP pipeline | Bundled scripts for comprehensive microbiome analysis | iMAP | Link |
Hardware | Computer with a multi-core processor, preferably 64-bit. Random-access memory (RAM): 8 GB minimum. Storage: tens of gigabytes for a small dataset, otherwise a few terabytes. | | |
Raw data | Demultiplexed reads in FASTQ format with primers and barcodes removed | data/references | |
Sample metadata | A tab-separated file showing sample identifiers, categorical and numeric variables | data/metadata | |
Mapping file | A file that links sample IDs (1st column) to the names of forward (2nd column) and reverse (3rd column) data files | | |
Design files | Files that assign samples to specific variables or other categories | | |
Software | | | |
seqkit | For inspecting raw data format and simple statistics | code | Link |
FastQC | For creating base call quality score images and statistics | code | Link |
bbmap_bbduk | For trimming poor-quality reads | code | Link |
multiqc | For summarizing FastQC output | Link | |
Mothur | For sequence processing, sequence classification, and preliminary analysis | code | Link |
Statistical analysis and visualization | | | |
R | For statistical analysis and visualization | Link | |
RStudio | An IDE (integrated development environment) for R | Link | |
iTOL | For display, annotation and management of phylogenetic trees | Link | |
Reference 16S rRNA gene alignments | | | |
SILVA (nr) | Reference rRNA alignments | data/references | Link |
Reference 16S rRNA gene classifiers | | | |
SILVA (no gap) | Degapped using the degap.seqs function in Mothur | data/references | Link |
RDP | Mothur-formatted | data/references | Link |
Greengenes | Mothur-formatted | data/references | Link |
EzBioCloud | Mothur-formatted | data/references | Link |
Custom classifiers | Any manually built classifiers (highly recommended). | | |
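Before running anything, it can help to confirm that the command-line tools listed in Table S1 are actually reachable. The following is a minimal sketch, not part of iMAP itself; the executable names in the example call (for instance `bbduk.sh`) are assumptions and may differ on your installation.

```shell
# Sketch: report which command-line tools are missing from PATH.
# Prints the name of every argument that is not found as a command.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call (assumed executable names, not confirmed by the iMAP docs):
# missing_tools seqkit fastqc bbduk.sh multiqc mothur R
```

An empty result means every listed tool was found; otherwise each missing name is printed on its own line.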
git clone https://github.com/tmbuza/iMAP.git
cd iMAP
# OR
curl -LOk https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip
cd iMAP
# OR
wget --no-check-certificate https://github.com/tmbuza/iMAP/archive/master.zip
unzip master.zip
mv iMAP-master iMAP
rm -rf master.zip
cd iMAP
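The three download alternatives above can be folded into one helper that uses whichever tool is installed. This is only a sketch: the function names `pick_downloader` and `fetch_imap` are hypothetical, and it deliberately omits the certificate-skipping flags (`-k`, `--no-check-certificate`), assuming the machine has a working CA certificate store.

```shell
# Print the first available download tool: git, curl, or wget.
pick_downloader() {
    for tool in git curl wget; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    printf 'none\n'
    return 1
}

# Fetch iMAP into ./iMAP using whichever tool was found.
fetch_imap() {
    case "$(pick_downloader)" in
        git)  git clone https://github.com/tmbuza/iMAP.git ;;
        curl) curl -LO https://github.com/tmbuza/iMAP/archive/master.zip &&
              unzip master.zip && mv iMAP-master iMAP && rm -f master.zip ;;
        wget) wget https://github.com/tmbuza/iMAP/archive/master.zip &&
              unzip master.zip && mv iMAP-master iMAP && rm -f master.zip ;;
        *)    echo "Install git, curl, or wget first." >&2; return 1 ;;
    esac
}

# Usage:
# fetch_imap && cd iMAP
```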
- Raw data (demultiplexed compressed FASTQ files).
- Metadata, mothur-formatted mapping files (commonly with extension .design)
- Install required software
- Download reference databases (alignments and classifiers)
# Linux
bash ./code/requirements/iMAP_requirements_linux_driver.bash
# macOS
bash ./code/requirements/iMAP_requirements_mac_driver.bash
# Windows
bash ./code/requirements/iMAP_requirements_windows_driver.bash # Incomplete
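If you prefer not to pick the platform by hand, the choice can be derived from `uname -s`. The sketch below assumes the usual kernel names (`Linux`, `Darwin`, and the `MINGW`/`MSYS`/`CYGWIN` prefixes reported by common Windows bash environments); the function name `requirements_driver_for` is hypothetical.

```shell
# Map a kernel name (as printed by `uname -s`) to the matching requirements driver.
requirements_driver_for() {
    case "$1" in
        Linux*)               echo "./code/requirements/iMAP_requirements_linux_driver.bash" ;;
        Darwin*)              echo "./code/requirements/iMAP_requirements_mac_driver.bash" ;;
        MINGW*|MSYS*|CYGWIN*) echo "./code/requirements/iMAP_requirements_windows_driver.bash" ;;
        *)                    echo "Unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Usage, from the iMAP root directory:
# driver=$(requirements_driver_for "$(uname -s)") && bash "$driver"
```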
bash ./code/requirements/iMAP_checkFiles_driver.bash
open reports/checked_file.txt
# OR
cat reports/checked_file.txt
Figure S1: Major folders in the iMAP root directory. Folders and files marked with a tick exist. Any missing file, marked with an X, must be located before proceeding.
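In the same spirit as the file check, a mapping file can be sanity-checked by hand. The sketch below is independent of `iMAP_checkFiles_driver.bash` (whose internals are not shown here). It assumes whitespace-separated columns in the order sample ID, forward file, reverse file, and that the read files sit in a directory you pass as the second argument.

```shell
# Validate a three-column mapping file: sample ID, forward file, reverse file.
# $1 = mapping file, $2 = directory holding the read files (default: current dir).
check_mapping_file() {
    mapping=$1
    read_dir=${2:-.}
    rc=0
    line_no=0
    while read -r sample forward reverse extra || [ -n "$sample" ]; do
        line_no=$((line_no + 1))
        if [ -z "$sample" ]; then
            continue   # skip blank lines
        fi
        if [ -z "$forward" ] || [ -z "$reverse" ] || [ -n "$extra" ]; then
            echo "line $line_no: expected exactly 3 columns" >&2
            rc=1
            continue
        fi
        for f in "$forward" "$reverse"; do
            if [ ! -f "$read_dir/$f" ]; then
                echo "line $line_no: missing file $read_dir/$f" >&2
                rc=1
            fi
        done
    done < "$mapping"
    return "$rc"
}

# Usage (both paths are placeholders for your own mapping file and read directory):
# check_mapping_file path/to/mapping_file path/to/reads
```

The function returns a non-zero status if any line has the wrong number of columns or names a file that does not exist.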
In this method, users run individual or bundled scripts sequentially on the command-line interface (CLI), one at a time. We have bundled workflow-specific scripts into drivers so that each stage of the analysis can be run on the CLI with a single command.
bash ./code/preprocessing/iMAP_preprocessing_driver.bash
bash ./code/summarizeFastQC/iMAP_multiqc_driver.bash
bash ./code/mockcommunity/iMAP_mockcommunity_driver.bash
bash ./code/seqprocessing/iMAP_seqprocessing_driver.bash
bash ./code/seqclassification/iMAP_seqclassification_driver.bash
bash ./code/seqerrorrate/iMAP_seqerrorrate_driver.bash # Optional
bash ./code/otutaxonomy/iMAP_otutaxonomy_driver.bash
bash ./code/annotation/01_processed_seqs.bash
bash ./code/dataanalysis/iMAP_dataanalysis_demo_driver.bash # Optional mothur-based preliminary analysis
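When running the drivers one after another, it is usually safer to stop at the first failure than to let later steps run on incomplete output. A minimal sketch of such a wrapper follows; the function name `run_drivers` and the `DRY_RUN` variable are hypothetical and not part of iMAP.

```shell
# Run each driver in order and stop at the first failure.
# With DRY_RUN=1 the commands are printed instead of executed.
run_drivers() {
    for driver in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "bash $driver"
        else
            echo "==> $driver" >&2
            bash "$driver" || { echo "FAILED: $driver" >&2; return 1; }
        fi
    done
}

# Usage, from the iMAP root directory (first two steps shown):
# DRY_RUN=1 run_drivers \
#     ./code/preprocessing/iMAP_preprocessing_driver.bash \
#     ./code/summarizeFastQC/iMAP_multiqc_driver.bash
```

Running once with `DRY_RUN=1` lets you confirm the order before committing to a long analysis.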
The platform-specific iMAP_driver.bash script (prefixed with linux_, mac_, or windows_) is the master driver for running all analyses on the CLI at once.
bash ./code/linux_iMAP_driver.bash
bash ./code/mac_iMAP_driver.bash
bash ./code/windows_iMAP_driver.bash
# Optionally, use the time-tracking driver instead
bash ./code/linux_time_tracking_driver.bash
bash ./code/mac_time_tracking_driver.bash
bash ./code/windows_time_tracking_driver.bash
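The contents of the time-tracking drivers are not shown here. As an independent illustration of the idea, the sketch below wraps any command and reports its wall-clock duration using `date +%s`; the function name `timed` is hypothetical.

```shell
# Run a command and report its wall-clock duration in seconds on stderr.
# The wrapped command's exit status is passed through unchanged.
timed() {
    start=$(date +%s)
    rc=0
    "$@" || rc=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s ($*)" >&2
    return "$rc"
}

# Usage, from the iMAP root directory:
# timed bash ./code/preprocessing/iMAP_preprocessing_driver.bash
```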
Users must create a Portable Batch System (PBS) script that describes the cluster resources to be used, the parameters for the job, and the commands to be executed. The following PBS scripts run the iMAP pipeline remotely, either driver by driver or in batch mode. Note that you must provide the group allocation name (-A); how it is specified may differ from system to system, so consult your cluster's documentation if in doubt.
Individual driver
#!/bin/bash -f
#PBS -N iMAPtest
#PBS -A <group_allocation_name>
#PBS -l nodes=1:ppn=8
#PBS -l walltime=4000:00:00
#PBS -l pmem=20gb
#PBS -j oe
#PBS -o iMAPtest.log
#PBS -m abe
#PBS -M tmb72@psu.edu
cd $PBS_O_WORKDIR
# Comment out unused commands and uncomment the commands to be executed
bash ./code/requirements/iMAP_requirements_linux_driver.bash
bash ./code/requirements/iMAP_checkFiles_driver.bash
bash ./code/preprocessing/iMAP_preprocessing_driver.bash
bash ./code/summarizeFastQC/iMAP_multiqc_driver.bash
bash ./code/mockcommunity/iMAP_mockcommunity_driver.bash
bash ./code/seqprocessing/iMAP_seqprocessing_driver.bash
bash ./code/seqclassification/iMAP_seqclassification_driver.bash
bash ./code/seqerrorrate/iMAP_seqerrorrate_driver.bash # Optional
bash ./code/otutaxonomy/iMAP_otutaxonomy_driver.bash
bash ./code/annotation/01_processed_seqs.bash
bash ./code/dataanalysis/iMAP_dataanalysis_demo_driver.bash # Optional mothur-based preliminary analyses
Batch mode
#!/bin/bash -f
#PBS -N iMAPtest
#PBS -A <group_allocation_name>
#PBS -l nodes=1:ppn=8
#PBS -l walltime=4000:00:00
#PBS -l pmem=20gb
#PBS -j oe
#PBS -o iMAPtest.log
#PBS -m abe
#PBS -M tmb72@psu.edu
cd $PBS_O_WORKDIR
bash code/linux_iMAP_driver.bash
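Because the two PBS scripts differ only in the commands they run, the header can be generated from a few parameters. The following is a sketch under stated assumptions: `make_pbs_script` is a hypothetical helper, `my_allocation` is a placeholder, and the directives used (`-N` job name, `-A` allocation, `-l` resources, `-j oe`, `-o` log file) are standard PBS options that your cluster may restrict or extend.

```shell
# Print a PBS job script that runs one iMAP driver.
# Arguments: job name, allocation name, cores per node, walltime, driver path.
make_pbs_script() {
    cat <<EOF
#!/bin/bash
#PBS -N $1
#PBS -A $2
#PBS -l nodes=1:ppn=$3
#PBS -l walltime=$4
#PBS -j oe
#PBS -o $1.log

cd \$PBS_O_WORKDIR
bash $5
EOF
}

# Usage, from the iMAP root directory:
# make_pbs_script iMAPtest my_allocation 8 48:00:00 code/linux_iMAP_driver.bash > imap.pbs
# qsub imap.pbs
```

The generated file is submitted with `qsub`, and the job's combined output and error stream ends up in the log file named after the job.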