asap2

Automatically analyze prokaryotic 16S rRNA gene amplicon sequencing data using QIIME 2 and other tools.

Update: We have developed a Galaxy web server for users to analyze their data. Please visit http://hts.iit.edu/galaxy and find HTS Initiative in the tools panel, or just visit http://hts.iit.edu/galaxy/?tool_id=asap2&version=latest

We developed Amplicon Sequence Analysis Pipeline 2 (ASAP 2) to analyze marker gene amplicon sequence data automatically and consistently. It is designed to be user-friendly and time-saving. Users only need to organize their fastq files (demultiplexed or multiplexed, barcodes inside or outside the reads, single-end or paired-end) and metadata properly. Multiple projects can be put together, and the pipeline will merge the data automatically. You can even combine your own data with data from collaborators or NCBI for comparison. Also, if you have intermediate files (feature table and representative sequences) from collaborators and do not want to repeat the earlier steps, you can start from those files.

######### Dependency #########
Please install the following packages.

Python>=3.6
QIIME2>=2020.6 
pandas>=0.25.3
biopython>=1.77
vegan>=2.5-6 # R package

If you install QIIME 2 in a virtual environment, remember to activate it before running ASAP 2.
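
The version requirements above can be checked with a small helper. The following is a minimal sketch, not part of ASAP 2: the names MINIMUMS, version_key, meets_minimum and find_executable are hypothetical, and the comparison only looks at the numeric parts of a version string, so pre-release tags are not handled.

```python
import re
import shutil

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "QIIME2": "2020.6",
    "pandas": "0.25.3",
    "biopython": "1.77",
    "vegan": "2.5-6",
}


def version_key(version):
    """Turn '2.5-6' or '2021.4.0' into a comparable tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    a, b = version_key(installed), version_key(required)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that '1.0' equals '1.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def find_executable(name):
    """Return the path of a command-line tool on PATH, or None if it is missing."""
    return shutil.which(name)
```

For example, meets_minimum(pandas.__version__, MINIMUMS["pandas"]) compares the installed pandas with the minimum, and find_executable("qiime") returns None when the QIIME 2 environment has not been activated.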

######### Download classifier model #########
Please download classifier models and put them in asap2/lib/classifier/ (create the folder first). Please download the classifiers rather than the sequence data.
https://docs.qiime2.org/xxxx.xx/data-resources/
Replace xxxx.xx with your QIIME 2 version, for example 2021.4.
If you download a classifier from another source (e.g. the QIIME 2 forum), make sure it was built with the same version of scikit-learn as the one installed with your QIIME 2.
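
As a minimal sketch of the two preparation steps above (building the data-resources URL for your release and creating the classifier folder), assuming hypothetical helper names data_resources_url and ensure_classifier_dir that are not part of ASAP 2:

```python
import re
from pathlib import Path

# URL pattern taken from the line above; {release} stands for xxxx.xx.
URL_TEMPLATE = "https://docs.qiime2.org/{release}/data-resources/"


def data_resources_url(qiime_version):
    """Build the data-resources URL for a version such as '2021.4' or '2021.4.0'."""
    match = re.match(r"(\d{4}\.\d{1,2})", qiime_version)
    if match is None:
        raise ValueError("unexpected QIIME 2 version: %r" % qiime_version)
    return URL_TEMPLATE.format(release=match.group(1))


def ensure_classifier_dir(asap2_root):
    """Create lib/classifier/ under the given asap2 folder if it is missing."""
    target = Path(asap2_root) / "lib" / "classifier"
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling data_resources_url("2021.4") gives the page to open in a browser, and ensure_classifier_dir("asap2") returns the folder where the downloaded classifier files should be placed.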

######### Download example data #########
Please download the example data (https://zenodo.org/records/10729736/files/input.zip), unzip it in asap2/example/ (create the folder first), and run

cd asap2/example/
python -u ../bin/asap2.py -i input -c xxx

to test the pipeline and the dependencies.
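
The download and unzip step can also be scripted. This is a minimal sketch using only the Python standard library; download_example and unpack_example are hypothetical helper names, and the sketch assumes the archive unpacks into an input/ folder, as the command above suggests.

```python
import urllib.request
import zipfile
from pathlib import Path

# Example data URL from the section above.
EXAMPLE_URL = "https://zenodo.org/records/10729736/files/input.zip"


def download_example(dest_zip, url=EXAMPLE_URL):
    """Download the example archive to dest_zip and return its path."""
    dest_zip = Path(dest_zip)
    dest_zip.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest_zip))
    return dest_zip


def unpack_example(zip_path, example_dir):
    """Extract the archive into example_dir and return the member names."""
    example_dir = Path(example_dir)
    example_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(zip_path)) as archive:
        archive.extractall(str(example_dir))
        return archive.namelist()
```

A typical call would be unpack_example(download_example("asap2/example/input.zip"), "asap2/example"), after which the test command above can be run from asap2/example/.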

######### Prepare data files and run #########
Please prepare your sequence files and metadata according to the manual.
Important: The test data set is a convenient template for preparing your own data. You can download it (https://zenodo.org/records/10729736/files/input.zip) for reference; it includes metadata files, sequence files, and the file organization for all data formats.
Check the folder structure and naming once more against the example data. Then put all the project data in one folder (e.g. input/) and run the pipeline from outside the data folder.
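
To guard against launching the pipeline from inside the data folder, a small pre-flight check can be used. This is a minimal sketch and not part of ASAP 2; is_inside and check_run_location are hypothetical names, and the check covers only the location rule, not the folder structure described in the manual.

```python
from pathlib import Path


def is_inside(directory, data_dir):
    """Return True if directory is data_dir itself or lies anywhere below it."""
    directory = Path(directory).resolve()
    data_dir = Path(data_dir).resolve()
    return directory == data_dir or data_dir in directory.parents


def check_run_location(data_dir, cwd=None):
    """Raise an error if the pipeline would be launched from inside data_dir."""
    cwd = Path.cwd() if cwd is None else Path(cwd)
    if not Path(data_dir).is_dir():
        raise FileNotFoundError("data folder not found: %s" % data_dir)
    if is_inside(cwd, data_dir):
        raise RuntimeError("run the pipeline from outside %s" % data_dir)
```

Calling check_run_location("input") before starting the pipeline raises an error if the current directory is input/ or one of its subfolders.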

IMPORTANT: Only projects that use the same region (the same primer set, or regions with >80% overlap if you don't mind the bias resulting from primer amplification) can be put together and merged. Otherwise, the phylogenetic tree will tell quite a different story.

######### View the result files #########
The results include these folders:

demultiplexed
demuxSumViz
diversity
feature
featureMerged
imported
ordination
phylogeny
readSummary
taxonomy

TXT files can be opened with a text editor such as Notepad. TSV files can be opened with Excel.
QZA and QZV files can be viewed using QIIME 2 View: https://view.qiime2.org/
If you want to use the files inside a QZA or QZV file, simply run 'unzip xx.qza' or 'unzip xx.qzv'.
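
Because QZA and QZV files are ordinary zip archives, they can also be unpacked from Python. This is a minimal sketch with a hypothetical helper name, extract_artifact; it assumes the usual QIIME 2 archive layout, in which the payload sits in a data/ folder below a single top-level folder.

```python
import zipfile
from pathlib import Path


def extract_artifact(artifact_path, dest_dir):
    """Unzip a .qza or .qzv file into dest_dir and return the files under data/."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(artifact_path)) as archive:
        archive.extractall(str(dest_dir))
        names = archive.namelist()
    payload = []
    for name in names:
        parts = Path(name).parts
        # Keep real files stored in the data/ folder below the top-level folder.
        if len(parts) > 2 and parts[1] == "data" and not name.endswith("/"):
            payload.append(dest_dir / name)
    return payload
```

The returned paths point at the extracted payload files, which can then be opened with the tools mentioned above.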

Please read the manual for the result explanation.

If you find our pipeline helpful, please cite our paper:
Tian, R., Imanian, B. ASAP 2: A Pipeline and Web Server to Analyze Marker Gene Amplicon Sequencing Data Automatically and Consistently. BMC Bioinformatics 23, 27 (2022). https://doi.org/10.1186/s12859-021-04555-0