- UFRC-YAMP is an extension of YAMP for UFRC users.
- YAMP is a tool for sequence data processing. For basic usage of YAMP, please check its wiki.
- Please cite YAMP when you use it for publication purposes:
Visconti A., Martin T.C., and Falchi M., "YAMP: a containerised workflow enabling reproducibility in metagenomics research", GigaScience (2018), https://doi.org/10.1093/gigascience/giy072
- Make necessary folders:
mkdir -p data
mkdir -p results
mkdir -p logs
- Prepare resources before running YAMP:
sbatch run-01_getResources.sh
- Run demo with example data:
bash run-02_getDemoData.sh && sbatch run-03_runYAMPdemo.sh
- Run your data in parallel:
bash run.sh
- Please modify the Slurm configuration in `hpc_submit.sh` and `run-03_runYAMPdemo.sh` before running.
- Data should be stored in the `data` folder. Input can be `.tar.gz` or `.tar.bz2` archives (the code will decompress them automatically), or pairs of `.fastq*` files located in the same folder. Results will be stored in the `results` folder. Currently the code only supports paired files. Please name all your paired files in the form `A-R1-B` and `A-R2-B`, in which `A` and `B` stand for two strings in the file names. The output directory will be named `A-`.
- Processing each pair of files needs 4 CPUs and 40 GB of memory (please modify the `nextflow.config` file if you want to use different values). If you have N CPUs and M GB of memory, you will be able to run min(N/4, M/40) pairs in parallel, rounded down.
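The capacity rule above can be checked with a few lines of Python. This is only a minimal sketch: the function name `max_parallel_pairs` and its parameters are made up for illustration and are not part of UFRC-YAMP; the defaults of 4 CPUs and 40 GB per pair are the values stated above.

```python
def max_parallel_pairs(cpus, mem_gb, cpus_per_pair=4, mem_gb_per_pair=40):
    """Return how many file pairs can be processed at the same time.

    Hypothetical helper, not part of UFRC-YAMP. Each pair needs
    cpus_per_pair CPUs and mem_gb_per_pair GB of memory, so the tighter
    of the two limits decides. Integer division rounds down because a
    partial slot cannot host a job.
    """
    if cpus < 0 or mem_gb < 0:
        raise ValueError("resources must be non-negative")
    return min(cpus // cpus_per_pair, mem_gb // mem_gb_per_pair)


if __name__ == "__main__":
    print(max_parallel_pairs(16, 200))  # limited by CPUs
    print(max_parallel_pairs(32, 120))  # limited by memory
```

If you change the per-pair requirements in `nextflow.config`, pass the new values as `cpus_per_pair` and `mem_gb_per_pair` to get the matching estimate.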
- Get statistics and completeness from the results once all needed results are available:
ml python3 && python3 get_stats.py
- Get MultiQC report:
ml gcc/5.2.0 && ml multiqc/1.5 && multiqc results/
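Before submitting a run, it can help to check that every input file has a mate under the `A-R1-B` / `A-R2-B` naming scheme described above. The sketch below is not the logic used by `run.sh` or `parallel.py`; the name `group_pairs` is hypothetical, and it assumes the hyphens around `R1` / `R2` are literal characters in the file name.

```python
import re
from collections import defaultdict

# Assumption: the read marker appears literally as "-R1-" or "-R2-".
PAIR_PATTERN = re.compile(r"^(?P<a>.+?)-R(?P<read>[12])-(?P<b>.+)$")


def group_pairs(file_names):
    """Group file names into (R1, R2) pairs keyed by their (A, B) parts.

    Returns (pairs, unpaired): pairs maps (A, B) to (r1_name, r2_name);
    unpaired lists names that do not match the pattern or lack a mate.
    """
    found = defaultdict(dict)
    unpaired = []
    for name in file_names:
        match = PAIR_PATTERN.match(name)
        if match is None:
            unpaired.append(name)
            continue
        key = (match.group("a"), match.group("b"))
        found[key][match.group("read")] = name

    pairs = {}
    for key, reads in found.items():
        if "1" in reads and "2" in reads:
            pairs[key] = (reads["1"], reads["2"])
        else:
            # Only one read of the pair is present.
            unpaired.extend(reads.values())
    return pairs, unpaired
```

Calling `group_pairs` on the file names in one input folder returns the complete pairs plus a list of files that would be left without a mate, which is worth resolving before running `bash run.sh`.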
Enhancements:
- Change file locations in `run-01_getResources.sh` from absolute to relative, which is more flexible (i.e. if you copy UFRC-YAMP from `A/UFRC-YAMP` to `B/UFRC-YAMP`, relative locations will not generate errors).
- Add `hpc_submit.sh`, `parallel.py` and `run.sh` to enable UFRC-YAMP to process multiple pairs of files in parallel.
Enhancements:
- Add `get_stats.py` to get statistics from results, adjusted `run-03_runYAMPdemo.sh` and `run.sh`. A sample stats file is under the `results` folder.
- Update README for UFRC-YAMP.
Enhancements:
- Add code to `get_stats.py` to report the completeness of each dataset (i.e. the presence of "STEP 3 (Community Characterisation) terminated" in the log file under a given folder indicates that processing of that dataset has completed).
- Enable MultiQC to report the completion status of the datasets.
- Update README for UFRC-YAMP.
- In `run.sh`, `ml python3` has to be run after the Singularity pull step; otherwise there will be a Python error: `No module named os`.
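The completeness check described in the notes above (looking for "STEP 3 (Community Characterisation) terminated" in a log file) can be sketched as follows. This is not the actual code of `get_stats.py`: the function names and the default log file name `yamp.log` are assumptions for illustration, so adjust them to the real layout of your output folders.

```python
from pathlib import Path

# Marker quoted in the notes above; its presence means the run finished.
DONE_MARKER = "STEP 3 (Community Characterisation) terminated"


def is_complete(log_path):
    """Return True if the log file exists and contains the marker."""
    path = Path(log_path)
    if not path.is_file():
        return False
    text = path.read_text(encoding="utf-8", errors="replace")
    return DONE_MARKER in text


def completeness_report(results_dir, log_name="yamp.log"):
    """Map each sub-folder of results_dir to its completion status.

    log_name is a hypothetical file name, not taken from UFRC-YAMP.
    """
    root = Path(results_dir)
    return {
        folder.name: is_complete(folder / log_name)
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }
```

For example, `completeness_report("results")` returns a dictionary with one entry per output folder, so incomplete datasets can be listed and resubmitted.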
YAMP is licensed under GNU GPL v3.