Primary language: Groovy. License: Apache License 2.0 (Apache-2.0)

duplicates

Analysis of duplicate reads in Illumina sequencing using unique molecular identifiers (UMIs)

How to use

  • Install Java/Groovy and R, then call install.packages("ggplot2") in R to resolve the R dependency. The Groovy dependency (GPars) is resolved automatically via Grape; additional configuration is only required if you're behind a firewall or using a proxy

  • Clone the repository with git clone <repo_url>

  • Make sure your FASTQ files were prepared using the MiGEC/Checkout utility with the -u option, which stores UMI tags in read headers

  • Change to the folder containing the .fastq[.gz] files you wish to analyze and run: bash path/to/duplicates/run.sh (only R1 files will be considered)

  • Enjoy the fancy PDF output
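
Before launching run.sh, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch, not part of the repository: the helper names missing_tools and count_umi_headers are hypothetical, the tools are assumed to be exposed on PATH as java, groovy and Rscript, and the UMI tag is assumed to appear in the read header as UMI:<sequence>:<quality>. Adjust the pattern if your headers look different.

```shell
# Pre-flight helpers for the steps above (illustrative sketch, not shipped with the repo).

# Print the name of every command in the argument list that is not found on PATH.
# Assumption: Java, Groovy and R are exposed as `java`, `groovy` and `Rscript`.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Print the number of reads in a FASTQ file (plain or gzipped) whose header
# line carries a UMI tag.
# Assumption: the tag is written as "UMI:<sequence>:<quality>" in the header.
count_umi_headers() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR % 4 == 1 && /UMI:[ACGTN]+/ { n++ } END { print n + 0 }'
}
```

For example, missing_tools java groovy Rscript prints nothing when all three tools are installed, and count_umi_headers sample_R1.fastq.gz (a hypothetical file name) should report a count equal to the number of reads if every header was tagged. A count of 0 suggests the files were not prepared with the -u option.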