Y-mer

Predicting human Y chromosome haplogroups from ultra-low depth sequencing data

#COUNTING K-MER FREQUENCIES #GenomeTester4 package contains programs gmer_counter, glistmaker, glistquery https://github.com/bioinfo-ut/Genometester4

With fastq or fasta files:

gmer_counter -dbb k-mers.dbb sample.fastq |cut -f 3 |tail -n +3 > sample_G.counts

With bam or cram files:

samtools fasta sample.bam|gmer_counter -dbb k-mers.dbb - |cut -f 3 |tail -n +3 > sample_G.counts

With having GenomeTester4 based list file, mandatory if using multiple models:

glistquery sample_25.list -f k-mers.txt |cut -f 2 > sample_G.counts

#CALLING HG-s

Calling commandline order is script, model file, sample counts file and R formated output file name

Rscript PREDICTER.R model.Rdata sample_G.counts sample_result.RData > sample_result.txt

#SAMPLES

aDNA sample DA189 fastq reads ERR2505887.fastq.gz aDNA sample DA189 mapped bam file DA189.sort.rmdup.realign.md.bam

assembled chrY NA20509.HIFIRW.ONTUL.na.chrY.fasta

#MODELS

modelfile M213E/M213E_50k.Rdata k-mer dictionary for glistquery M213E/M213E_50k.txt k-mer binary dictionary for gmer_counter M213E/M213E_50k.dbb

#WEB tool

bioinfo-ut/Y-mer