
rapidly rid reads of horrid humans

Primary Language: Python


Human DNA where it shouldn't be? Expunge it from your samples with the dehumanizer. Just point at a FASTQ or BAM that you suspect is contaminated with human DNA, and dehumanizer will rifle through your file, throwing your reads at as many aligning processes as you will allow, to yield a clean file, free of uninvited humans.

How does it work?

Reads from your FASTQ or BAM are spewed as quickly as possible into a queue, where a series of hungry minimap2 aligners are waiting. Reads are tested against one (or more) pre-indexed genomic references of your choice, and only enjoy a second life in a new file if there are no hits to anything you don't want to see. Like most things in bioinformatics, the heavy lifting is performed by things Heng Li wrote: minimap2, via mappy, cheers Heng.
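The queue-and-workers idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not dehumanizer's actual code: `filter_reads` and `has_hit` are hypothetical names, threads stand in for the aligning processes, and a toy substring test stands in for a real aligner. With mappy installed, `has_hit` could instead check whether `mappy.Aligner(index_path, preset=...).map(seq)` yields any hits.

```python
import queue
import threading


def filter_reads(reads, has_hit, n_workers=4):
    """Return the names of reads that have no hit against the reference.

    `reads` is an iterable of (name, sequence) pairs. `has_hit` is a
    hypothetical callable that returns True when a sequence matches
    something we do not want to keep.
    """
    work = queue.Queue()
    kept = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more reads for this worker
                break
            name, seq = item
            if not has_hit(seq):
                with lock:  # protect the shared output list
                    kept.append(name)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for read in reads:  # feed the queue as fast as reads arrive
        work.put(read)
    for _ in threads:  # one sentinel per worker
        work.put(None)
    for t in threads:
        t.join()
    return kept


# Toy stand-in for a reference: a read "hits" if it is a substring.
toy_reference = "ACGTACGTTTGACCA"
toy_reads = [("r1", "ACGTAC"), ("r2", "GGGGGG"), ("r3", "TTGACC"), ("r4", "CCCCAA")]
clean = filter_reads(toy_reads, lambda seq: seq in toy_reference, n_workers=2)
print(sorted(clean))
```

Workers finish in no fixed order, so the kept reads are sorted before printing. A real tool would also need to decide whether output order matters.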

How do I run it?

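Get yourself a manifest

The manifest lists one reference per line: a short name followed by the path to its pre-built minimap2 index (.mmi). The commands below download three indexes and write a manifest for them.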

wget https://sam.s3.climb.ac.uk/dehumanizer/20200421/GCA_000786075.2_hs38d1_genomic.mmi
wget https://sam.s3.climb.ac.uk/dehumanizer/20200421/GCF_000001405.39_GRCh38.p13_genomic.mmi
wget https://sam.s3.climb.ac.uk/dehumanizer/20200421/ipd-imgt-3_39_0.hla_gen.mmi

echo "hs38d1 $(pwd)/GCA_000786075.2_hs38d1_genomic.mmi" >> manifest.txt
echo "GRCh38 $(pwd)/GCF_000001405.39_GRCh38.p13_genomic.mmi" >> manifest.txt
echo "HLA $(pwd)/ipd-imgt-3_39_0.hla_gen.mmi" >> manifest.txt
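If you want to sanity-check the manifest before a long run, something like the following may help. It is a sketch that assumes the layout produced by the echo commands above (a name, whitespace, then a path with no spaces in it). `read_manifest` is a hypothetical helper, not part of dehumanizer.

```python
def read_manifest(path):
    """Parse a manifest into a {name: index_path} dict.

    Assumes each non-empty line holds exactly two whitespace-separated
    fields: a reference name and the path to its .mmi index.
    """
    refs = {}
    with open(path) as fh:
        for line_no, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            if len(fields) != 2:
                raise ValueError(
                    f"{path}:{line_no}: expected 'name path', got {line!r}"
                )
            name, index_path = fields
            refs[name] = index_path
    return refs
```

Note that `>>` appends, so running the echo commands twice leaves duplicate lines in manifest.txt. The dict above quietly collapses them, but it is tidier to delete the file and start again.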

Go go go

dehumanise <manifest> --fastx <fastq> --preset <minimap2_preset> -o <out.bam> --log <log>
dehumanise <manifest> --bam <bam> --preset <minimap2_preset> -o <out.bam> --log <log>
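For scripted runs over many samples, the command line can be assembled in Python. The sketch below uses only the flags shown in the two usage lines above. `build_command` and the file names are hypothetical, and `map-ont` is just one example of a minimap2 preset.

```python
import shlex


def build_command(manifest, reads, preset, out, log, is_bam=False):
    """Build the dehumanise argument list for a FASTQ or BAM input."""
    input_flag = "--bam" if is_bam else "--fastx"
    return [
        "dehumanise", manifest,
        input_flag, reads,
        "--preset", preset,
        "-o", out,
        "--log", log,
    ]


# Hypothetical file names, for illustration only.
cmd = build_command("manifest.txt", "sample.fastq", "map-ont", "clean.bam", "run.log")
print(shlex.join(cmd))  # a copy-pasteable shell line
```

Passing the list to `subprocess.run(cmd, check=True)` would run it without any shell quoting worries.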

How do I install it?

pip install git+git@github.com:SamStudio8/dehumanizer.git

This is a work in progress so proceed with caution. Thanks for your continued interest in our products.