ilrfilter

Immunoglobulin-like Read Filter

Pre-filtering of immunoglobulin-like reads

Speeds up TCR/BCR mapping for large FASTQ files, such as those produced by RNA-Seq experiments. Inspired by the Vidjil algorithm.

You can get the latest JAR from the releases section; Java 1.8+ is required to run it. Run as

java -jar ilrfilter-0.0.1.jar hash -S hsa -I reads_R1.fastq reads_R2.fastq -O out_prefix

To see the list of available options run either

java -jar ilrfilter-0.0.1.jar hash

for the hashmap-based (K-mer) algorithm or

java -jar ilrfilter-0.0.1.jar tree

for the tree-based algorithm. To compile and test, clone the repo and run test.sh in the examples/ folder.

The tree-based algorithm is slower than the hash-based one, but takes less memory and less startup time.

Note that we found that using K-mers of length 15 with 1 mismatch (the default parameters for the hash-based algorithm) reduces data size roughly 10- to 50-fold while keeping the false-negative rate below 0.1%. Selecting longer K-mers or more mismatches for the hash-based algorithm, or allowing several substitutions and indels for the tree-based algorithm, can significantly increase running time and memory requirements and may leave the input file effectively unfiltered.
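
To illustrate why the mismatch count drives memory use, here is a minimal sketch of one common way to do K-mer filtering with a single allowed mismatch: index every reference K-mer together with all of its one-substitution variants, then keep a read if any of its K-mers is in the index. This is an assumption about the general technique, not ilrfilter's actual code; the class name, method names and toy sequences are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: NOT the ilrfilter implementation.
class KmerFilterSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Indexes every K-mer of each reference plus all variants with one substitution. */
    static Set<String> buildIndex(List<String> references, int k) {
        Set<String> index = new HashSet<>();
        for (String ref : references) {
            for (int i = 0; i + k <= ref.length(); i++) {
                char[] kmer = ref.substring(i, i + k).toCharArray();
                index.add(new String(kmer)); // exact K-mer
                for (int pos = 0; pos < k; pos++) {
                    char original = kmer[pos];
                    for (char base : BASES) {
                        if (base != original) {
                            kmer[pos] = base;
                            index.add(new String(kmer)); // one-substitution variant
                        }
                    }
                    kmer[pos] = original; // restore before moving to the next position
                }
            }
        }
        return index;
    }

    /** A read passes the filter if at least one of its K-mers is in the index. */
    static boolean passes(String read, Set<String> index, int k) {
        for (int i = 0; i + k <= read.length(); i++) {
            if (index.contains(read.substring(i, i + k))) {
                return true;
            }
        }
        return false; // also covers reads shorter than k
    }

    public static void main(String[] args) {
        int k = 15;
        // Toy reference sequence, made up for this example.
        Set<String> index = buildIndex(Arrays.asList("TGTGCCAGCAGCTTAGCGGG"), k);
        // Read carrying the first reference K-mer with one substitution (G -> T).
        System.out.println(passes("AATGTGCCATCAGCTTACC", index, k));
        // Unrelated read.
        System.out.println(passes("AAAAAAAAAAAAAAAAAAAA", index, k));
    }
}
```

In this scheme each K-mer of length 15 contributes up to 1 + 15 × 3 = 46 index entries with one mismatch, and up to 1 + 45 + 105 × 9 = 991 entries with two, which is consistent with the note above that more mismatches can sharply increase memory use.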

Also note that this implementation uses the Java Stream API (introduced in Java 1.8), so it will use all available cores by default.
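
As a rough illustration of that behaviour, the sketch below filters a list of reads on a parallel stream and prints the parallelism of the common fork/join pool, which parallel streams use unless told otherwise. The class, the method and the substring-based predicate are hypothetical stand-ins, not ilrfilter's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

// Illustrative sketch only: NOT the ilrfilter implementation.
class ParallelFilterSketch {
    /** Keeps reads that contain the motif; the predicate is evaluated in parallel. */
    static List<String> filterParallel(List<String> reads, String motif) {
        return reads.parallelStream()
                .filter(read -> read.contains(motif))
                .collect(Collectors.toList()); // encounter order of the source list is preserved
    }

    public static void main(String[] args) {
        // Toy reads, made up for this example.
        List<String> reads = Arrays.asList("AAATGTGCC", "GGGGGGGG", "TGTGCCAAA");
        System.out.println(filterParallel(reads, "TGTGCC"));

        // Worker threads in the common pool; by default this is derived from the core count.
        System.out.println("Cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}
```

The common pool's size can be set at JVM startup with the standard system property -Djava.util.concurrent.ForkJoinPool.common.parallelism=N. Whether this limits ilrfilter's core usage depends on it running its streams in the common pool, which is an assumption here and has not been verified against the tool.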