Searches for all target k-mers in a long DNA sequence.

nthash-java

This program searches a very long DNA sequence for all possible k-mer substrings, using multithreading, the Rabin-Karp algorithm, and a rolling hash based on bit rotation.
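The combination of Rabin-Karp and a rotation-based rolling hash can be illustrated with a minimal, single-threaded sketch. It is not this library's actual API: the class name `RollingKmerSearch`, its method names, and the four seed constants are assumptions chosen for illustration. The idea is that the hash of each k-mer window is derived from the previous window's hash in constant time, and a hash match is confirmed by a direct string comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and seed values are illustrative assumptions.
class RollingKmerSearch {

    // One arbitrary 64-bit seed per nucleotide (illustrative values only).
    static long seed(char base) {
        switch (base) {
            case 'A': return 0x3c8bfbb395c60474L;
            case 'C': return 0x3193c18562a02b4cL;
            case 'G': return 0x20323ed082572324L;
            case 'T': return 0x295549f54be24456L;
            default: throw new IllegalArgumentException("Unexpected base: " + base);
        }
    }

    // Hash of seq[start, start + k) computed from scratch:
    // rotate the running hash left by one bit, then XOR in the next seed.
    static long initialHash(String seq, int start, int k) {
        long h = 0L;
        for (int i = 0; i < k; i++) {
            h = Long.rotateLeft(h, 1) ^ seed(seq.charAt(start + i));
        }
        return h;
    }

    // Slide the window by one base in O(1): rotate the old hash,
    // cancel the outgoing base (now rotated k times), XOR in the incoming base.
    static long roll(long h, char out, char in, int k) {
        return Long.rotateLeft(h, 1) ^ Long.rotateLeft(seed(out), k) ^ seed(in);
    }

    // Rabin-Karp search: compare hashes first, verify characters on a hash match.
    static List<Integer> findAll(String seq, String kmer) {
        List<Integer> hits = new ArrayList<>();
        int k = kmer.length();
        if (k == 0 || seq.length() < k) {
            return hits;
        }
        long target = initialHash(kmer, 0, k);
        long h = initialHash(seq, 0, k);
        for (int i = 0; ; i++) {
            if (h == target && seq.regionMatches(i, kmer, 0, k)) {
                hits.add(i);
            }
            if (i + k >= seq.length()) {
                break;
            }
            h = roll(h, seq.charAt(i), seq.charAt(i + k), k);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ACGTACGTAC", "ACG")); // prints [0, 4]
    }
}
```

A multithreaded variant could split the sequence into chunks that overlap by k - 1 bases and run `findAll` on each chunk in its own thread; that part is left out here to keep the sketch short.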

Description

Genomic information is important for analyzing species and diseases. An abundance of specific k-mers in a genome can be significant: it might indicate the emergence of a new gene, or it might denote a certain disease. Unfortunately, no Java program was available for such analyses. So this program, as its name suggests, efficiently searches for all possible k-mers in a DNA sequence in Java.

How to use the program

  1. This program is published to the Maven Central Repository and can be found on search.maven.org.
  2. On search.maven.org, search for "nthash-java", click on the version, and download the JAR file.
  3. Once it is downloaded, add the JAR file to your project's build path in a Java IDE such as Eclipse.

Built with

  1. Maven - Dependency Management
  2. H2 - Database Engine
  3. JUnit - Testing Framework
  4. Sonatype - Build and Manage Artifacts

References

  1. ntHash: recursive nucleotide hashing - Provided bit operation idea
  2. Unique Seeds/Values - Provided unique values for each nucleotide A, G, T, and C