nthash-java
This program searches a very long DNA sequence for all possible k-mer substrings using multi-threading, the Rabin-Karp algorithm, and a rolling hash based on bit rotation.
Description
Genomic information is important for analyzing species and diseases. If certain k-mers occur many times in a genome, that may be significant: it may indicate the emergence of a new gene, or it may be a marker for a certain disease. Unfortunately, no Java version of ntHash was available for such analyses. This program, as its name suggests, fills that gap by efficiently searching for all possible k-mers in a DNA sequence in Java.
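To make the rolling hash with bit rotation and the Rabin-Karp search mentioned above more concrete, here is a minimal single-threaded sketch. It is not this library's API: the class name `RollingKmerSketch`, its method names, and the seed constants are assumptions chosen for illustration only. It assumes the input contains only the characters A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.List;

public class RollingKmerSketch {

    // Illustrative 64-bit seed per nucleotide (assumed values for this sketch,
    // not necessarily the ones this library uses).
    static long seed(char base) {
        switch (base) {
            case 'A': return 0x3c8bfbb395c60474L;
            case 'C': return 0x3193c18562a02b4cL;
            case 'G': return 0x20323ed082572324L;
            case 'T': return 0x295549f54be24456L;
            default: throw new IllegalArgumentException("Unexpected base: " + base);
        }
    }

    // Hash of the k-mer starting at 'start': each base's seed is rotated left
    // by its distance from the end of the k-mer, then all are XORed together.
    static long hashKmer(String seq, int start, int k) {
        long h = 0L;
        for (int i = 0; i < k; i++) {
            h = Long.rotateLeft(h, 1) ^ seed(seq.charAt(start + i));
        }
        return h;
    }

    // Slide the window one base to the right in constant time: rotate the old
    // hash, cancel the outgoing base (its seed is now rotated k times), and
    // XOR in the incoming base.
    static long roll(long h, char out, char in, int k) {
        return Long.rotateLeft(h, 1) ^ Long.rotateLeft(seed(out), k) ^ seed(in);
    }

    // Rabin-Karp style search: compare hashes first, then confirm with a
    // character comparison to rule out hash collisions.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int k = pattern.length();
        if (k == 0 || k > text.length()) {
            return hits;
        }
        long target = hashKmer(pattern, 0, k);
        long h = hashKmer(text, 0, k);
        for (int i = 0; ; i++) {
            if (h == target && text.regionMatches(i, pattern, 0, k)) {
                hits.add(i);
            }
            if (i + k >= text.length()) {
                break;
            }
            h = roll(h, text.charAt(i), text.charAt(i + k), k);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ACGTACGTAC", "ACG")); // prints [0, 4]
    }
}
```

Because each window's hash is derived from the previous one with two rotations and two XORs, the cost per position does not depend on k. A multi-threaded variant could split the sequence into chunks that overlap by k - 1 bases and run the same loop on each chunk.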
How to use the program
- This program is published to the Maven Central Repository and can be found on search.maven.org.
- On search.maven.org, search for "nthash-java" and click the version number to download the JAR file.
- Once downloaded, add the JAR file to your project's build path in a Java IDE such as Eclipse.
Built with
- Maven - Dependency Management
- H2 - Database Engine
- JUnit - Testing Framework
- Sonatype - Build and Manage Artifacts
References
- ntHash: recursive nucleotide hashing - Provided the bit-operation idea
- Unique Seeds/Values - Provided the unique values for each nucleotide (A, G, T, and C)