Julia implementation of the Needleman-Wunsch pairwise sequence alignment algorithm, along with the Hirschberg space-efficient divide-and-conquer version of the algorithm and a heuristic implementation that approximates the score for the alignment of two sequences.
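As a reference point for what the full dynamic programming approach computes, here is a minimal sketch of the textbook Needleman-Wunsch fill-and-traceback. The scoring scheme (match 0, mismatch -1, gap -1) and the name `nw_align` are assumptions made for illustration. They are not taken from Edist's source.

```julia
# Assumed scoring scheme (hypothetical, not necessarily Edist's):
# match = 0, mismatch = -1, gap = -1, i.e. the negated unit-cost edit distance.
const MATCH = 0
const MISMATCH = -1
const GAP = -1

subst(x::Char, y::Char) = x == y ? MATCH : MISMATCH

"""
    nw_align(a, b) -> (score, a_aligned, b_aligned)

Fill the full (m+1)×(n+1) Needleman-Wunsch matrix, then trace back from
the bottom-right corner to recover one optimal global alignment.
"""
function nw_align(a::AbstractString, b::AbstractString)
    x, y = collect(a), collect(b)
    m, n = length(x), length(y)
    F = zeros(Int, m + 1, n + 1)      # F[i+1, j+1] = best score of x[1:i] vs y[1:j]
    for i in 1:m; F[i+1, 1] = i * GAP; end
    for j in 1:n; F[1, j+1] = j * GAP; end
    for i in 1:m, j in 1:n
        F[i+1, j+1] = max(F[i, j] + subst(x[i], y[j]),  # diagonal: (mis)match
                          F[i, j+1] + GAP,              # up: gap in b
                          F[i+1, j] + GAP)              # left: gap in a
    end
    # Traceback: walk back to F[1, 1], emitting alignment columns in reverse.
    ra, rb = Char[], Char[]
    i, j = m, n
    while i > 0 || j > 0
        if i > 0 && j > 0 && F[i+1, j+1] == F[i, j] + subst(x[i], y[j])
            push!(ra, x[i]); push!(rb, y[j]); i -= 1; j -= 1
        elseif i > 0 && F[i+1, j+1] == F[i, j+1] + GAP
            push!(ra, x[i]); push!(rb, '-'); i -= 1
        else
            push!(ra, '-'); push!(rb, y[j]); j -= 1
        end
    end
    return (F[m+1, n+1], String(reverse(ra)), String(reverse(rb)))
end
```

Under this assumed scheme, `nw_align("CACTAG", "ATCA")` returns a score of -4. The returned alignment strings may differ from other equally optimal alignments, because the traceback breaks ties in favor of diagonal moves, then upward moves.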
The project can be loaded into the Julia environment by running

```sh
julia --project=.
```

inside the project root directory. The source code can be exposed and precompiled in the global namespace with

```julia
using Edist
```
The project has three submodules, `Full`, `Hirschberg`, and `Bounded`, corresponding to the full dynamic programming implementation, the Hirschberg divide-and-conquer algorithm, and the spatially bounded heuristic, respectively. For the most part these internal implementations can be ignored aside from specific parameter tuning.
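One common way to bound the work of the alignment DP is to fill only the cells within a band of half-width `w` around the main diagonal. The result is the best score among alignments whose path stays inside the band. It is therefore a lower bound on the optimal score, and it is exact whenever some optimal path fits in the band. The sketch below illustrates that general technique. This README does not say whether `Bounded` works this way. The function name `banded_score`, the parameter `w`, and the scoring scheme are all hypothetical.

```julia
# Assumed scoring scheme (hypothetical): match 0, mismatch -1, gap -1.
const MATCH = 0
const MISMATCH = -1
const GAP = -1
const NEG = typemin(Int) ÷ 2   # stands in for -Inf without overflowing when GAP is added

"""
    banded_score(a, b, w) -> Int

Hypothetical banded Needleman-Wunsch: only cells with |i - j| <= w are
computed, so the result can underestimate the true optimal score.
"""
function banded_score(a::AbstractString, b::AbstractString, w::Integer)
    x, y = collect(a), collect(b)
    m, n = length(x), length(y)
    w = max(w, abs(m - n))            # the band must contain the end cell (m, n)
    prev = fill(NEG, n + 1)
    for j in 0:min(n, w)
        prev[j+1] = j * GAP           # row 0, inside the band
    end
    curr = similar(prev)
    for i in 1:m
        # For clarity each row is reset in full; a tuned version would
        # store only the 2w+1 band cells.
        fill!(curr, NEG)
        i <= w && (curr[1] = i * GAP)  # column 0 lies in the band only while i <= w
        for j in max(1, i - w):min(n, i + w)
            s = x[i] == y[j] ? MATCH : MISMATCH
            curr[j+1] = max(prev[j] + s, prev[j+1] + GAP, curr[j] + GAP)
        end
        prev, curr = curr, prev
    end
    return prev[n+1]
end
```

For example, `banded_score("ACGT", "CGTA", 0)` forces a purely diagonal path and gives -4. Widening the band to `w = 1` recovers the optimal -2 (delete the leading `A`, append it at the end).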
The main functionality is exposed through the `align` and `score` functions. These serve as wrappers around the various submodules, exposing alignment and scoring in an implementation-agnostic way. Both take a module as the first argument, followed by two strings, and return the alignment or score produced by the implementation in that module, e.g.
```julia
julia> align(Bounded, "CACTAG", "ATCA")
(score = -4, seq_alignment = "CACTAG", query_alignment = "-A-TCA", memory_used = 376)
```
`score` works the same way but returns only the score.
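When only the score is needed, the traceback matrix is unnecessary. The DP can keep just two rows, which reduces memory from O(mn) to O(n). This last-row computation is also the building block of Hirschberg's algorithm. That algorithm runs it on the first half of one sequence and on the reversed second half, splits the other sequence where the two rows sum to a maximum, and recurses on each half. The sketch below shows the two-row pass. The names `last_row` and `linear_score` and the scoring scheme are hypothetical, and this README does not state how Edist's `score` is implemented internally.

```julia
# Assumed scoring scheme (hypothetical): match 0, mismatch -1, gap -1.
const MATCH = 0
const MISMATCH = -1
const GAP = -1

"""
    last_row(a, b) -> Vector{Int}

Return the final row of the Needleman-Wunsch matrix for `a` against `b`,
keeping only two rows in memory.
"""
function last_row(a::AbstractString, b::AbstractString)
    x, y = collect(a), collect(b)
    n = length(y)
    prev = [j * GAP for j in 0:n]     # row 0: all gaps
    curr = similar(prev)
    for (i, xi) in enumerate(x)
        curr[1] = i * GAP             # column 0: all gaps
        for j in 1:n
            s = xi == y[j] ? MATCH : MISMATCH
            curr[j+1] = max(prev[j] + s, prev[j+1] + GAP, curr[j] + GAP)
        end
        prev, curr = curr, prev       # reuse the old row as scratch space
    end
    return prev
end

# The global alignment score is the last entry of the last row.
linear_score(a, b) = last_row(a, b)[end]
```

Under the assumed scheme, `linear_score("CACTAG", "ATCA")` returns -4, and `last_row("CACTAG", "ATCA")` returns `[-6, -5, -4, -4, -4]`.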
The repository is laid out as follows:

```
.
├── data
│   ├── graphics
│   └── TP53_cross_species.fasta
├── docs
├── Manifest.toml
├── nbs
│   └── Analysis.ipynb
├── Project.toml
├── README.md
├── src
│   ├── Bounded.jl
│   ├── Edist.jl
│   ├── Full.jl
│   └── Hirschberg.jl
└── test
```
- `data` contains any data sources for the code, in this case a FASTA file containing coding sequences for the TP53 protein across species
- `docs` contains LaTeX source and/or PDFs of slide decks and papers documenting the research and its presentation
- `nbs` contains Jupyter notebooks for analysis of the project
- `src` contains the project source code
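The TP53 file in `data` is in FASTA format: each record is a `>` header line followed by one or more sequence lines. To feed those sequences to `align` or `score`, something like the minimal reader below could be used. This is a sketch that assumes well-formed input. The name `read_fasta` is hypothetical, and a dedicated package such as BioJulia's FASTX.jl is the more robust option.

```julia
"""
    read_fasta(io::IO) -> Vector{Pair{String,String}}

Hypothetical minimal FASTA reader: returns `header => sequence` pairs,
joining multi-line sequences and skipping blank lines.
"""
function read_fasta(io::IO)
    records = Pair{String,String}[]
    header = nothing
    seq = IOBuffer()                  # accumulates the current record's sequence
    for line in eachline(io)
        l = strip(line)
        isempty(l) && continue
        if startswith(l, ">")
            # A new header closes the previous record, if there was one.
            header === nothing || push!(records, header => String(take!(seq)))
            header = String(l[2:end])
        else
            print(seq, l)
        end
    end
    header === nothing || push!(records, header => String(take!(seq)))
    return records
end

# Convenience method for reading directly from a file path.
read_fasta(path::AbstractString) = open(read_fasta, path)
```

For example, `read_fasta("data/TP53_cross_species.fasta")` would return one `header => sequence` pair per species. Any two of those sequences could then be passed to `align` or `score`.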