/aim

AIM framework for high-throughput pairwise sequence alignment using processing-in-memory

Primary LanguageCMIT LicenseMIT

AIM: Alignment-in-Memory Framework for High-throughput Sequence Alignment using Processing-in-Memory

AIM is a framework for high-throughput pairwise sequence alignment using processing-in-memory. It targets UPMEM, the first publicly-available general-purpose programmable PIM architecture. AIM dispatches a large number of sequence pairs across different memory modules and aligns each pair using compute cores within the memory module where it resides. It supports multiple alignment algorithms including NW, SWG, GenASM [1], WFA [2], and WFA- adaptive, and includes two implementations of each algorithm that manage the UPMEM memory hierarchy differently and are suitable for different read lengths.

We evaluate AIM on a real UPMEM system and compare the throughput it can achieve with that achieved by server-grade multi-threaded CPU systems running at full scale. Our evaluation shows that a real PIM system can substantially outperform CPU systems for a wide variety of algorithms, read lengths, and edit distance thresholds. For example, for WFA-adaptive, the state-of-the-art sequence alignment algorithm, AIM achieves a speedup of up to 2.56x when data transfer time is included, and up to 28.14x when data transfer time is not included.

Citation

For more information on this project you can refer to the following papers:

Long paper:

Coming soon...

HiCOMB22 short paper (slides):

S. Diab, A. Nassereldine, M. Alser, J. G. Luna, O. Mutlu, and I. E. Hajj, "High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory." arXiv, 2022. doi: 10.48550/ARXIV.2204.02085.

If you find this project useful, please cite these papers.

Repository Structure

Each folder of this repository has the PIM implementations relative to each alignment algorithm. For each alignment algorithm, we provide two PIM implementations: (1) DPU-WRAM implementation stores the alignment data in the WRAM, and (2) DPU-MRAM implementation stores the alignment data in the MRAM and uses the WRAM as cache memory.

GenASM's PIM implementations are found in this submodule of AIM framework https://github.com/safaad/aim-genasm

├───Datasets
├───NW
│   ├───DPU-MRAM
│   └───DPU-WRAM
├───SWG
│   ├───DPU-MRAM
│   └───DPU-WRAM
└───WFA
    ├───DPU-MRAM
    └───DPU-WRAM

Instructions

Prerequisites

AIM is designed to run on a server with a real UPMEM-PIM modules. However, the functional simulator included in the UPMEM SDK can be also used to run the implementations. The used SDK version in AIM is 2021.3.0.

Getting Started

Clone the repository:

git clone https://github.com/safaad/aim
cd aim

Running AIM

AIM provides a python script to run each PIM implementation run_*_pim.py. The script estimates the upper limit of the used WRAM and MRAM memory in order to determine the number of DPU threads needed NR_TASKLETS.It takes as an input features of the read pairs dataset such as read length, edit distance threshold, and the number of sequence pairs to align. Below is an example of running WFA MRAM implementation (same method can be used for all versions) while enabling backtracing on a dataset with sequence length 100bp, ED=1%, and 40K sequence pairs:

cd WFA/DPU-MRAM

# Script usage
> python run-wfa-pim-mram.py
> usage: run-wfa-pim-mram.py [-h] -i INPUT [-o OUTPUT] -l READ_LENGTH -e ERROR
                           -n NUMBER_READS [-m MATCH_COST] [-x MISMATCH_COST]
                           [-g GAP_OPENING] [-a GAP_EXTENDING] [-b] [-r]
                           [-t NR_OF_TASKLETS] [-d NR_OF_DPUS]
                           
# Example of running WFA MRAM implementation
python run-wfa-pim-mram.py -i ../../Datasets/sample-l100-e1-40K.01 -l 100 -e 0.01 -n 40000 -b -d 2500

Note that this script uses heuristics to estimate the upper limit of the used WRAM and MRAM memory, so it might underestimate the NR_TASKLETS to avoid running out of memory.

Or, the user can compile and run these implementations manually but needs to determine the NR_TASKLETS and the WRAM_SEGMENT size assigned to each thread if the implementation uses the custom dynamic memory allocator (WFA and GenASM). Also, the sequence length, max alignment score, and the number of sequence pairs should be added at the compile and run time. For example:

cd WFA/DPU-MRAM

# Compile example
make NR_TASKLETS=19 NR_DPUS=2500 FLAGS="-DMAX_SCORE=25 -DREAD_SIZE=112 -DBACKTRACE -DWRAM_SEGMENT=2122"

# Run Example
./build/host ../../Datasets/sample-l100-e1-40K.01 ./out 40000

Each line of the output file will contain the number of the aligned read-reference pair, the alignment score (edit distance in case of GenASM), and the CIGAR string if the backtracing is enabled.

Contact

For further questions and suggestions, feel free to reach out syd04@aub.edu.lb

References