
UpPipe is an RNA abundance quantification design for a real processing-near-memory system (UPMEM DPU). The accompanying paper was published at the Design Automation Conference (DAC) 2023.


UpPipe


Citation

Liang-Chi Chen, Chien-Chung Ho, and Yuan-Hao Chang, “UpPipe: A Novel Pipeline Management on In-Memory Processors for RNA-seq Quantification,” ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, July 9-13, 2023.

@inproceedings{chen2023uppipe,
  title={UpPipe: A Novel Pipeline Management on In-Memory Processors for RNA-seq Quantification},
  author={Chen, Liang-Chi and Ho, Chien-Chung and Chang, Yuan-Hao},
  booktitle={2023 60th ACM/IEEE Design Automation Conference (DAC)},
  pages={1--6},
  year={2023},
  organization={IEEE}
}

Materials

Hardware/System Prerequisites

The project must be run on a system equipped with UPMEM DRAM Processing Units (DPUs), and the UPMEM SDK must be installed on that system.

Start

git clone https://github.com/chi-0828/UpPipe.git
cd UpPipe
chmod +x build.sh
./build.sh
make -j4

Usage

Allocate transcriptome to DPU(s)

  • KMER SIZE must be an odd number from 3 to 31 (3, 5, ..., 31)
  • We suggest keeping NUMBER OF DPU(s) in a PIPELINE WORKER below 64
./UpPipe build \
            -k KMER SIZE  \
            -i OUTPUT INDEX FILE PATH \
            -d NUMBER OF DPU(s) in a PIPELINE WORKER \
            -f TRANSCRIPTOME FILE PATH
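
The constraints on the build arguments can be checked before invoking the tool. The following is a minimal sketch, not code from UpPipe: `check_build_args` and its messages are hypothetical names introduced for illustration, and the suggested limit of fewer than 64 DPUs is treated here as a hard limit for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of UpPipe. Returns NULL when the build
 * arguments satisfy the constraints listed above, otherwise a short
 * reason string. */
static const char *check_build_args(int kmer_size, int dpus_per_worker)
{
    if (kmer_size < 3 || kmer_size > 31)
        return "KMER SIZE must be between 3 and 31";
    if (kmer_size % 2 == 0)
        return "KMER SIZE must be odd";
    if (dpus_per_worker < 1)
        return "a pipeline worker needs at least one DPU";
    /* The README only suggests staying below 64; this sketch rejects
     * larger values outright (an assumption made for simplicity). */
    if (dpus_per_worker >= 64)
        return "64 or more DPUs per pipeline worker is not suggested";
    return NULL;
}

/* Example: k = 11 with 60 DPUs passes, while an even k is rejected. */
static void example_check(void)
{
    assert(check_build_args(11, 60) == NULL);
    assert(check_build_args(12, 60) != NULL);
}
```

A wrapper script or a front end could call such a check and print the returned reason before running `./UpPipe build`.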

Run alignment step for quantification

  • The k-mer size is already fixed in INPUT INDEX FILE and cannot be changed in this step
./UpPipe alignment \
            -i INPUT INDEX FILE PATH \
            -r NUMBER OF PIPELINE WORKER(s) \
            -f INPUT RNA READ FILE PATH

Parameter settings (dpu_app/dpu_def.h)

  • A KMER SIZE of less than 7 may lead to inaccurate mapping results
  • NUMBER OF DPU(s) in a PIPELINE WORKER should be less than 64 for optimal performance
  • The number of transcripts divided by NUMBER OF DPU(s) in a PIPELINE WORKER must be less than 200 (COUNT_LEN in dpu_app/dpu_def.h)
  • Set READ_LEN to the sequence length of the RNA reads
  • Set WRAM_READ_LEN to a number that is larger than READ_LEN and divisible by 8
  • WRAM_PREFETCH_SIZE is the size used for WRAM prefetching; 16 is the optimal size in most situations
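
The two numeric rules above (WRAM_READ_LEN and the transcripts-per-DPU bound) can be worked out with a small helper. This is a sketch under stated assumptions, not code from the repository: `wram_read_len_for` and `transcripts_fit` are hypothetical names, "larger than" is read as strictly larger, and the per-DPU count is rounded up, which is slightly more conservative than the plain ratio.

```c
#include <assert.h>

/* Value quoted in the README; the authoritative definition is in
 * dpu_app/dpu_def.h. */
#define COUNT_LEN 200

/* Hypothetical helper: smallest multiple of 8 that is strictly larger
 * than read_len (e.g. 100 -> 104, 96 -> 104). */
static int wram_read_len_for(int read_len)
{
    assert(read_len > 0);
    return (read_len / 8 + 1) * 8;
}

/* Hypothetical helper: returns 1 when the transcripts per DPU, rounded
 * up (an assumption about how transcripts are split), stay below
 * COUNT_LEN, and 0 otherwise. */
static int transcripts_fit(long num_transcripts, int dpus_per_worker)
{
    assert(num_transcripts >= 0 && dpus_per_worker > 0);
    long per_dpu = (num_transcripts + dpus_per_worker - 1) / dpus_per_worker;
    return per_dpu < COUNT_LEN;
}
```

For reads of length 100, `wram_read_len_for(100)` gives 104. With 60 DPUs per pipeline worker, `transcripts_fit` accepts up to 11940 transcripts under this rounding.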

Test

  • To build the index file with 11-mers and allocate it to 60 DPUs
./UpPipe build \
            -k 11  \
            -i test/test.idx \
            -d 60 \
            -f test/tran.fa
  • To run alignment with 40 pipeline workers
./UpPipe alignment \
            -i test/test.idx \
            -r 40 \
            -f test/read.fa
  • Note that UpPipe's efficiency is more evident on large datasets because of its processing-in-memory design