GPU-accelerated guppy basecalling

Oxford Nanopore Sequencing

Introduction

The use of Graphics Processing Units (GPUs) in high performance computing (HPC) has brought a paradigm shift in HPC capabilities. At the heart of this performance is the parallel data processing architecture of the GPU. In brief, a traditional CPU (Central Processing Unit) has a small number of cores (typically 4-8 in a desktop machine), each of which works through its tasks largely sequentially. A GPU, by contrast, breaks a task into smaller components and executes them in parallel across its thousands of cores. As such, GPU-powered compute nodes can take on highly data-intensive workloads and complete them at unprecedented speed.

GPU-accelerated guppy basecalling is one such application of GPUs in data-intensive computing. The Oxford Nanopore Technologies (ONT) sequencing platforms generate electrical signals as the raw sequencing data. These signals are converted to the actual DNA/RNA sequences through a process called ‘basecalling’. guppy is a widely used basecaller for ONT sequencing data. However, guppy can take days to complete basecalling when it runs on a CPU-only computer, whereas GPU-accelerated guppy can accomplish the same task in hours.

Advantages

GPU-accelerated guppy basecalling gives faster access to the results. This is crucial when the ONT sequencing platforms are used as a disease diagnostic or outbreak surveillance tool that requires quick decision-making.
Re-basecalling of previously generated sequencing data becomes practical with GPU-accelerated guppy basecalling.

Requirements

This tutorial was written for the Ubuntu 18.04 (Linux) operating system.

NVIDIA GPU device. GPU-accelerated guppy is built around NVIDIA GPU devices. Therefore, your computer must have one or more NVIDIA GPUs.
NVIDIA GPU device driver. Install the NVIDIA GPU device driver. This also automatically installs ‘nvidia-smi’, a utility for monitoring GPU device performance. ‘smi’ stands for ‘system management interface’.
CUDA (Compute Unified Device Architecture). You must have ‘CUDA’ installed. CUDA is a parallel computing platform and programming model developed by NVIDIA, rather than a programming language in its own right. GPU kernels (functions that run on the GPU) and the GPU-capable parts of an application are written with it. CUDA therefore acts as a bridge between the application (GPU-enabled guppy in this case) and the NVIDIA GPU devices: it issues and manages workloads on the GPUs and, in other words, orchestrates the parallel computing on them. CUDA can be used from languages such as C, C++, Fortran, Python and MATLAB to develop software featuring parallel computing.
GPU-enabled guppy basecaller.
- Download the GPU-enabled guppy software from the Nanopore community (https://community.nanoporetech.com/downloads)
- Or, use ‘wget’. For example, ‘ont-guppy_6.0.1_linux64.tar.gz’ for Linux can be downloaded with ‘wget https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy_6.0.1_linux64.tar.gz’
- Then, extract the archive: tar -zvxf ont-guppy_6.0.1_linux64.tar.gz
- cd ont-guppy/bin
- Run ./guppy_basecaller -v
- If the software is installed successfully, it will display something like the following: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 6.0.1+652ffd1
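
Before running a basecalling job, it can help to confirm that the requirements listed above are visible from the terminal. The following is a minimal sketch, not part of the official guppy installation procedure. The helper name ‘check_tool’ is made up for this example. It assumes that ‘nvcc’ (the compiler shipped with the CUDA toolkit) is a reasonable indicator of a CUDA installation, although a missing ‘nvcc’ does not necessarily mean CUDA is absent, and ‘guppy_basecaller’ is only reported as found once it is on the PATH.

```shell
#!/usr/bin/env bash
# Report whether a command is available on PATH.
# Returns 0 if found, 1 otherwise. (check_tool is a hypothetical helper name.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# nvidia-smi comes with the NVIDIA driver; nvcc ships with the CUDA toolkit.
# A missing tool is reported but does not stop the script.
for tool in nvidia-smi nvcc guppy_basecaller; do
    check_tool "$tool" || true
done

# If the driver is present, list the GPUs it can see.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
fi
```

Any line starting with ‘missing:’ points to the requirement that still needs attention.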

Adding guppy to the PATH variable so that it can be run in the terminal without specifying a path

Again, cd to ont-guppy/bin
Then, mv guppy* /bin (writing to ‘/bin’ normally requires root privileges, so the command may need to be prefixed with ‘sudo’)
‘/bin’ is the directory where commands such as ‘cat’, ‘sed’ and ‘chmod’ live. Any command in this directory can be executed in the terminal without specifying a path before the command, because ‘/bin’ is included in the PATH variable. To see this, run echo $PATH
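
As an alternative to moving the binaries into ‘/bin’, the ont-guppy/bin directory itself can be added to the PATH variable. This needs no root privileges and keeps the binaries alongside the rest of the extracted ont-guppy directory. The sketch below assumes the archive was extracted in your home directory. The variable name ‘GUPPY_BIN’ and that location are assumptions made for this example, so adjust the path to match your system.

```shell
#!/usr/bin/env bash
# Assumed location of the extracted archive; adjust to your own path.
GUPPY_BIN="$HOME/ont-guppy/bin"

# Prepend the directory to PATH for the current shell session only.
export PATH="$GUPPY_BIN:$PATH"

# Show PATH one entry per line and confirm the new entry is present.
echo "$PATH" | tr ':' '\n' | grep -Fx "$GUPPY_BIN"

# Print the resolved location if guppy is actually installed there.
command -v guppy_basecaller || echo "guppy_basecaller not found in $GUPPY_BIN"
```

The change lasts only for the current terminal session. To make it permanent in bash, the ‘export’ line can be appended to ~/.bashrc.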