/ptf-persona

The shell (application) for running all PTF-based Persona applications and experiments, both local and scale-out.

Primary language: Python. License: Apache-2.0.

What is the Persona shell?

  • This contains all of the commands to execute all components of the PTF-Persona application.
  • This is a collection of applications that serve as client, server, and local executor, all wrapped up into this shell command.
  • Uses the PTF-System repository to assemble PTF applications.

How do I run it?

Prerequisites

  • The entire system runs in Docker for ease of setup.
  • If you want to build outside of Docker, replicate the steps in the Dockerfiles manually.
  • Download the PTF system repository linked above.
  • Use the build container script (https://github.com/epfl-dcsl/ptf-system/blob/master/build_container.sh) to build the PTF system package with the correct name.
  • Further steps will be built on top of the PTF system package with this name.
  • If you want to build PTF Persona outside of the Docker container, you can copy the pip wheel file out of the container. See the PTF System repo for further instructions.
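
Before starting, it may help to confirm that the tools used throughout this guide are installed. The following is a minimal sketch, not part of the repository; the helper name have_cmd is hypothetical, and the tool list (docker, git) is taken from the commands used in the steps below.

```shell
# Succeeds if the named command can be found on PATH.
# (have_cmd is an illustrative helper, not part of PTF-Persona.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on each tool this guide relies on without aborting.
for tool in docker git; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool" >&2
    fi
done
```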

Constructing an AGD Dataset

  • The offline conversion steps for creating an AGD dataset must be done in a prior iteration of the Persona application shell.
  • This prior iteration uses the same underlying Persona library as this system (PTF-Persona), but an older version of the conversion application.
  • We will use Docker for this step for compatibility. Please replicate these steps (or similar) outside of Docker if you want a native install / build.

Download the offline Persona submodule

First, make sure that the code is imported as a submodule of this repository by running the following git command.

git submodule update --init
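
To confirm that the submodule was actually checked out, a quick directory check can be used. This is a sketch that assumes the submodule lives in the original_persona directory mentioned in the next step; the helper name submodule_populated is hypothetical.

```shell
# Succeeds if the directory exists and contains at least one entry
# (an uninitialized submodule directory exists but is empty).
submodule_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Assumption: the submodule path is original_persona.
if submodule_populated original_persona; then
    echo "original_persona is checked out"
else
    echo "original_persona is empty or missing; run: git submodule update --init"
fi
```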

Create the container for the Persona shell

  • Build the Docker container that contains the Persona application.
  • This can be skipped if you volume-mount the current directory, but this approach keeps things cleaner.
  • This docker command must be executed in the original_persona directory.
docker build --tag ptf-orig .

Download an example FASTQ file

Start the docker container

docker run --rm -it -v "/path/to/fastq_dataset":/dataset ptf-orig bash

Convert the dataset using the Docker container

  • You can exercise other options for parallelism, chunking (the --chunk option) and the dataset name (the --name option). See the help option for import_fastq.
# run this in the docker container!
./persona import_fastq --chunk 100000 --name MyFirstAGD --out /dataset/MyFirstAGD.agd /dataset/my_dataset.fastq
  • Now your dataset is in /path/to/fastq_dataset/MyFirstAGD.agd
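
To get a rough idea of how many chunks an import will produce, the record count can be divided by the chunk size. The sketch below assumes that --chunk is the number of records per chunk and that the FASTQ file uses the common four-lines-per-record layout (no wrapped sequence lines); both function names are hypothetical and are not part of the persona tool.

```shell
# Count FASTQ records, assuming exactly 4 lines per record.
fastq_records() {
    _lines=$(wc -l < "$1")
    echo $(( _lines / 4 ))
}

# Ceiling division: number of chunks needed for $1 records at $2 records per chunk.
expected_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 250000 records at the chunk size used above (100000).
expected_chunks 250000 100000   # prints 3
```

For a real file, something like expected_chunks "$(fastq_records /path/to/fastq_dataset/my_dataset.fastq)" 100000 gives the estimate under the same assumptions.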

Running an align-sort application

  • We will run this in the Docker container for this shell.
  • To build this natively, follow the build steps in the Dockerfile.

Build the container

  • Requires the PTF System container to be available.
docker build --tag ptf-shell .

Start the container with the AGD dataset

  • Using the AGD dataset we made in the previous step, we map this location into the docker container and start a normal bash shell.
  • You will also need to map in the location of the index table that the SNAP aligner uses. See the SNAP quickstart guide for how to do this. We only need a single-end index for this example.
docker run --rm -it -v "/path/to/fastq_dataset/MyFirstAGD.agd":/agd_dataset -v "/path/to/index":/snap_index ptf-shell bash
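
Docker generally expects the host side of -v to be an absolute path (a bare name is treated as a named volume), so it can be useful to resolve and verify both directories before starting the container. The sketch below only prints the command rather than running it. The helper names abs_dir and align_sort_run_cmd are hypothetical, and the sketch assumes both the AGD dataset and the SNAP index are directories on the host.

```shell
# Resolve a directory to an absolute path; fails if it does not exist.
abs_dir() {
    ( cd "$1" 2>/dev/null && pwd )
}

# Print (do not run) the docker command for the given host directories.
# $1: AGD dataset directory, $2: SNAP index directory.
align_sort_run_cmd() {
    agd_dir=$(abs_dir "$1") || { echo "missing AGD directory: $1" >&2; return 1; }
    index_dir=$(abs_dir "$2") || { echo "missing index directory: $2" >&2; return 1; }
    printf 'docker run --rm -it -v "%s":/agd_dataset -v "%s":/snap_index ptf-shell bash\n' \
        "$agd_dir" "$index_dir"
}
```

Calling align_sort_run_cmd with the two host directories prints the full command, which can be reviewed and then pasted into the terminal.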

Run the align-sort application

./persona local align-sort -d /agd_dataset --fused-index-path /snap_index /agd_dataset/MyFirstAGD.json
  • This research paper on arXiv describes the architectural components of PTF that are crucial to its scale-out and multi-request capabilities.