Primary language: Jupyter Notebook. License: MIT.

RASLseqTools

RASLseq FASTQ reads to RASLprobe counts.

RASL-seq is a powerful and inexpensive method to assess gene expression without the need for RNA isolation [1]. We recently published a modified protocol using the RNA ligase Rnl2 that demonstrated dramatically increased ligation efficiency [2]. This Python package offers an alignment method leveraging BLASTn, pandas, py-editdist, and NumPy. Optimizations will follow in the near future.



20 August 2015


Public Release RASLseqTools Version 0.2

    Added Normalization Module

    Added Levenshtein Barcode Analysis functions to RASLseqBCannot


19 April 2015


Initial public release of RASLseqTools. The default aligner is now STAR (https://github.com/alexdobin/STAR)

Example IPython Notebook: http://nbviewer.ipython.org/github/erscott/RASLseqTools/blob/master/ipynb/RASLseqTools_STAR_example.ipynb

STAR Usage

python /path/to/RASLseqAnalysis_STAR.py [required args -f -a -p -w -d -o] [optional args -P -A -n -o5 -o3 -ws -we]

-f : str, absolute path to fastq file(s) (accepts gzip files, comma-separated list if multiple fastqs)

-a : str, absolute path to STAR bin directory

-p : str, absolute path to probes file

-w : str, absolute path to annotations file

-d : str, absolute path to output directory

-o : str, absolute path of output file


-P : bool, verbose printing

-A : bool, write STAR alignments to disk (written to the output directory)

-n : int, number of jobs, currently requires 2 processors

-o5: int, number of bases to clip from 5-prime end of read to isolate probe sequence, default=24

-o3: int, number of bases to clip from 3-prime end of read to isolate probe sequence, default=22

-ws: int, index position of the wellbarcode start base in read, default=0

-we: int, index position of the wellbarcode end base in read, default=8
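To make the clipping options concrete, here is a minimal sketch of how -o5, -o3, -ws and -we could carve up a single read. It assumes -ws/-we behave like an end-exclusive Python slice and that -o5/-o3 are plain clip lengths; the function name `split_read` and the demo read are hypothetical, and the package's own indexing may differ.

```python
def split_read(seq, o5=24, o3=22, ws=0, we=8):
    """Return (well_barcode, probe_region) for one read sequence.

    Assumption: ws/we form an end-exclusive slice and o5/o3 are the number
    of bases removed from each end. This is an illustration only, not the
    RASLseqTools implementation.
    """
    if o5 + o3 >= len(seq):
        raise ValueError("read is shorter than the clipped regions")
    well_barcode = seq[ws:we]
    probe_region = seq[o5:len(seq) - o3]
    return well_barcode, probe_region


# Made-up 66-base read: 8-base well barcode, 16 filler bases,
# a 20-base probe region, then 22 trailing bases.
read = "ACGTACGT" + "N" * 16 + "TTGACCATGCAAGGTCAGTC" + "N" * 22
barcode, probe = split_read(read)
print(barcode, probe)
```

With the defaults, the first 8 bases are taken as the well barcode and the bases between position 24 and the last 22 bases are kept as the probe region.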


example command:

python /path/to/RASLseqAnalysis_STAR.py -f /path/to/your.fastq.gz -a /path/to/STAR_binary/ -p /path/to/RASL.probes -w /path/to/annotations.bc -d /path/to/write_directory/ -o /path/to/blastdb/write_file.txt -P -A -n 1 -o5 25 -o3 20 -ws 0 -we 8

example data: can be found in the data directory


BLASTn Usage

python /path/to/RASLseqAnalysis_BLAST.py [required args -f -s -p -w -d -b -o] [optional args -P]

-f : absolute path to fastq file (accepts gzip files)

-p : absolute path to probes file

-w : absolute path to annotations file

-d : absolute path to write directory for blast database

-b : absolute path to blast bin directory

-o : absolute path of output file

-s : specifies sequencer id in fastq index line, e.g. @HISEQ


-P : verbose printing
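The README does not say how the -s value is used internally. One plausible use, sketched below, is to confirm that a line is a FASTQ header, since quality strings can also begin with "@". The helper names `open_fastq` and `iter_fastq_records` are hypothetical and are not part of RASLseqTools; the demo records are made up.

```python
import gzip
import io


def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def iter_fastq_records(handle, sequencer_id="@HISEQ"):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ.

    Assumption: every header line starts with sequencer_id; a record whose
    header does not is treated as a sign of a malformed file.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith(sequencer_id):
            raise ValueError("unexpected header line: " + header)
        yield header, seq, qual


# Made-up records; note the first quality string starts with '@'.
demo = io.StringIO(
    "@HISEQ:1:1101:1000:2000 1:N:0\nACGT\n+\n@@II\n"
    "@HISEQ:1:1101:1001:2001 1:N:0\nTTGA\n+\nIIII\n"
)
records = list(iter_fastq_records(demo))
print(len(records), records[0][1])
```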


example command:
python /path/to/RASLseqAnalysis_BLAST.py -f /path/to/your.fastq.gz -s @HISEQ -p /path/to/RASL.probes -w /path/to/annotations.bc -d /path/to/blastdb/write_dir/ -b /path/to/blast/ncbi-blast-2.2.26+/bin/ -P -o /path/to/output.txt

example data: can be found in the data directory

NOTE: RASLseqAnalysis_NAR.py is provided for transparency and requires manual parameter settings to run.


Input File Formats

FASTQ: standard FASTQ format (optionally gzipped)

X.probes: tab-separated file describing the RASLseq Probes with the following columns and column headers

AcceptorProbeSequence
DonorProbeSequence
AcceptorAdaptorSequence
DonorAdaptorSequence
ProbeName

Please see example file in data/ directory
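Before running an analysis, it can help to check that a probes file has the five column headers listed above. The sketch below uses only the Python standard library; `load_probes` is a hypothetical helper, the uniqueness check on ProbeName is an assumption rather than a documented requirement, and the demo sequences are made up.

```python
import csv
import io

REQUIRED_PROBE_COLUMNS = [
    "AcceptorProbeSequence",
    "DonorProbeSequence",
    "AcceptorAdaptorSequence",
    "DonorAdaptorSequence",
    "ProbeName",
]


def load_probes(handle):
    """Read a tab-separated probes file and return its rows as dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_PROBE_COLUMNS if c not in header]
    if missing:
        raise ValueError("probes file is missing columns: " + ", ".join(missing))
    probes = list(reader)
    # Assumption: probe names should be unique so counts can be keyed on them.
    names = [row["ProbeName"] for row in probes]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError("duplicate ProbeName values: " + ", ".join(duplicates))
    return probes


# Made-up single-probe file for illustration.
demo = io.StringIO(
    "\t".join(REQUIRED_PROBE_COLUMNS) + "\n"
    "ACGTACGT\tTTGACCAT\tGGAAGC\tAGATCG\tGENE1_probe1\n"
)
probes = load_probes(demo)
print(probes[0]["ProbeName"])
```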

X.bc: tab-separated file describing each well in the experiment with the following columns and column headers
REQUIRED:
PlateBarcode
WellBarcode
OPTIONAL: additional columns with well metadata, column headers are user defined, e.g. drug_concentration
Please see example file in data/ directory
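As an illustration of how the annotations file ties reads to wells, the sketch below loads an X.bc file and assigns an observed well barcode to the closest annotated one by Levenshtein distance. It is a pure-Python stand-in, not the package's RASLseqBCannot code; `load_annotations`, `nearest_well`, the `max_distance` threshold and the demo values are all hypothetical, and it assumes a single plate so that well barcodes are unique.

```python
import csv
import io


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def load_annotations(handle):
    """Read a tab-separated annotations file; extra columns are kept as-is."""
    reader = csv.DictReader(handle, delimiter="\t")
    for col in ("PlateBarcode", "WellBarcode"):
        if col not in (reader.fieldnames or []):
            raise ValueError("annotations file is missing column: " + col)
    return list(reader)


def nearest_well(observed, rows, max_distance=1):
    """Return the unique closest row within max_distance edits, else None."""
    if not rows:
        return None
    distances = [(levenshtein(observed, r["WellBarcode"]), r) for r in rows]
    best = min(d for d, _ in distances)
    hits = [r for d, r in distances if d == best]
    if best > max_distance or len(hits) != 1:
        return None  # too far from every well, or ambiguous
    return hits[0]


# Made-up annotations with one user-defined metadata column.
demo = io.StringIO(
    "PlateBarcode\tWellBarcode\tdrug_concentration\n"
    "PLATE01\tACGTACGT\t0.1\n"
    "PLATE01\tTTGACCAT\t1.0\n"
)
rows = load_annotations(demo)
match = nearest_well("ACGTACGA", rows)  # one substitution away from ACGTACGT
print(match["drug_concentration"] if match else None)
```

Returning None for ambiguous or distant barcodes keeps such reads out of the counts instead of guessing a well.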


Dependencies

    STAR aligner
    BLASTn
    pandas
    Levenshtein editdist
    NumPy


References

  1. Li H, Qiu J, Fu XD, RASL-seq for massive parallel and quantitative analysis of gene expression, Curr Protoc Mol Biol. 2012;98:4.13.1–4.13.9

  2. Larman HB, Scott ER, Wogan M, Oliveira G, Torkamani A, Schultz PG, Sensitive, multiplex and direct quantification of RNA sequences using a modified RASL assay, Nucleic Acids Res. 2014;42(14):9146-57