RASLseq FASTQ reads to RASLprobe counts.
RASL-seq is a powerful and inexpensive method to assess gene expression without the need for RNA isolation [1]. We recently published a modified protocol using the RNA ligase Rnl2, which dramatically increased ligation efficiency [2]. This Python package offers an alignment method leveraging BLASTn, pandas, py-editdist, and NumPy. Optimizations will follow in the near future.
Public Release RASLseqTools Version 0.2
- Added Normalization Module
- Added Levenshtein Barcode Analysis functions to RASLseqBCannot
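The Levenshtein barcode functions are not documented in this README. As a hedged illustration of the general idea only, the sketch below assigns a sequenced well barcode to the closest expected barcode by edit distance and rejects ambiguous matches. The names `levenshtein` and `assign_barcode` and the `max_dist=1` threshold are hypothetical; this is not the RASLseqBCannot API.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def assign_barcode(observed: str, known: Iterable[str],
                   max_dist: int = 1) -> Optional[str]:
    """Return the unique known barcode within max_dist of observed, else None."""
    scored = sorted((levenshtein(observed, bc), bc) for bc in known)
    if not scored or scored[0][0] > max_dist:
        return None
    # A barcode equally close to two expected wells is ambiguous: reject it
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]
```

For example, with expected barcodes `["ACGTACGT", "TTGCAAGC"]`, the read barcode `"ACGTACGA"` (one substitution) is assigned to `"ACGTACGT"`, while a barcode one edit away from two wells is discarded rather than guessed.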
Initial public release of RASLseqTools. The default aligner is now STAR (https://github.com/alexdobin/STAR).
Example IPython Notebook: http://nbviewer.ipython.org/github/erscott/RASLseqTools/blob/master/ipynb/RASLseqTools_STAR_example.ipynb
python /path/to/RASLseqAnalysis_STAR.py [required args -f -a -p -w -d -o] [optional args -P -A -n -o5 -o3 -ws -we]
-f : str, absolute path to fastq file(s) (accepts gzip files, comma-separated list if multiple fastqs)
-a : str, absolute path to STAR bin directory
-p : str, absolute path to probes file
-w : str, absolute path to annotations file
-d : str, absolute path to output directory
-o : str, absolute path of output file
-P : bool, verbose printing
-A : bool, write STAR alignments to disk (they are written to the output directory)
-n : int, number of jobs, currently requires 2 processors
-o5: int, number of bases to clip from 5-prime end of read to isolate probe sequence, default=24
-o3: int, number of bases to clip from 3-prime end of read to isolate probe sequence, default=22
-ws: int, index position of the wellbarcode start base in read, default=0
-we: int, index position of the wellbarcode end base in read, default=8
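The `-o5`, `-o3`, `-ws` and `-we` options describe how each read is split. A minimal sketch, assuming the well barcode is the slice `read[ws:we]` and the probe sequence is whatever remains after clipping `o5` bases from the 5' end and `o3` bases from the 3' end. `parse_read` and `ParsedRead` are hypothetical names, not part of RASLseqTools; defaults mirror the documented ones.

```python
from typing import NamedTuple


class ParsedRead(NamedTuple):
    well_barcode: str
    probe_seq: str


def parse_read(seq: str, o5: int = 24, o3: int = 22,
               ws: int = 0, we: int = 8) -> ParsedRead:
    """Split a raw read into its well barcode and clipped probe sequence."""
    if o5 + o3 >= len(seq):
        raise ValueError("clipping offsets remove the entire read")
    well_barcode = seq[ws:we]
    # len(seq) - o3 (not -o3) so that o3=0 keeps the 3' end instead of returning ""
    probe_seq = seq[o5:len(seq) - o3]
    return ParsedRead(well_barcode, probe_seq)
```

Slicing with `len(seq) - o3` rather than `seq[o5:-o3]` keeps `o3=0` working, because `seq[o5:-0]` is always an empty string.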
example command:
python /path/to/RASLseqAnalysis_STAR.py -f /path/to/your.fastq.gz -a /path/to/STAR_binary/ -p /path/to/RASL.probes -w /path/to/annotations.bc -d /path/to/write_directory/ -o /path/to/blastdb/write_file.txt -P -A -n 1 -o5 25 -o3 20 -ws 0 -we 8
Example data can be found in the data/ directory.
python /path/to/RASLseqAnalysis_BLAST.py [required args -f -s -p -w -d -b -o] [optional args -P]
-f : absolute path to fastq file (accepts gzip files)
-p : absolute path to probes file
-w : absolute path to annotations file
-d : absolute path to write directory for blast database
-b : absolute path to blast bin directory
-o : absolute path of output file
-s : specifies sequencer id in fastq index line, e.g. @HISEQ
-P : verbose printing
example command:
python /path/to/RASLseqAnalysis_BLAST.py -f /path/to/your.fastq.gz -s @HISEQ -p /path/to/RASL.probes -w /path/to/annotations.bc -d /path/to/blastdb/write_dir/ -b /path/to/blast/ncbi-blast-2.2.26+/bin/ -P -o /path/to/output.txt
Example data can be found in the data/ directory.
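The `-s` option suggests that header lines are recognized by their sequencer-id prefix rather than by a bare `@`. That distinction matters because `@` is also a valid Phred+33 quality character (Q31), so a quality line can begin with it. A hedged sketch of prefix-based header detection; `reads_by_header_prefix` is a hypothetical helper, not the script's internal code.

```python
from typing import Iterable, Iterator, Optional, Tuple


def reads_by_header_prefix(lines: Iterable[str],
                           seq_id: str = "@HISEQ") -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, treating only lines starting with seq_id as headers."""
    header: Optional[str] = None
    expect_seq = False
    for raw in lines:
        line = raw.rstrip("\n")
        if expect_seq:
            # The line right after a header is the read sequence
            yield header, line
            expect_seq = False
        elif line.startswith(seq_id):
            header = line
            expect_seq = True
        # '+' separator and quality lines fall through and are ignored,
        # even when a quality string happens to start with '@'
```

A naive `line.startswith("@")` test would misread a quality string such as `@@@@` as a new header; matching the full sequencer prefix avoids that.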
NOTE: RASLseqAnalysis_NAR.py is provided for transparency and requires manual parameter settings to run.
FASTQ: standard FASTQ format (optionally gzipped)
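A minimal sketch of reading this input, assuming standard four-line records (no wrapped sequence lines), gzip detection by the `.gz` extension, and the comma-separated path list accepted by `-f`. The helper names `open_fastq`, `iter_records` and `iter_fastq` are hypothetical.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from consecutive four-line records."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in chunk)
        yield header, seq, qual


def iter_fastq(paths: str) -> Iterator[Tuple[str, str, str]]:
    """Iterate over every record in a comma-separated list of FASTQ paths."""
    for path in paths.split(","):
        with open_fastq(path.strip()) as handle:
            yield from iter_records(handle)
```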
X.probes: tab-separated file describing the RASLseq probes, with the following columns and column headers:
AcceptorProbeSequence
DonorProbeSequence
AcceptorAdaptorSequence
DonorAdaptorSequence
ProbeName
Please see the example file in the data/ directory.
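A hedged sketch of loading and validating a .probes file with pandas. It only checks that the column headers listed above are present; `load_probes` and `PROBE_COLUMNS` are hypothetical names, not the package's own loader.

```python
import pandas as pd

# Required column headers, as listed for the .probes file
PROBE_COLUMNS = [
    "AcceptorProbeSequence",
    "DonorProbeSequence",
    "AcceptorAdaptorSequence",
    "DonorAdaptorSequence",
    "ProbeName",
]


def load_probes(source) -> pd.DataFrame:
    """Read a tab-separated probes file (path or file-like) and check its headers."""
    probes = pd.read_csv(source, sep="\t")
    missing = [col for col in PROBE_COLUMNS if col not in probes.columns]
    if missing:
        raise ValueError(f"probes file is missing columns: {missing}")
    return probes
```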
X.bc: tab-separated file describing each well in the experiment with the following columns and column headers
REQUIRED:
PlateBarcode
WellBarcode
OPTIONAL: additional columns with well metadata, column headers are user defined, e.g. drug_concentration
Please see the example file in the data/ directory.
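Because the optional metadata columns are user defined, a natural way to use them is to join them onto per-well counts. The sketch below assumes, hypothetically, a long-format counts table with `PlateBarcode`, `WellBarcode`, `ProbeName` and `count` columns; that layout, and the helpers `load_wells` and `annotate_counts`, are illustrations rather than the package's output format or API.

```python
import pandas as pd

KEYS = ["PlateBarcode", "WellBarcode"]


def load_wells(source) -> pd.DataFrame:
    """Read a tab-separated .bc file, keeping every column as text."""
    # dtype=str preserves barcodes such as "007"; convert metadata with pd.to_numeric if needed
    wells = pd.read_csv(source, sep="\t", dtype=str)
    for col in KEYS:
        if col not in wells.columns:
            raise ValueError(f"annotations file is missing required column {col}")
    return wells


def annotate_counts(counts: pd.DataFrame, wells: pd.DataFrame) -> pd.DataFrame:
    """Left-join per-well metadata onto a long-format counts table."""
    # many_to_one raises if a plate/well pair is listed twice in the annotations
    return counts.merge(wells, on=KEYS, how="left", validate="many_to_one")
```

Reading with `dtype=str` keeps numeric-looking barcodes intact, and `validate="many_to_one"` makes the join fail loudly instead of silently duplicating rows when the annotations contain a repeated well.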
STAR aligner
BLASTn
pandas
Levenshtein editdist
NumPy
[1] Li H, Qiu J, Fu X-D. RASL-seq for massively parallel and quantitative analysis of gene expression. Curr Protoc Mol Biol. 2012;98:4.13.1-4.13.9.
[2] Larman HB, Scott ER, Wogan M, Oliveira G, Torkamani A, Schultz PG. Sensitive, multiplex and direct quantification of RNA sequences using a modified RASL assay. Nucleic Acids Res. 2014;42(14):9146-57.