Splice Site Search

This is a time-saving automation tool for finding potential splice sites in sequence data files (such as SAM or FASTQ files). Given a transcript and a position (or a delimited file with one transcript and position per row), it looks up the transcript in the December 2013 archive of the Ensembl Genome Browser and retrieves its cDNA FASTA sequence. The sequence found at the given position is then searched for in the sequence data files, and any matching lines are written to a new file.
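The final search step can be pictured with a small sketch. This is not the tool's actual code: the function names, the `flank` parameter, and the way the window around the position is chosen are all assumptions made for illustration, since the exact rule for picking the sequence at a position is not described here.

```javascript
// Hypothetical helper (not from the tool): take `flank` bases before the
// 1-based position and `flank` bases starting at it. The window size and
// shape are assumptions; the real tool may choose its sequence differently.
function extractQuery(cdna, position, flank) {
  const index = position - 1; // convert the 1-based position to a 0-based index
  const start = Math.max(0, index - flank);
  const end = Math.min(cdna.length, index + flank);
  return cdna.slice(start, end);
}

// Keep every line of a sequence data file that contains the query sequence.
function matchingLines(fileText, query) {
  return fileText.split(/\r?\n/).filter((line) => line.includes(query));
}

// Made-up cDNA and FASTQ-style content, for illustration only.
const cdna = 'ATGGCCTTAGGCTAACGT';
const query = extractQuery(cdna, 9, 3);
const fastq = [
  '@read1', 'GGCCTTAGGCTA', '+', 'IIIIIIIIIIII',
  '@read2', 'TTTTAAAACCCC', '+', 'IIIIIIIIIIII',
].join('\n');

console.log(query, matchingLines(fastq, query));
```

A plain substring test per line is the simplest reading of "matching lines"; it does not consider reverse complements or mismatches.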

Installation

This application is written in JavaScript and requires Node.js to run. If Node.js isn't installed, the easiest option is to download the binaries for your operating system and place them somewhere in your path.

With Node.js installed you can either clone this repository, or download and extract the ZIP:

$ git clone https://github.com/dsusco/splice-site-search.git

Next, install the module globally with:

$ npm install -g splice-site-search/

If you run into problems here, you might need to run the command with sudo. If you don't have sudo access, you can either install Node.js yourself (placing its bin directory somewhere in your path) or run the program with node splice-site-search/index.js instead.

To confirm that the program is working, run the following to display the help information:

$ splice-site-search -h

Usage

The program can be run in two ways:

Single Transcript and Position

This searches the given files for the sequence found at the given position of the given transcript.

$ splice-site-search [options] -t <transcript> -p <integer> <files...>

Delimited File of Transcripts and Positions

This searches the given files for the sequence found for each transcript and position listed in the potential splice sites file.

$ splice-site-search [options] -s <file> <files...>
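To illustrate what a potential splice sites file might contain, here is a minimal parsing sketch. The layout is an assumption: one transcript and one position per row, in that order, separated by a tab. The function name `parseSpliceSites` and the transcript IDs are placeholders, and the delimiter the tool actually expects is described in its help output rather than here.

```javascript
// Assumed row layout: "<transcript><delimiter><position>", one pair per row.
// Both the column order and the default tab delimiter are assumptions.
function parseSpliceSites(text, delimiter = '\t') {
  return text
    .split(/\r?\n/)
    .filter((row) => row.trim() !== '') // skip blank rows
    .map((row) => {
      const [transcript, position] = row.split(delimiter);
      return {
        transcript: transcript.trim(),
        position: Number.parseInt(position, 10),
      };
    });
}

// Placeholder transcript IDs, for illustration only.
const sites = parseSpliceSites('ENST00000000001\t120\nENST00000000002\t45\n');
console.log(sites);
```

Each resulting pair corresponds to one run of the single transcript and position form of the command.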

Options

Additional options are described in the command's help information:

$ splice-site-search -h