This repository contains assorted scripts I have used to work with genetic and genomic data.
-
SeqTools.py - a collection of functions for working with sequence data.
- rev_comp - reverse complements a nucleotide sequence.
- genetic_code - returns a dictionary containing the genetic code that corresponds to the input number.
- transcribe - replaces Ts in a DNA sequence with Us.
- translate - translates a DNA sequence into an amino acid sequence.
- is_orf - checks whether a nucleotide sequence is an unbroken open reading frame.
- count_nts - counts the number of each nucleotide in a sequence.
- gc - calculates the GC content (%) of a nucleotide sequence.
- get_orfs - returns all the unbroken open reading frames above a minimum length within a nucleotide sequence.
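A minimal sketch of how a few of the SeqTools.py functions listed above can work, assuming DNA input and the standard genetic code (NCBI translation table 1). The signatures, the `code` and `min_len` parameters, and the choice to report only the first ATG before each stop are assumptions for illustration, not the actual SeqTools.py API.

```python
# Standard genetic code (NCBI table 1), built from the TCAG-ordered codon table:
# the first base varies slowest and the third base fastest.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Complement table; case is preserved and N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def rev_comp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, code=STANDARD_CODE):
    """Translate codon by codon from the first base; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(code.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def gc(seq):
    """Return GC content as a percentage of the sequence length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def get_orfs(seq, min_len=30, code=STANDARD_CODE):
    """Return ATG...stop ORFs of at least min_len nt (stop included) on both strands.

    Assumption: only the first ATG before each stop codon is used, so nested
    ORFs that share a stop codon are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand in (seq, rev_comp(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and code.get(codon) == "*":
                    if i + 3 - start >= min_len:
                        orfs.append(strand[start:i + 3])
                    start = None
    return orfs
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`, with the stop codon shown as `*`.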
-
gff_to_fasta.py - a script that takes a genome FASTA file and a GFF file of genes, and outputs a FASTA file containing the sequences of the genes in the GFF.
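A minimal sketch of the extraction step, assuming GFF3-style tab-separated lines with 1-based inclusive coordinates, genes marked with feature type `gene`, and the gene name taken from the `ID=` attribute. The function names and the `feature_type` parameter are hypothetical and not necessarily how gff_to_fasta.py is written.

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}, using the first word of each header."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def rev_comp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def gff_to_fasta(genome, gff_lines, feature_type="gene"):
    """Yield (feature_id, sequence) for each GFF feature of the given type.

    Assumes GFF3 columns: seqid, source, type, start, end, score, strand,
    phase, attributes. Minus-strand features are reverse complemented.
    """
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # skips other feature types and non-feature lines
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        seq = genome[seqid][start - 1:end]  # 1-based inclusive -> Python slice
        if strand == "-":
            seq = rev_comp(seq)
        # Assumption: fall back to a coordinate label when there is no ID attribute.
        yield attrs.get("ID", f"{seqid}:{start}-{end}"), seq
```

Writing each yielded pair as `f">{name}\n{seq}\n"` to an output file produces the gene FASTA.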
-
get_seqs.py - a script that takes a list of sequences and a FASTA file containing those sequences, and outputs a FASTA file containing only the listed sequences.
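One way to do this filtering without loading the whole FASTA file into memory is to stream through it and keep a flag for the current record. This sketch assumes the list file holds one ID per line and that each ID matches the first word of a FASTA header; `read_ids` and `filter_fasta` are hypothetical names, not necessarily those used in get_seqs.py.

```python
def read_ids(lines):
    """Return the set of non-empty IDs from a list file (one ID per line)."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta(lines, wanted):
    """Yield the FASTA lines of records whose ID is in `wanted`.

    Assumption: the record ID is the first whitespace-delimited word of the
    header line (without the leading '>'). Lines before the first header
    are dropped.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            keep = bool(words) and words[0] in wanted
        if keep:
            yield line
```

With open file handles `ids`, `fasta`, and `out`, `out.writelines(filter_fasta(fasta, read_ids(ids)))` writes the selected records. Because records are copied line by line, multi-line sequences keep their original wrapping.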
-
CodonUsage.py - computes codon usage in a sequence or a set of sequences.
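A minimal sketch of codon counting, assuming each sequence is an in-frame coding sequence read from its first base. Skipping trailing partial codons and codons with non-ACGT characters is an assumption, and `codon_usage`/`codon_frequencies` are hypothetical names rather than the CodonUsage.py interface.

```python
from collections import Counter


def codon_usage(seqs):
    """Count in-frame codons in one sequence or a collection of sequences.

    Each sequence is read from its first base in steps of three; a trailing
    partial codon and codons containing anything other than A/C/G/T are skipped.
    """
    if isinstance(seqs, str):
        seqs = [seqs]
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    return counts


def codon_frequencies(counts):
    """Convert codon counts to fractions of all counted codons."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}
```

Accepting either a single string or a list covers both the "a sequence" and "a set of sequences" cases with one function.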
-
PCRLength.py - a script that takes a FASTA file as an argument, prompts the user for a gene ID, a forward primer sequence, and a reverse primer sequence, and prints the length of the expected PCR product.
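The expected product runs from the 5' end of the forward primer's match to the end of the site where the reverse primer binds, which on the template strand is the reverse complement of the reverse primer. This sketch assumes exact primer matches, both primers written 5'->3', and a forward primer that matches the gene sequence as stored in the FASTA file; `pcr_product_length` is a hypothetical name, and the prompting step is left out.

```python
def rev_comp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def pcr_product_length(template, fwd_primer, rev_primer):
    """Return the expected amplicon length, or None if the primers don't fit.

    Assumes exact matches and both primers given 5'->3'.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = rev_comp(rev_primer.upper())  # reverse primer's site on this strand
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))  # must lie downstream
    if end == -1:
        return None
    return end + len(rev_site) - start
```

For the template `TTTACGTAAAAAAGGCATTT`, forward primer `ACGT`, and reverse primer `TGCC`, the reverse primer's site is `GGCA`, and the product `ACGTAAAAAAGGCA` is 14 bp long.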
-
fasta_parser.py - a parser for FASTA files.
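A common shape for a FASTA parser is a generator that yields one record at a time, so large files can be processed without reading them fully into memory. This is a sketch, not fasta_parser.py itself; returning the full header line (minus `>`) and ignoring blank lines and any text before the first header are assumptions.

```python
from typing import Iterator, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file.

    The header is returned without the leading '>'; multi-line sequences
    are joined, blank lines are ignored, and lines before the first header
    are skipped.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Typical use is `for header, seq in parse_fasta(open(path)): ...`, where `path` is any FASTA file.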
-
polyAreads.py - creates a table of information about polyadenylated RNA-Seq reads.
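The description does not say how polyadenylated reads are detected. A simple illustration, under the assumption that a read counts as polyadenylated when it ends in a run of at least `min_tail` A's, is shown below. The threshold, the column names, and all function names are hypothetical. Real reads may also carry the tail as a leading poly(T) run on the opposite strand, which this sketch ignores.

```python
import csv
import sys


def polya_tail_length(seq):
    """Length of the run of A's at the 3' end of a read (case-insensitive)."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def polya_rows(reads, min_tail=10):
    """Yield (read_id, tail_length, trimmed_length) for reads with a tail >= min_tail.

    Assumption: `reads` is an iterable of (read_id, sequence) pairs and only
    3' poly(A) runs are considered.
    """
    for read_id, seq in reads:
        tail = polya_tail_length(seq)
        if tail >= min_tail:
            yield read_id, tail, len(seq) - tail


def write_table(rows, out=sys.stdout):
    """Write rows as a tab-separated table with a header line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["read_id", "polyA_length", "trimmed_length"])
    writer.writerows(rows)
```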
-
polyAstats.py - takes the table created by polyAreads.py and outputs a new table of 3' UTR statistics for each gene in it.
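The per-gene summary amounts to grouping rows by gene and computing statistics on each group. This sketch assumes the input has already been reduced to (gene ID, 3' UTR length) pairs; that layout and the chosen statistics (read count, mean, median, min, max) are assumptions, not the actual columns produced by polyAreads.py or polyAstats.py.

```python
from collections import defaultdict
from statistics import mean, median


def utr_stats(rows):
    """Summarize 3' UTR lengths per gene.

    Assumption: `rows` is an iterable of (gene_id, utr_length) pairs, where
    utr_length may be an int or a numeric string read from a table.
    """
    by_gene = defaultdict(list)
    for gene, length in rows:
        by_gene[gene].append(int(length))
    return {
        gene: {
            "n_reads": len(lengths),
            "mean": mean(lengths),
            "median": median(lengths),
            "min": min(lengths),
            "max": max(lengths),
        }
        for gene, lengths in by_gene.items()
    }
```

The median is less sensitive than the mean to a few reads with unusually long or short 3' UTRs.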