
A placeholder for all my random scripts. Some scripts might eventually graduate and go to their own projects, so don't be surprised if anything leaves in the future.

Run any script with -h or --help for these help menus.


Converts an alignment to another alignment format.
  /home/lkatz/bin/convertAlignment.pl -i alignmentfile -o outputPrefix [-f outputformat]
  -i the alignment file to input
  -o the output alignment file or directory
    Specify a directory by a trailing slash.
    If a directory, the output format will be determined by -f. Default: fasta.
  -f output format
    possible values are derived from bioperl.
    e.g. fasta, clustalw, phylip, xmfa
  -g input format. Forces a specific input format; it is usually fine to omit this and let the script guess.
  -c to concatenate the alignment into one solid entry
  -l linkerSequence, to be used with -c, to separate entries.
    Default is no linker but a useful linker could be NNNNNNNNNN
  -r remove uninformative sites (Ns, gaps, and non-variable sites)
  -h this helpful menu
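
To make the -r, -c, and -l options more concrete, here is a rough Python sketch of the idea. It is not the Perl code in convertAlignment.pl. The function names are made up for illustration, and it assumes that all sequences have the same length and that -c simply joins entries end to end with the linker between them.

```python
# Illustrative sketch only; convertAlignment.pl is written in Perl and may behave differently.

def remove_uninformative_sites(seqs):
    """Drop alignment columns that contain N or a gap, or that do not vary."""
    kept = []
    for column in zip(*seqs):  # walk the alignment column by column
        bases = {base.upper() for base in column}
        if bases & {"N", "-"}:
            continue  # ambiguous or gapped site
        if len(bases) < 2:
            continue  # non-variable site
        kept.append(column)
    if not kept:
        return ["" for _ in seqs]
    # Transpose the kept columns back into one string per sequence.
    return ["".join(row) for row in zip(*kept)]


def concatenate_entries(seqs, linker=""):
    """Join sequences into one entry, separated by an optional linker (assumed meaning of -c/-l)."""
    return linker.join(seqs)
```

For example, calling remove_uninformative_sites on the three aligned sequences ACGTN, ACCT-, and ATGTA keeps only the second and third columns.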

Usage: perl /home/lkatz/bin/extractSequence.pl -i inputGenomicFile -s start -e end
  -i input file: the file extension dictates the format
  -o outfile
  -s start coordinate. 1-based
  -e end coordinate. 1-based
  -c contig 
    put a negative sign in front of a contig to extract its reverse complement (revcom)
    you may have multiple -c args
  -n name of organism (useful for genome browsers)
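
Since -s and -e are 1-based, here is a small Python sketch of what the coordinates mean. It is an illustration, not the script's own code. The function name is hypothetical, and it assumes that both ends of the range are inclusive and that revcom means the reverse complement of the extracted range.

```python
# Illustrative sketch only; extractSequence.pl is written in Perl and may behave differently.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract(sequence, start, end, revcom=False):
    """Return sequence[start..end] using 1-based, inclusive coordinates (assumed)."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("coordinates out of range")
    sub = sequence[start - 1:end]  # shift to Python's 0-based, end-exclusive slicing
    if revcom:
        sub = sub.translate(_COMPLEMENT)[::-1]  # complement, then reverse
    return sub
```

With the sequence ACGTACGT, start 2 and end 4 give CGT, and the same range with revcom set gives ACG.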

Finds a sequence in a fasta file and prints it  
perl /home/lkatz/bin/fastacmd.pl -s search -d database
  -s search term in the defline.
    search can also be from stdin
  -d fasta file to search
    comma-separated databases if multiples
  -i case-insensitive
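
The defline search can be pictured with the Python sketch below. It is not how fastacmd.pl is implemented. The function names are hypothetical, and it assumes a plain substring match on the defline (the real script may use a pattern match instead).

```python
# Illustrative sketch only; fastacmd.pl is written in Perl and may behave differently.

def parse_fasta(text):
    """Return a list of (defline, sequence) tuples from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])  # new record; drop the leading '>'
        elif records:
            records[-1][1] += line  # sequence may span several lines
    return [(defline, seq) for defline, seq in records]


def search_deflines(text, term, ignore_case=False):
    """Return the records whose defline contains term (substring match, an assumption)."""
    needle = term.lower() if ignore_case else term
    hits = []
    for defline, seq in parse_fasta(text):
        haystack = defline.lower() if ignore_case else defline
        if needle in haystack:
            hits.append((defline, seq))
    return hits
```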

Usage: /home/lkatz/bin/fastqToFastaQual.pl -i inputFastqFile [-n numCpus -q outputQualfile -f outputFastaFile]
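
As a rough picture of the conversion, the Python sketch below splits FASTQ text into FASTA and qual text. It is not the script's own code. The function name is hypothetical, and it assumes four-line FASTQ records and Phred+33 quality encoding, neither of which is confirmed by the help text.

```python
# Illustrative sketch only; fastqToFastaQual.pl is written in Perl and may behave differently.

def fastq_to_fasta_qual(fastq_text, offset=33):
    """Split four-line FASTQ records into FASTA text and space-separated qual text."""
    lines = fastq_text.strip().splitlines()
    fasta, qual = [], []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, quals = lines[i:i + 4]
        name = header[1:]  # drop the leading '@'
        fasta.append(f">{name}\n{seq}\n")
        # Decode each quality character into its numeric Phred score.
        scores = " ".join(str(ord(char) - offset) for char in quals)
        qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)
```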

Finds the p-distance between two assemblies using mummer. With more genomes, it creates a table.
  Usage: /home/lkatz/bin/genomeDist.pl assembly.fasta assembly2.fasta [assembly3.fasta ...]
  -a to note averages. Switching the subject and query can reveal some artifacts in the algorithm.
  -q for minimal stdout
  -m method.  Can be mummer (default) or jaccard
    Mummer: uses mummer to discover SNPs and counts the total number
    Jaccard: (kmer method) counts kmers (18-mers by default; see -k) and calculates 1 - (intersection/union)
  -c minimum kmer coverage. Default: 2
  -k kmer length. Default: 18
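
The jaccard method described above can be sketched in Python as follows. This is only an illustration of 1 - (intersection/union) on kmer sets, not the code in genomeDist.pl. The function names are hypothetical, and it assumes that kmers are counted on the forward strand only and that -c drops kmers seen fewer than that many times.

```python
# Illustrative sketch only; genomeDist.pl is written in Perl and may behave differently.
from collections import Counter


def kmer_set(seq, k=18, min_cov=2):
    """Return the set of kmers that occur at least min_cov times in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= min_cov}


def jaccard_distance(seq_a, seq_b, k=18, min_cov=2):
    """Return 1 - |A intersect B| / |A union B| for the two kmer sets."""
    a = kmer_set(seq_a, k, min_cov)
    b = kmer_set(seq_b, k, min_cov)
    union = a | b
    if not union:
        raise ValueError("no kmers passed the coverage filter")
    return 1 - len(a & b) / len(union)
```

Identical inputs give a distance of 0, and inputs that share no kmers give 1.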

Converts lasergene sequence files to a multifasta file
  Usage: lasergeneToFna.pl *.lasergene > file.fasta

Splits a genbank file from gendb back out into individual loci
  Usage: lyve_splitgbk.pl -i in -o out
  -i input genbank file
  -o output genbank file
    required: .gbk extension
    debug mode
  -l linker
  -h this help menu

Usage: /home/lkatz/bin/phylovizFilePrep.pl -i infile.fasta -o out.phyloviz
 where the first sequence is a reference sequence

Sorts reads so that they compress better. Uses smalt for mapping under the hood, and CGP fast assembly as the assembler when no reference is given.
  Usage: /home/lkatz/bin/sortFastq.pl [-r reference.fasta] < reads.fastq | gzip -c > sorted.fastq.gz
  -r reference.fasta If not supplied, then the reads will be assembled to create one.
  -t tempdir/ Default: one will be created for you
  --numcpus How many cpus to use when mapping. Default: 1
  -p to indicate that the reads are paired-end
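
The idea behind the sorting is that reads from the same region of the genome end up next to each other, which tends to help gzip. The Python sketch below shows only the ordering step and is not the code in sortFastq.pl. The function names are hypothetical, and it assumes each read already comes with a mapped position (in the script, that mapping is what smalt is for).

```python
# Illustrative sketch only; sortFastq.pl is written in Perl and may behave differently.
import zlib


def sort_reads_by_position(mapped_reads):
    """Order FASTQ records by mapped position.

    mapped_reads is a list of (position, fastq_record) tuples (an assumed input shape).
    """
    ordered = sorted(mapped_reads, key=lambda pair: pair[0])
    return [record for _position, record in ordered]


def compressed_size(records):
    """Return the zlib-compressed size in bytes, for comparing read orderings."""
    return len(zlib.compress("".join(records).encode()))
```

Comparing compressed_size before and after sorting on a real read set is one way to check whether the sorting pays off.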