One-line scripts for bioinformatics

Extracting fasta records

First, index the fasta file (this writes mysequences.fasta.fai next to it)

samtools faidx mysequences.fasta

Then retrieve one or more sequences by ID

samtools faidx mysequences.fasta id1 id2 id3
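Where samtools is not installed, the same lookup can be approximated with plain awk. This is a minimal sketch of a different technique, not what samtools does internally: it scans the whole file rather than using an index, and it assumes the ID is the first whitespace-delimited word of each header line. The file names demo.fasta, ids.txt and subset.fasta are made up for the example.

```shell
# Build a tiny demo fasta and an ID list (both hypothetical) in a temp directory
workdir=$(mktemp -d)
cat > "$workdir/demo.fasta" <<'EOF'
>id1 first record
ACGT
ACGT
>id2 second record
GGCC
>id3 third record
TTAA
EOF
printf 'id1\nid3\n' > "$workdir/ids.txt"

# While reading ids.txt (NR==FNR), remember each wanted ID.
# While reading the fasta, each header line decides whether to keep the
# record; the sequence lines that follow inherit that decision.
awk 'NR==FNR { want[$1]; next }
     /^>/    { keep = (substr($1, 2) in want) }
     keep' "$workdir/ids.txt" "$workdir/demo.fasta" > "$workdir/subset.fasta"

cat "$workdir/subset.fasta"
```

Keeping the IDs in a file avoids very long command lines when many records are wanted.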

A great tool for reformatting fasta headers is bioawk.

For example, to add a fasta-compatible prefix to every header, changing

>comp12345_c0_seq1
to
>lcl|comp12345_c0_seq1

use the following bioawk command

bioawk -c fastx '{printf ">lcl|%s\n%s\n", $name, $seq}' original.fasta > reformatted.fasta
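If bioawk is not available, the same header change can be sketched with sed, since only the lines starting with > need to change. Unlike the bioawk command, this version leaves sequence line wrapping and any text after the ID untouched. The input below is a made-up two-record file.

```shell
# Create a small demo input (hypothetical contents) in a temp directory
workdir=$(mktemp -d)
cat > "$workdir/original.fasta" <<'EOF'
>comp12345_c0_seq1
ACGTACGT
TTGA
>comp12345_c0_seq2
GGCCAA
EOF

# Insert "lcl|" right after the leading ">" of every header line;
# sequence lines do not start with ">" and pass through unchanged
sed 's/^>/>lcl|/' "$workdir/original.fasta" > "$workdir/reformatted.fasta"

cat "$workdir/reformatted.fasta"
```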

To count the records in a fasta file, count its header lines

grep -c "^>" mysequences.fasta

To extract every .tar.gz archive in the current directory

for file in *.tar.gz; do tar -xvzf "$file"; done
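The order of the bundled tar flags matters: -f takes the next word as the archive name, so it has to be the last letter in the bundle, directly before the file. A minimal sketch that builds two throwaway archives and unpacks them in a loop; the names sample1.tar.gz and sample2.tar.gz are made up for the demo.

```shell
# Create two small demo archives (hypothetical names) in a temp directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2; do
    mkdir "sample$n"
    echo "data $n" > "sample$n/reads.txt"
    tar -czf "sample$n.tar.gz" "sample$n"
    rm -r "sample$n"
done

# Extract each archive; -f is last so that "$file" is read as its argument,
# and the quotes keep file names containing spaces intact
for file in *.tar.gz; do
    tar -xzf "$file"
done

ls sample1 sample2
```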