Find values unique to the second file, i.e. lines of file2 that do not appear in file1 (see https://stackoverflow.com/questions/4717250/extracting-unique-values-between-2-sets-files).
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
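A small worked example may make the direction of the comparison clearer. The file contents and the scratch directory below are made up for illustration; swapping the two arguments gives the lines unique to the first file instead.

```shell
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/file1"
printf 'b\nc\nd\n' > "$tmpdir/file2"

# Lines of file2 that do not occur in file1.
only_in_file2=$(awk 'FNR==NR {a[$0]++; next} !($0 in a)' "$tmpdir/file1" "$tmpdir/file2")

# Arguments swapped: lines of file1 that do not occur in file2.
only_in_file1=$(awk 'FNR==NR {a[$0]++; next} !($0 in a)' "$tmpdir/file2" "$tmpdir/file1")

echo "only in file2: $only_in_file2"
echo "only in file1: $only_in_file1"
rm -r "$tmpdir"
```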
Convert fasta headers to numbers (see https://www.biostars.org/p/53212/).
awk '/^>/{print ">" ++i; next}{print}' file.fasta > file.header_mod.fasta
Convert fasta headers to numbers w/a prefix, 'chromosome' here (see https://www.biostars.org/p/53212/).
awk '/^>/{print ">chromosome" ++i; next}{print}' < file.fasta
Remove the VCF header, i.e. every line that starts with '#' (see https://www.biostars.org/p/49660/).
awk '!/^#/' variants.VCF > no_header.VCF
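As a quick check on invented records (the three sample lines below are not from a real VCF), a pattern anchored with ^ drops only lines that start with '#', so a record that merely contains a '#' in one of its fields is kept:

```shell
# Invented VCF fragment: two header lines and one record whose ID field contains '#'.
records=$(printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\nchr1\t100\trs#1\n' | awk '!/^#/')
echo "$records"
```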
Get the average of a column, here column 2; the command reads standard input unless a file name is appended (see https://stackoverflow.com/questions/19149731/use-awk-to-find-average-of-a-column).
awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }'
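A sketch of the averaging command on made-up input. The second call is a variation that is not in the linked answer: it assumes the first row is a header and counts only the data rows.

```shell
# Made-up two-column input; column 2 holds 2, 4 and 9.
avg=$(printf 'x 2\ny 4\nz 9\n' |
  awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }')
echo "$avg"

# Variation (assumption): skip a header row and divide by the number of data rows.
avg_no_header=$(printf 'name value\nx 2\ny 4\nz 9\n' |
  awk -v N=2 'NR > 1 { sum += $N; n++ } END { if (n > 0) print sum / n }')
echo "$avg_no_header"
```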
Fix paired reads that are no longer properly sorted (see https://www.biostars.org/p/59707/)
mkfifo tmp
awk 'NR%4==1{n=$1}NR%4==2{s=$1}NR%4==0{print n,s,$1}' r1.fq | sort -S 2G > tmp &
awk 'NR%4==1{n=$1}NR%4==2{s=$1}NR%4==0{print n,s,$1}' r2.fq | sort -S 2G | join -a1 -a2 tmp - | awk 'NF==5{print $1"\n"$2"\n+\n"$3 >"x1.fq";print $1"\n"$4"\n+\n"$5 >"x2.fq"}NF==3{print $1"\n"$2"\n+\n"$3>"orphan.fq"}'
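Add text at the beginning of lines 1-10, in place (see https://stackoverflow.com/questions/9533679/how-to-insert-a-text-at-the-beginning-of-a-file).
sed -i '1,10s/^/<added text> /' file
Insert the text as a new line before the first line, in place (same source; the \n in the replacement relies on GNU sed).
sed -i '1s/^/<added text> \n/' file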
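To preview the result before editing a file in place, the same kind of expression can be run without -i so that the output goes to standard output. A sketch on made-up input, with '# ' as a placeholder prefix and the range shortened to lines 1-2:

```shell
# Prefix lines 1-2 only; line 3 is left untouched.
prefixed=$(printf 'a\nb\nc\n' | sed '1,2s/^/# /')
echo "$prefixed"
```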
Transpose columns and rows in a text file (see https://unix.stackexchange.com/questions/169995/rows-to-column-conversion-of-file).
awk '{ for (i=1; i<=NF; i++) RtoC[i]= (RtoC[i]!=""? RtoC[i] FS $i: $i); if (NF>n) n=NF }
# Loop by index; "for (i in RtoC)" would visit the keys in an unspecified order.
END{ for (i=1; i<=n; i++) print RtoC[i] }' infile
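A minimal sketch of the transpose on a made-up 2x3 table. It walks the column indices in numeric order when printing, since awk does not guarantee any particular order for "for (i in array)"; the names row and ncol are chosen here for illustration.

```shell
# Two rows of three columns go in; three rows of two columns come out.
transposed=$(printf '1 2 3\n4 5 6\n' | awk '
  { for (i = 1; i <= NF; i++) row[i] = (NR == 1 ? $i : row[i] FS $i)
    if (NF > ncol) ncol = NF }
  END { for (i = 1; i <= ncol; i++) print row[i] }')
echo "$transposed"
```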
Merge two files by column and select a subset of columns in resultant file (see https://theglassicon.com/tip-of-the-day/awk-script-merge-columns-files/)
Here there are two files, both two columns, and we merge column two from the first file with column one from the second.
pr -m -t -s' ' file1 file2 | awk '{print $2,$3}' > out_file.txt
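The same merge can be sketched with paste in place of pr; this is a swapped-in alternative, not the command from the linked page. The two sample files are invented, each with two columns, and the output pairs column two of the first file with column one of the second.

```shell
# Invented two-column sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmpdir/file1"
printf 'x 10\ny 20\n' > "$tmpdir/file2"

# paste joins the files line by line with a space; awk then picks fields 2 and 3.
merged=$(paste -d' ' "$tmpdir/file1" "$tmpdir/file2" | awk '{print $2,$3}')
echo "$merged"
rm -r "$tmpdir"
```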
Subset a file by row number, here rows 10000 to 100000 (see https://stackoverflow.com/questions/6491532/how-to-subset-a-file-select-a-numbers-of-rows-or-columns).
awk 'NR >= 10000 && NR <= 100000' largefile
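On a large file it may help to stop reading once the last wanted row has been printed. A sketch of that idea on ten made-up lines, keeping rows 3 to 5; the early exit is an addition and is not part of the linked answer.

```shell
# Quit as soon as the range is passed, so the rest of the input is never scanned.
rows=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | awk 'NR > 5 { exit } NR >= 3')
echo "$rows"
```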
Split a multi-fasta file into individual fasta files, each named after its sequence header (see https://gist.github.com/astatham/621901).
awk '{
if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fa")}
print $0 > filename
}' hg18.fa
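A sketch of the split on a made-up two-record file. Two things differ from the gist and are assumptions: the output name is taken from the first word of the header only (so a description after the ID does not end up in the file name), and each output file is closed before the next one is opened. It also assumes the input starts with a header line.

```shell
# Invented multi-fasta with two records, written to a scratch directory.
tmpdir=$(mktemp -d)
printf '>seq1 first record\nACGT\nGGCC\n>seq2\nTTAA\n' > "$tmpdir/multi.fa"

(
  cd "$tmpdir" &&
  awk '/^>/ { if (out != "") close(out); out = substr($1, 2) ".fa" }
       { print > out }' multi.fa
)

seq1=$(cat "$tmpdir/seq1.fa")
seq2=$(cat "$tmpdir/seq2.fa")
echo "$seq1"
echo "$seq2"
rm -r "$tmpdir"
```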
Remove the pipe and everything after it from fasta headers (see https://stackoverflow.com/questions/18602234/sed-to-remove-everything-after-in-file-using-command).
sed -i 's/[|].*$//' some.fasta
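A sketch on an invented record (the accession and description are made up), run without -i so nothing is modified on disk. The /^>/ address is an addition that restricts the substitution to header lines, in case a sequence or quality line ever contains a pipe.

```shell
# Only the header line is rewritten; the sequence line passes through unchanged.
cleaned=$(printf '>sp|P12345|EXAMPLE some description\nACGTACGT\n' | sed '/^>/ s/[|].*$//')
echo "$cleaned"
```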