samtools-helpers

A few helper scripts for working with samtools.

Installation

Put the path to this repo on your $PATH.

echo 'export PATH="$PATH:/path/to/samtools-helpers"' >> ~/.bashrc

For some handy aliases, source .samtools-rc in this repo:

echo 'source /path/to/samtools-helpers/.samtools-rc' >> ~/.bashrc

Usage

The most useful scripts here are samtools-view (alias sv) and its variants: samtools-view-with-header (alias svh) and samtools-view-less (alias svl).

Each of them takes a .sam file, runs samtools view on it, and then makes the following improvements to the output:

  • converts the bitwise FLAG field (the second column) from an integer to a 12-digit string of 0s and 1s
  • formats the output as a table, so that e.g. read names of different lengths in the first column don't throw off the alignment of subsequent columns.
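The two steps above can be sketched roughly as follows. This is not the code of the scripts in this repo, only a minimal illustration under stated assumptions: the function name binary_flags is hypothetical, the input is assumed to be tab-separated SAM records with the FLAG in column 2, and alignment is delegated to the standard column utility.

```shell
#!/usr/bin/env bash

# Hypothetical filter (not part of this repo): reads tab-separated SAM
# records on stdin and rewrites column 2 (FLAG) as a 12-digit binary string.
binary_flags() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         flag = $2
         bits = ""
         # Peel off the 12 low bits, least significant first,
         # prepending each so the most significant ends up on the left.
         for (i = 0; i < 12; i++) {
           bits = (flag % 2) bits
           flag = int(flag / 2)
         }
         $2 = bits
         print
       }'
}

# Possible usage (assumes samtools and column are installed):
#   samtools view NA12878.sam | head -n 5 | binary_flags | column -t -s $'\t'
```

Splitting on tabs only (column -t -s $'\t') is a design choice in this sketch: optional SAM fields of type Z may contain spaces, which would otherwise be treated as column breaks.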

Examples

First 5 non-header lines, using samtools-view:

sv 5 NA12878.sam
20FUKAAXX100202:3:6:15018:84106    000010100011  20  224759  60  101M         =  225025  366   ACCCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA  ?@BBBCEEDFEFEEEFDEEFEEEEBFEDEFCFDDEEFEDFDFEEEFEEEECEEFEEFCEFDEEFFEFEDEEEFFFDECEDCEFEEDDFFBFEFGEAEDCCC  MD:Z:101                     PG:Z:BWA  RG:Z:20FUK.3  AM:i:37  NM:i:0  SM:i:37  MQ:i:60                                                                                                     OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHFHHHGHHHHHIIHHDHHHHHEHHHHH  UQ:i:0
20GAVAAXX100126:8:62:5578:2527     001001010011  20  224759  60  101M         =  224453  -406  ACCCAAAGCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA  834:/,1(:8::8::<98;-(-;>5?08/:;/+7<;=>?@:9>;==<=:<8<>?4>B>AABAAB@@;;<<=>===9>9?=9>=?==;=:;<?>><?@3@;1  MD:Z:7T93                    PG:Z:BWA  RG:Z:20GAV.8  AM:i:25  NM:i:1  SM:i:37  MQ:i:60                                                                                                     OQ:Z:C4541/1.55555555544008??9?1514401555?AAA;5554444555?A?7AFEFFFFFFDF55555444454445555444@5@==5555555555  UQ:i:7
20FUKAAXX100202:4:47:20584:49257   000010100011  20  224761  60  101M         =  225058  387   CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT  ?ACDBBCEDFEDEFEEEFEDBECFBFEFCFDEEEFEDFDFEEEFEEEECEEFEEFCEFFEEFFEFEDEAEFFFAECEFCDFEEFBFFDBEEC:@6A?C4>B  MD:Z:101                     PG:Z:BWA  RG:Z:20FUK.4  AM:i:37  NM:i:0  SM:i:37  MQ:i:60                                                                                                     OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHHDHHHHHIHHHHFHGIGHFE;D9BBD7AH  UQ:i:0
20GAVAAXX100126:7:47:4730:37293    000010100011  20  224761  60  101M         =  225073  412   CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT  ?BB@BCBFDDECC=E@@DB;BDCFDE<<AEB@B>BADD>?C?EDEB>@AC=<?=DAE?E=CAC?;<C=@ADD?ACACCAC>:>4=B676<17@@<:AA<;6  MD:Z:101                     PG:Z:BWA  RG:Z:20GAV.7  AM:i:37  NM:i:0  SM:i:37  MQ:i:60                                                                                                     OQ:Z:BBA>AB@BB@BA?>B==??7>@BBA@:6@@@@@@A@BAA>A?B@BA?=?>9=????@?@>>>@?67@<;??@>?@????@9:96=>2236-39=73@:652  UQ:i:0
20GAVAAXX100126:5:46:21151:39489   000001010011  20  224761  60  101M         =  224465  -396  CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT  >9<=BBB>BB>EFFEEEFEEECEFEEFDEFEEEFFEEFEEFDDEEEEDEEFFDDDDFFFDDFFDEFDEEDFFEEEEEEEEEFEEEEEFFEFEFEF=DED=A  MD:Z:101                     PG:Z:BWA  RG:Z:20GAV.5  AM:i:37  NM:i:0  SM:i:37  MQ:i:60                                                                                                     OQ:Z:DBGGFDFCFFBHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHGH  UQ:i:0

It's still on you to know which of the 12 bits mean what, but it's a lot better than doing the binary conversion in your head!
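For reference, here is a small sketch that spells out which bits are set in a given flag. The function name describe_flag is hypothetical and not part of this repo; the bit meanings are paraphrased from the SAM specification, listed from the least significant bit (the rightmost digit in the 12-digit string) upward.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of this repo): list the SAM flag bits
# that are set in an integer FLAG value.
describe_flag() {
  local flag=$1 i
  # Bit meanings per the SAM specification, least significant bit first.
  local names=(
    "read paired"
    "read mapped in proper pair"
    "read unmapped"
    "mate unmapped"
    "read reverse strand"
    "mate reverse strand"
    "first in pair"
    "second in pair"
    "not primary alignment"
    "read fails platform/vendor quality checks"
    "read is PCR or optical duplicate"
    "supplementary alignment"
  )
  for (( i = 0; i < ${#names[@]}; i++ )); do
    # Print the hex mask and description for every bit that is set.
    if (( (flag >> i) & 1 )); then
      printf '0x%03x  %s\n' $(( 1 << i )) "${names[i]}"
    fi
  done
}

describe_flag 163
# 0x001  read paired
# 0x002  read mapped in proper pair
# 0x020  mate reverse strand
# 0x080  second in pair
```

If a recent samtools is installed, its built-in samtools flags subcommand gives a similar breakdown.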

First 5 non-header lines, using regular samtools view:

$ samtools view NA12878.sam | head -n 5
20FUKAAXX100202:3:6:15018:84106	163	20	224759	60	101M	=	225025	366	ACCCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA	?@BBBCEEDFEFEEEFDEEFEEEEBFEDEFCFDDEEFEDFDFEEEFEEEECEEFEEFCEFDEEFFEFEDEEEFFFDECEDCEFEEDDFFBFEFGEAEDCCC	MD:Z:101	PG:Z:BWA	RG:Z:20FUK.3	AM:i:37	NM:i:0	SM:i:37	MQ:i:60	OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHFHHHGHHHHHIIHHDHHHHHEHHHHH	UQ:i:0
20GAVAAXX100126:8:62:5578:2527	595	20	224759	60	101M	=	224453	-406	ACCCAAAGCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAA	834:/,1(:8::8::<98;-(-;>5?08/:;/+7<;=>?@:9>;==<=:<8<>?4>B>AABAAB@@;;<<=>===9>9?=9>=?==;=:;<?>><?@3@;1	MD:Z:7T93	PG:Z:BWA	RG:Z:20GAV.8	AM:i:25	NM:i:1	SM:i:37	MQ:i:60	OQ:Z:C4541/1.55555555544008??9?1514401555?AAA;5554444555?A?7AFEFFFFFFDF55555444454445555444@5@==5555555555	UQ:i:7
20FUKAAXX100202:4:47:20584:49257	163	20	224761	60	101M	=	225058	387	CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT	?ACDBBCEDFEDEFEEEFEDBECFBFEFCFDEEEFEDFDFEEEFEEEECEEFEEFCEFFEEFFEFEDEAEFFFAECEFCDFEEFBFFDBEEC:@6A?C4>B	MD:Z:101	PG:Z:BWA	RG:Z:20FUK.4	AM:i:37	NM:i:0	SM:i:37	MQ:i:60	OQ:Z:HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHHDHHHHHIHHHHFHGIGHFE;D9BBD7AH	UQ:i:0
20GAVAAXX100126:7:47:4730:37293	163	20	224761	60	101M	=	225073	412	CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT	?BB@BCBFDDECC=E@@DB;BDCFDE<<AEB@B>BADD>?C?EDEB>@AC=<?=DAE?E=CAC?;<C=@ADD?ACACCAC>:>4=B676<17@@<:AA<;6	MD:Z:101	PG:Z:BWA	RG:Z:20GAV.7	AM:i:37	NM:i:0	SM:i:37	MQ:i:60	OQ:Z:BBA>AB@BB@BA?>B==??7>@BBA@:6@@@@@@A@BAA>A?B@BA?=?>9=????@?@>>>@?67@<;??@>?@????@9:96=>2236-39=73@:652	UQ:i:0
20GAVAAXX100126:5:46:21151:39489	83	20	224761	60	101M	=	224465	-396	CCAAATCTAATCAAGGCTCCCACTCTAACTCCCAAGCTCTAGGATATACCAAGGACAAAGGAAGATCATGAAATACCACCATGGGGATTCAATCAGCAAAT	>9<=BBB>BB>EFFEEEFEEECEFEEFDEFEEEFFEEFEEFDDEEEEDEEFFDDDDFFFDDFFDEFDEEDFFEEEEEEEEEFEEEEEFFEFEFEF=DED=A	MD:Z:101	PG:Z:BWA	RG:Z:20GAV.5	AM:i:37	NM:i:0	SM:i:37	MQ:i:60	OQ:Z:DBGGFDFCFFBHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHEHHHGH	UQ:i:0

Note the opaque integer flags in the second field (163, 595, 83) and the misaligned columns.

Entire .sam file without header:

sv NA12878.sam
# or:
samtools-view NA12878.sam

Entire .sam file with header:

svh NA12878.sam
# or:
samtools-view-with-header NA12878.sam