Infer read-group information from read names in a SAM or FASTQ file.
make
make install
usage: rgsam [command]

commands:
  collect    collect read-group information from SAM or FASTQ file
  split      split SAM or FASTQ file based on read-group
  tag        tag reads in SAM file with read-group field
  qnames     list supported read name formats
  version    print version
Read-group identifier (ID) and platform unit (PU) are inferred from read names according to the supported read name formats:
{
  "illumina-1.0": {
    "format": "@{flowcell}-{instrument}:{lane}:{tile}:{x}:{y}#{sample}/{pair}",
    "example": "@HWUSI-EAS100R:6:73:941:1973#0/1"
  },
  "illumina-1.8": {
    "format": "@{instrument}:{run}:{flowcell}:{lane}:{tile}:{x}:{y}",
    "example": "@EAS139:136:FC706VJ:2:2104:15343:197393"
  },
  "broad-1.0": {
    "format": "@{flowcell,5}:{barcode}:{lane}:{tile}:{x}:{y}",
    "example": "@H0164ALXX140820:2:1101:10003:23460"
  }
}
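To illustrate how fields in a read name map to read-group data, here is a minimal shell sketch for the illumina-1.8 format only. The function name qname_to_pu is hypothetical and not part of rgsam. The sketch assumes a platform unit of the form flowcell.lane, which is a common convention; the exact ID and PU that rgsam produces may differ.

```shell
# Hypothetical helper, not part of rgsam.
# Assumption: the platform unit is written as "flowcell.lane".
qname_to_pu() {
    # Drop the leading '@' (present in FASTQ, absent in SAM), then split on ':'.
    # illumina-1.8 fields: instrument:run:flowcell:lane:tile:x:y
    printf '%s\n' "${1#@}" | awk -F: 'NF >= 7 { print $3 "." $4 }'
}

qname_to_pu "@EAS139:136:FC706VJ:2:2104:15343:197393"
# prints FC706VJ.2
```

Names with fewer than seven colon-separated fields produce no output, so the other formats would need their own field positions.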
Platform (PL) defaults to illumina.
Sample (SM) and library identifier (LB) may be inferred from the input file name.
Files with reads from more than one sample or library are not supported.
To split BAM or SAM files containing proper @RG header lines and reads tagged with read-group field (e.g. RG:Z:H1), use instead:
samtools view -r <rg_id> <in.bam>
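The read-group IDs to pass to -r can be read from the @RG header lines. The following is a minimal sketch; list_rg_ids is a hypothetical helper, not part of rgsam or samtools, and the file names in the usage comments are placeholders.

```shell
# Hypothetical helper: print the ID of every @RG header line on standard input.
list_rg_ids() {
    awk -F'\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^ID:/) print substr($i, 4)   # strip the "ID:" prefix
    }'
}

# Usage (assumes samtools is installed and in.bam exists):
#   samtools view -H in.bam | list_rg_ids
#   samtools view -b -r H1 in.bam > H1.bam
```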
Suppose we have a BAM file with no read-group data. We first infer the set of read-groups:
samtools view sample.bam | rgsam collect -s sample -o rg.txt
Now we can tag the reads with read-group information (any existing read-group tags will be replaced):
samtools view -h sample.bam |
rgsam tag -r rg.txt |
samtools view -b - > sample.rg.bam
Note that we use the -h flag of samtools view to ensure that other header data are preserved (any existing @RG will be replaced).
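As a sanity check after tagging, one can count the reads that still lack an RG:Z: field. This is a minimal sketch; count_untagged is a hypothetical helper, not part of rgsam, and it expects SAM text on standard input.

```shell
# Hypothetical helper: count alignment records without an RG:Z: tag.
count_untagged() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        {
            tagged = 0
            for (i = 12; i <= NF; i++)     # optional fields start at column 12
                if ($i ~ /^RG:Z:/) tagged = 1
            if (!tagged) n++
        }
        END { print n + 0 }'
}

# Usage (assumes samtools is installed and sample.rg.bam exists):
#   samtools view sample.rg.bam | count_untagged
```

A result of 0 means every read carries a read-group tag.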