stschiff/msmc2

issues with scaffold-level data

selasphoruskershaw opened this issue · 0 comments

Hello,

I am having issues converting the name of a scaffold to a chromosome name.

Using the script below as a template:
[image: template script]

Here's what I input on the command line:

samtools mpileup -B -q 20 -Q 20 -C 50 -g -r QRBIO1000092.1 -f barn.fna s2907.bam | bcftools call -c -V indels | ~/Swallows/msmc2/msmc-tools/bamCaller.py 12 out.mask.QRBIO1000181.1.bed.gz | gzip -c > out.QRBIO1000181.1.vcf.gz

I get this result:

[E::mpileup] fail to parse region 'QRBIO1000092.1' with s2907.bam
Failed to read from standard input: unknown file type
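
One check that might narrow this down is to see whether the scaffold name is actually present in the BAM header. The sketch below assumes the error means the region name was not found in the header or that the BAM has no index; I have not confirmed either. `list_contigs` is a hypothetical helper name, not part of samtools.

```shell
# Hypothetical helper: print the sequence names (SN fields) found on the
# @SQ lines of a SAM header read from stdin, one name per line.
list_contigs() {
    awk 'BEGIN { FS = "\t" }
        /^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Intended use (requires samtools; shown as comments, not run here):
#   samtools view -H s2907.bam | list_contigs | grep -Fx 'QRBIO1000092.1'
#   samtools index s2907.bam   # as far as I understand, -r needs an indexed BAM
```

If the `grep -Fx` line prints nothing, the name passed to `-r` does not exactly match any sequence name in the BAM header.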

Thus, I have been trying to rename scaffold QRBIO1000092.1 to chr1 manually so that I can pass the new name to the command above. The examples I've found online for renaming seem to work for simpler changes, e.g., changing 1 to chr1, but not for my scaffold-level data. For example, I tried the following. It appeared to run without errors, but when I checked the new .bam file, the original scaffold name remained.

samtools view -h s2907.bam | sed -e 's/SN:QRBIO1000092.1/SN:chr1/' | samtools reheader - s2907.bam > test_s2907.bam
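
For comparison, a header-only variant of that rename might look like the sketch below. It passes just the header (`samtools view -H`) to `samtools reheader` instead of the whole file (`-h`), and matches the scaffold name as an exact string, so the `.` is not treated as a regex wildcard. `rename_sq` and `new_header.sam` are hypothetical names, and I have not confirmed that this resolves the mpileup error.

```shell
# Hypothetical helper: on @SQ header lines read from stdin, replace the field
# "SN:<old>" with "SN:<new>" using exact string comparison; all other lines
# are passed through unchanged.
rename_sq() {
    awk -v old="$1" -v new="$2" 'BEGIN { FS = OFS = "\t" }
        /^@SQ/ { for (i = 2; i <= NF; i++) if ($i == "SN:" old) $i = "SN:" new }
        { print }'
}

# Intended use (requires samtools; shown as comments, not run here):
#   samtools view -H s2907.bam | rename_sq QRBIO1000092.1 chr1 > new_header.sam
#   samtools reheader new_header.sam s2907.bam > test_s2907.bam
#   samtools index test_s2907.bam
#   samtools view -H test_s2907.bam | grep '^@SQ'   # confirm the new name
```

Note that renaming the scaffold in the BAM alone would leave barn.fna using the old name, so the reference passed to `-f` would presumably need the same rename.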

I need to get this working because it looks like msmc2 requires each chromosome (or scaffold) to be processed as a separate file, one by one. Samtools doesn't seem to accept my scaffold names.

Thanks in advance for your help!