AmpliconSuite/AmpliconSuite-pipeline

b37 reference genome

yliu34 opened this issue · 5 comments

Hello! I have been trying to run the pipeline these days but have not worked it out. I believe the problem was because of the reference genome since the error is "Could not match BAM to a known AA reference genome" even though it did not stop until "Running CNVKit segment". I am wondering if there is a way to use b37 reference genome. Thank you and I am looking forward to your response.

Best regards,
Yang

I also tried transforming bams to fastqs and then using your pipeline with fastqs as input. It did not work either. The error message is as follows:
2023-02-16 21:44:29.484346
PrepareAA version 0.1344.3

Running PrepareAA on sample: test_02162
Running pipeline on /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r1.fastq /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r2.fastq
/home/yliu34/test_02162
Checking for ref index

Performing alignment and sorting
{ bwa mem -K 10000000 -t 4 /home/yliu34/data_repo/hg19/hg19full.fa /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r1.fastq /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r2.fastq | samtools view -Shu - | samtools sort -m 4G -@4 -o /home/yliu34/test_02162.cs.bam -; } 2>/home/yliu34/test_02162_aln_stage.stderr

Performing duplicate removal & indexing
samtools rmdup -s /home/yliu34/test_02162.cs.bam /home/yliu34/test_02162.cs.rmdup.bam

Running samtools index
samtools index /home/yliu34/test_02162.cs.rmdup.bam
Removing temp BAM
/home/yliu34/test_02162.cs.rmdup.bam: 0 + 0 properly paired (N/A : N/A)
Traceback (most recent call last):
File "/home/programs/AmpliconSuite-pipeline-master/PrepareAA.py", line 764, in
check_reference.check_properly_paired(args.sorted_bam)
File "/home/programs/AmpliconSuite-pipeline-master/check_reference.py", line 79, in check_properly_paired
ppp = float(t.rsplit("(")[-1].rsplit("%")[0])
ValueError: could not convert string to float: N/A : N/A)

Hi Yang,

Sorry to hear there have been issues. Can you please provide the header of the original bam file you attempted to use, as well as the header of the the test_02162.cs.rmdup.bam (samtools view -H file.bam)? You can email them to me at jluebeck [at ] eng.ucsd.edu. Additionally can you share what version of samtools you are using? Have you attempted using the dockerized version of AmpliconSuite-pipeline?

Hi Jens,

Thanks for the prompt response. Previously, I used samtools/1.9 to convert bams to fastqs. Today, I tried Picard for the conversion and it is running right now. Hopefully, it will be successful since it has been 2 hours.

Due to institute limitations, I used singularity container to run the docker codes. But it should work similarly to docker.

Hi Yang,

Thank you for sending these files. Can you please check that you have downloaded the GRCh37.tar.gz data repo from
https://datasets.genepattern.org/?prefix=data/module_support_files/AmpliconArchitect/
and placed it in your $AA_DATA_REPO directory? Once in place you should be able to use your original bam file - which is GRCh37-aligned.

Note that the hg19 and GRCh37 data repos cannot be used interchangeably due to the differences in chromosome naming.

Regarding your second error - the N/A/ message using your re-aligned fastq files - this seems perhaps like an error with the unmapping process - reads must be sorted by name before unmapping and remapping. How large was the bam file that resulted from the remapping process? Here is a script for the unmapping process you can use if you decide to continue with that route.

Thanks,
Jens

Hello Jens,

Thank you for your kind help. I used GRCh37 as the reference genome and it works perfectly. You may close the ticket.

Best,
Yang