AlgoLab/HapCHAT

Command line arguments - example

Closed this issue · 6 comments

Dear HapCHAT developers,
I would like to try the software out, starting from a VCF file including heterozygous variants and a BAM file storing MinION reads mapped to reference with Minimap2.

This is the command line:
HapCHAT.py --output phased_hapchat.vcf --reference reference_uc.fasta $(realpath $(pwd))/illumina_file.vcf $(realpath $(pwd))/reads.filter.sorted.bam

However I got errors opening (unexisting) file /path/to/working/dir/.reads.filter.sorted.bam.hx_/realigned.wif.merged_e15_m25_t6_n3.wif'.
Any suggestions?
Thanks

Hi,

Thanks for your interest in HapCHAT !

I am not quite sure what the problem might be. Since you have obviously cloned the HapCHAT repository, could you go to the base of this repository (where there is the HapCHAT.py script) and tell me what output you get when you run the following command

./HapCHAT.py --output file.phased.vcf $(realpath $(pwd))/example/file.vcf $(realpath $(pwd))/example/file.bam

Thanks,
Murray

Yes, I git cloned the HapCHAT repository.
The output of the command is:
awk: fatal error: can't open file `/path/to/HapCHAT/example/.file.bam.hx_/raw.wif.merged_e15_m25_t6_n3.wif' for reading (Unexisting file or directory)

Thanks

okay, I have an idea what may have happened, but I need a bit more information. After the above crash, could you tell me what is the content of the following 3 files? (should they exist at all):

/path/to/HapCHAT/example/.file.bam.hx_/raw.wif
/path/to/HapCHAT/example/.file.bam.hx_/raw.wif.log
/path/to/HapCHAT/example/.file.bam.hx_/raw.wif.merged_e15_m25_t6_n3.wif.log

you can post this content here, because no existing file of the above 3 will have more than 31 lines in all cases

Thanks,
Murray

Here it is:

cat raw.wif

9010254 G 0 4 : 9010807 G 0 10 : 9010984 G 0 5 : 9011046 G 0 14 : 9011537 G 0 14 : 9011586 G 0 13 : 9011722 G 0 13 : # N : N
9010254 C 1 12 : 9010570 C 1 4 : 9010807 C 1 9 : # N : N
9010254 C 1 13 : 9010570 C 1 11 : 9010984 G 0 1 : 9011046 G 0 6 : 9011537 C 1 4 : 9011586 C 1 10 : 9011722 C 1 13 : 9012776 C 1 14 : 9013096 G 0 10 : 9013970 C 1 13 : # N : N
9010254 G 0 13 : 9010807 G 0 9 : 9010984 G 0 14 : 9011046 G 0 11 : 9011537 G 0 12 : 9011586 G 0 11 : # N : N

cat raw.wif.log

This is WhatsHap 0.13+52.g7b63430 running under Python 3.6.3
Working on 1 samples from 1 family
WARNING: The maximum coverage is too high! WhatsHap may take a long time to finish and require a huge amount of memory.
======== Working on chromosome '1'
---- Processing individual HG002
Using maximum coverage per sample of 1000X
Number of variants skipped due to missing genotypes: 0
Number of remaining heterozygous variants: 628
Reading alignments for sample 'HG002'and detecting alleles ...
Found 4 reads covering 11 variants
Kept 4 reads that cover at least two variants each
Reducing coverage to at most 1000X by selecting most informative reads ...
Selected 4 reads covering 11 variants
Best-case phasing would result in 1 non-singleton phased blocks (1 in total)
... after read selection: 1 non-singleton phased blocks (1 in total)
Variants covered by at least one phase-informative read in at least one individual after read selection: 11
Adding reads to be used for phasing in wif format to running dictionary
======== Writing VCF
Done writing VCF
Writing running dictionary of reads in wif format to '/home/simone/software/HapCHAT/example/.file.bam.hx_/raw.wif'

== SUMMARY ==
Maximum memory usage: 0.028 GB
Time spent reading BAM:                         0.1 s
Time spent parsing VCF:                         0.1 s
Time spent selecting reads:                     0.0 s
Time spent phasing:                             0.0 s
Time spent writing VCF:                         0.1 s
Time spent finding components:                  0.0 s
Time spent on rest:                             0.1 s
Total elapsed time:                             0.4 s

cat raw.wif.merged_e15_m25_t6_n3.wif.log

Traceback (most recent call last):
  File "/home/simone/software/HapCHAT/scripts/rb-merge.py", line 3, in <module>
    import networkx as nx
ModuleNotFoundError: No module named 'networkx'

So, I pip installed networkx. Now the last file contains:

cat raw.wif.merged_e15_m25_t6_n3.wif.log

INFO     [180727 125750]  Program started.
INFO     [180727 125750]  Error Rate: 0.15
INFO     [180727 125750]  Max Error Rate: 0.25
INFO     [180727 125750]  Positive Threshold: 1000000
INFO     [180727 125750]  Negative Threshold: 1000
INFO     [180727 125750]  Started reading WIF file...
INFO     [180727 125750]  Finished reading WIF file.
INFO     [180727 125750]  N. WIF entries: 4
INFO     [180727 125750]  Blue Graph
INFO     [180727 125750]  Nodes: 4 - Edges: 1 - ConnComp: 3
INFO     [180727 125750]  Non-Blue Graph
INFO     [180727 125750]  Nodes: 4 - Edges: 2 - ConnComp: 2
INFO     [180727 125750]  Red Graph
INFO     [180727 125750]  Nodes: 4 - Edges: 0 - ConnComp: 4
INFO     [180727 125750]  Non-Red Graph
INFO     [180727 125750]  Nodes: 4 - Edges: 1 - ConnComp: 3
INFO     [180727 125750]  Started Merging Reads...
INFO     [180727 125750]  Printing graph in /home/simone/software/HapCHAT/example/.file.bam.hx_/raw.wif.merged_e15_m25_t6_n3.wif.graph file
INFO     [180727 125750]  Finished Merging Reads.
INFO     [180727 125750]  Program Finshed

So, I think the problem is solved!
Thanks

Just a further clarification. A part from working on example data, when I ran HapCHAT on my data, it produced an empty phased vcf file. I realised it was a "read group" problem. Since I'm using aligner minimap2 and didn't specify the -R option, I didn't have the @rg line in the bam file. When rerunning Minimap2 with the -R option it worked fine. Since I was used to --ignore-read-groups option of Whatshap, I was not used to adding the -R tag in single-sample BAM file.
Thanks

Ok, great --- good to see that you worked out the networkx problem, maybe I can add a note to help users debug such problems in the future

as for the --ignore-read-groups option, we can have this as well, it is just that the whatshap interface is much more sophisticated for handling standard formats, dealing with these issues, etc.

in fact, we are currently planning to integrate HapCHAT into the whatshap code base precisely for this reason (even this networkx problem is a result of a somewhat awkward interface). You can track this development here if you are curious: https://bitbucket.org/whatshap/whatshap/branch/hapchat-integration

once the integration is done, we will surely make an announcement on this repository, our webpage, etc.

in the meantime, happy phasing !

Best,
Murray