maize-genetics/phg_v2

Error when running build-kmer-index command

Closed this issue · 3 comments

Hello, I was able to complete the build and load steps of the pipeline, and the hVCF files are now in the output/vcf_files directory. The next step is to build the k-mer index. Here is the command I ran:

phg build-kmer-index --db-path data/vcf_dbs/ --hvcf-dir output/vcf_files/

It fails with the following error:

[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:10,198: processFiles: output/vcf_files/australis_alternate.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:10,973: processFiles: output/vcf_files/australis_primary.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:11,535: processFiles: output/vcf_files/australasica_alternate.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:12,150: processFiles: output/vcf_files/australasica_primary.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:12,646: processFiles: output/vcf_files/inodora_primary.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:13,054: processFiles: output/vcf_files/inodora_alternate.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:13,543: processFiles: output/vcf_files/fallglo_primary.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:14,037: processFiles: output/vcf_files/fallglo_alternate.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:14,496: processFiles: output/vcf_files/fortune_alternate.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:14,950: processFiles: output/vcf_files/wilking_primary.h.vcf.gz
[DefaultDispatcher-worker-1] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:15,437: processFiles: output/vcf_files/wilking_alternate.h.vcf.gz
[main] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:21,789: rangeToSampleToChecksum: 42065 x 11
[main] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:21,789: numOfSamples: 11
[main] INFO net.maizegenetics.phgv2.api.HaplotypeGraph 2024-04-18 11:50:21,789: numOfRanges: 42065
[main] INFO net.maizegenetics.phgv2.pathing.BuildKmerIndex 2024-04-18 11:50:21,810: HaplotypeGraph built in 11637.06965 ms.
[main] INFO net.maizegenetics.phgv2.utils.SeqUtils 2024-04-18 11:50:21,986: queryAgc: Running Agc Command:
conda run -n phgv2-conda agc getctg data/vcf_dbs//assemblies.agc Scaffold_01@australasica_alternate Scaffold_01@australasica_primary Scaffold_01@austr...
[main] ERROR net.maizegenetics.phgv2.utils.SeqUtils 2024-04-18 11:50:22,639: queryAgc: errors found in errorStream: 2
Exception in thread "main" java.lang.IllegalArgumentException: Error running AGC command: conda run -n phgv2-conda agc getctg data/vcf_dbs//assemblies.agc Scaffold_01@australasica_alternate Scaffold_01@australasica_primary Scaffold_01@australis_alternate Scaffold_01@australis_primary Scaffold_01@fallglo_alternate Scaffold_01@fallglo_primary Scaffold_01@fortune_alternate Scaffold_01@inodora_alternate Scaffold_01@inodora_primary Scaffold_01@wilking_alternate Scaffold_01@wilking_primary
Error: [There is no sample:contig pair: australasica_alternate : Scaffold_01, ]
	at net.maizegenetics.phgv2.utils.SeqUtilsKt.queryAgc(SeqUtils.kt:366)
	at net.maizegenetics.phgv2.utils.SeqUtilsKt.retrieveAgcContigs(SeqUtils.kt:96)
	at net.maizegenetics.phgv2.utils.SeqUtilsKt.retrieveAgcContigForSamples(SeqUtils.kt:76)
	at net.maizegenetics.phgv2.pathing.BuildKmerIndex.processGraphKmers(BuildKmerIndex.kt:130)
	at net.maizegenetics.phgv2.pathing.BuildKmerIndex.run(BuildKmerIndex.kt:84)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:279)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:292)
	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:41)
	at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:457)
	at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:454)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:474)
	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:481)
	at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:58)

Please let me know your suggestions. Thank you.

This problem is caused by a bug introduced in a recent build, in code designed to speed up indexing. It should be fixed by the end of the week. I will post here when the new version is available.
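For anyone who hits the same message on an older build, here is a minimal Python sketch for pulling the failing sample:contig pair out of the exception text above and comparing it with what the `agc getctg` call requested. The helper names (`missing_pairs`, `requested_pairs`, `listctg_command`) are hypothetical and not part of phg_v2. The regular expression assumes the message format shown in this log. `agc listctg` is AGC's subcommand for listing contig names; the sketch only builds that command and does not run it or parse its output.

```python
import re
import shlex

# Assumed message format, taken from the log in this issue:
#   "There is no sample:contig pair: <sample> : <contig>,"
MISSING_PAIR = re.compile(
    r"There is no sample:contig pair:\s*(\S+)\s*:\s*([^,\]\s]+)"
)


def missing_pairs(error_text):
    """Return (sample, contig) tuples named in 'no sample:contig pair' errors."""
    return MISSING_PAIR.findall(error_text)


def requested_pairs(agc_command):
    """Return (sample, contig) tuples from the contig@sample arguments
    of an 'agc getctg' command line."""
    pairs = []
    for token in shlex.split(agc_command):
        if "@" in token:
            contig, sample = token.rsplit("@", 1)
            pairs.append((sample, contig))
    return pairs


def listctg_command(archive, sample, conda_env="phgv2-conda"):
    """Build (but do not run) a command that lists the contigs AGC holds
    for one sample. The conda environment name is the one seen in the log."""
    return ["conda", "run", "-n", conda_env, "agc", "listctg", archive, sample]


if __name__ == "__main__":
    error = (
        "Error: [There is no sample:contig pair: "
        "australasica_alternate : Scaffold_01, ]"
    )
    for sample, contig in missing_pairs(error):
        print(f"AGC reports no contig {contig!r} for sample {sample!r}")
        # Print the command to run by hand to see the contig names AGC stores.
        print(shlex.join(listctg_command("data/vcf_dbs//assemblies.agc", sample)))
```

Running the printed `agc listctg` command by hand shows the contig names stored for that sample, which can then be compared with the names requested in the `getctg` call. In this issue the cause turned out to be a bug in phg_v2 itself, so a check like this only helps to rule out a naming mismatch in the archive before upgrading.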

The bug has been fixed and tested. The latest release of phg_v2 should fix the problem.

Got the latest release and it works fine now. Thank you.