ragtag.py merge: RuntimeError: only complete components can be added to the graph
benyoung93 opened this issue · 6 comments
Good morning
I have been troubleshooting this for a day without success, so I am posting this question about my error message.
Command
ragtag.py merge ../../hifi_assemblay/all_contam_rem/Ofav_hifiasm_allcontrem.fa \
../../longstitch_new/Ofav_hifiasm_allcontrem.fa.k32.w100.z1000.trimmed_scafs.agp \
../../ragtag/ofav_scaffold/ragtag.scaffold.agp \
-u
I have checked that my AGP files are valid using the built-in tool (ragtag.py agpcheck), and they pass:
ragtag.py agpcheck ../../longstitch_new/Ofav_hifiasm_allcontrem.fa.k32.w100.z1000.trimmed_scafs.agp ../../ragtag/ofav_scaffold/ragtag.scaffold.agp
DISCLAIMER:
This utility performs most (but not all) checks necessary to validate an
AGP v2.1 file: https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/
Please additionally use the NCBI AGP validator for robust
validation: https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Validation/
Fri Mar 10 10:23:37 2023 --- INFO: Checking /scratch/projects/omics/ofav_genome/longstitch_new/Ofav_hifiasm_allcontrem.fa.k32.w100.z1000.trimmed_scafs.agp ...
Fri Mar 10 10:23:37 2023 --- INFO: Check for /scratch/projects/omics/ofav_genome/longstitch_new/Ofav_hifiasm_allcontrem.fa.k32.w100.z1000.trimmed_scafs.agp is complete with no errors.
Fri Mar 10 10:23:37 2023 --- INFO: Checking /scratch/projects/omics/ofav_genome/ragtag/ofav_scaffold/ragtag.scaffold.agp ...
Fri Mar 10 10:23:37 2023 --- INFO: Check for /scratch/projects/omics/ofav_genome/ragtag/ofav_scaffold/ragtag.scaffold.agp is complete with no errors.
Interestingly, the NCBI AGP validator gives warnings for the contigs that (I think) have not been scaffolded:
19: ptg000001l 1 32039455 1 W ptg000001l 1 32039455 +
    object name (column 1) is the same as component_id (column 6)
20: ptg000003l 1 35678035 1 W ptg000003l 1 35678035 +
    object name (column 1) is the same as component_id (column 6)
21: ptg000004l 1 33295526 1 W ptg000004l 1 33295526 +
    object name (column 1) is the same as component_id (column 6)
22: ptg000005l 1 36602845 1 W ptg000005l 1 36602845 +
    object name (column 1) is the same as component_id (column 6)
23: ptg000006l 1 40246328 1 W ptg000006l 1 40246328 +
    object name (column 1) is the same as component_id (column 6)
24: ptg000007l 1 24061036 1 W ptg000007l 1 24061036 +
    object name (column 1) is the same as component_id (column 6)
25: ptg000008l 1 34276962 1 W ptg000008l 1 34276962 +
    object name (column 1) is the same as component_id (column 6)
26: ptg000009l 1 24390148 1 W ptg000009l 1 24390148 +
Statistics
Objects: 242
- with single component: 235
Scaffolds: 242
- with single component: 235
Object names: 242
ptg[000001..000249]l: 229
ntLink_[0..6]: 7
ptg[000015..000097]c: 6
Components (W): 250
orientation +: 244
orientation -: 6
orientation ? (formerly 0): 0
orientation na: 0
Component names: 250
ptg[000001..000250]l: 244
ptg[000015..000097]c: 6
Gaps (N): 3
- do not break scaffold: 3
scaffold, linkage yes: 3
Linkage evidence:
paired-ends: 3
Would I need to remove these lines (where the object name equals the component ID) and keep only the scaffolded lines (i.e. the top lines shown below), or is that a bad idea or simply wrong?
Output from ragtag.py merge
Fri Mar 10 10:20:27 2023 --- VERSION: RagTag v2.1.0
Fri Mar 10 10:20:27 2023 --- WARNING: This is a beta version of `ragtag merge`
Fri Mar 10 10:20:27 2023 --- CMD: ragtag.py merge ../../hifi_assemblay/all_contam_rem/Ofav_hifiasm_allcontrem.fa ../../longstitch_new/Ofav_hifiasm_allcontrem.fa.k32.w100.z1000.trimmed_scafs.agp ../../ragtag/ofav_scaffold/ragtag.scaffold.agp -u
Fri Mar 10 10:20:27 2023 --- INFO: Building the scaffold graph from the AGP files
Traceback (most recent call last):
File "/nethome/bdy8/miniconda3/envs/ragtag_env/bin/ragtag_merge.py", line 430, in <module>
main()
File "/nethome/bdy8/miniconda3/envs/ragtag_env/bin/ragtag_merge.py", line 362, in main
agp_multi_sg.add_agps(agp_list, in_weights=weight_list, exclusion_set=comp_exclusion_set)
File "/nethome/bdy8/miniconda3/envs/ragtag_env/lib/python3.7/site-packages/ragtag_utilities/ScaffoldGraph.py", line 606, in add_agps
for ap in self._get_assembly_points(agp, weight):
File "/nethome/bdy8/miniconda3/envs/ragtag_env/lib/python3.7/site-packages/ragtag_utilities/ScaffoldGraph.py", line 518, in _get_assembly_points
raise RuntimeError("only complete components can be added to the graph.")
RuntimeError: only complete components can be added to the graph.
Please also find some additional information about my input files (the FASTA and the two AGPs).
head -20 /scratch/projects/omics/ofav_genome/longstitch_new/Ofav_hifiasm_allcontrem.fa.k32.w100.z1000.trimmed_scafs.agp
ntLink_0 1 3452964 1 W ptg000012l 1 3452964 -
ntLink_0 3452965 3452984 2 N 20 scaffold yes paired-ends
ntLink_0 3452985 3491959 3 W ptg000127l 1 38975 +
ntLink_0 3491960 3491979 4 N 20 scaffold yes paired-ends
ntLink_0 3491980 3516434 5 W ptg000250l 1 24455 +
ntLink_1 1 19528 1 W ptg000176l 1 19528 +
ntLink_1 19529 39704 2 W ptg000221l 1 20176 -
ntLink_2 1 195567 1 W ptg000067l 1 195567 +
ntLink_2 195568 227286 2 W ptg000166l 223 31941 +
ntLink_3 1 39034 1 W ptg000066l 1068 40101 -
ntLink_3 39035 74300 2 W ptg000107l 979 36244 +
ntLink_4 1 538548 1 W ptg000025l 1504 540051 -
ntLink_4 538549 607680 2 W ptg000105l 3854 72985 +
ntLink_5 1 21959414 1 W ptg000022l 761 21960174 -
ntLink_5 21959415 22011122 2 W ptg000138l 1 51708 -
ntLink_6 1 11766301 1 W ptg000002l 1 11766301 +
ntLink_6 11766302 11766321 2 N 20 scaffold yes paired-ends
ntLink_6 11766322 11770210 3 W ptg000125l 1 3889 +
ptg000001l 1 32039455 1 W ptg000001l 1 32039455 +
ptg000003l 1 35678035 1 W ptg000003l 1 35678035 +
head -20 /scratch/projects/omics/ofav_genome/ragtag/ofav_scaffold/ragtag.scaffold.agp
## agp-version 2.1
# AGP created by RagTag v2.1.0
NW_018148507.1_RagTag 1 46879 1 W ptg000080l 1 46879 -
NW_018148518.1_RagTag 1 42037 1 W ptg000142l 1 42037 -
NW_018148539.1_RagTag 1 31891 1 W ptg000101l 1 31891 -
NW_018148547.1_RagTag 1 21649 1 W ptg000143l 1 21649 -
NW_018148557.1_RagTag 1 20381 1 W ptg000046l 1 20381 +
NW_018148565.1_RagTag 1 19361 1 W ptg000194l 1 19361 -
NW_018148577.1_RagTag 1 32486 1 W ptg000182l 1 32486 +
NW_018148578.1_RagTag 1 19816 1 W ptg000110l 1 19816 +
NW_018148594.1_RagTag 1 20901 1 W ptg000144l 1 20901 +
NW_018148600.1_RagTag 1 36461 1 W ptg000224l 1 36461 +
NW_018148600.1_RagTag 36462 36561 2 U 100 scaffold yes align_genus
NW_018148600.1_RagTag 36562 60206 3 W ptg000219l 1 23645 +
NW_018148606.1_RagTag 1 26009 1 W ptg000161l 1 26009 +
NW_018148606.1_RagTag 26010 26109 2 U 100 scaffold yes align_genus
NW_018148606.1_RagTag 26110 69146 3 W ptg000130l 1 43037 -
NW_018148608.1_RagTag 1 13225 1 W ptg000245l 1 13225 +
NW_018148618.1_RagTag 1 35247 1 W ptg000056l 1 35247 +
NW_018148627.1_RagTag 1 21186 1 W ptg000168l 1 21186 -
Please also find the first 20 contig names in the assembly:
grep ">" ../../hifi_assemblay/all_contam_rem/Ofav_hifiasm_allcontrem.fa | head -20
>ptg000001l
>ptg000002l
>ptg000003l
>ptg000004l
>ptg000005l
>ptg000006l
>ptg000007l
>ptg000008l
>ptg000009l
>ptg000010l
>ptg000011l
>ptg000012l
>ptg000013l
>ptg000014l
>ptg000015c
>ptg000016l
>ptg000017l
>ptg000018l
>ptg000019c
>ptg000020l
Finally, I list my conda environment with installed packages and versions :).
conda list
# packages in environment at /nethome/bdy8/miniconda3/envs/ragtag_env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
c-ares 1.18.1 h7f8727e_0
ca-certificates 2023.01.10 h06a4308_0
certifi 2022.12.7 py37h06a4308_0
curl 7.87.0 h5eee18b_0
gdbm 1.18 hd4cb3f1_4
intel-openmp 2021.4.0 h06a4308_3561
intervaltree 3.1.0 pyhd3eb1b0_0
k8 0.2.5 h9a82719_1 bioconda
krb5 1.19.4 h568e23c_0
ld_impl_linux-64 2.38 h1181459_1
libcurl 7.87.0 h91b91d3_0
libdeflate 1.0 h14c3975_1 bioconda
libedit 3.1.20221030 h5eee18b_0
libev 4.33 h7f8727e_1
libffi 3.4.2 h6a678d5_6
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libnghttp2 1.46.0 hce63b2e_0
libssh2 1.10.0 h8f2d780_0
libstdcxx-ng 11.2.0 h1234567_1
minimap2 2.22 h5bf99c6_0 bioconda
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.1 py37hd3c417c_0
mkl_random 1.2.2 py37h51133e4_0
mummer 3.23 4 bioconda
ncurses 6.4 h6a678d5_0
networkx 2.6.3 pyhd3eb1b0_0
numpy 1.21.5 py37h6c91a56_3
numpy-base 1.21.5 py37ha15fc14_3
openssl 1.1.1t h7f8727e_0
perl 5.34.0 h5eee18b_2
perl-threaded 5.32.1 hdfd78af_1 bioconda
pip 22.3.1 py37h06a4308_0
pysam 0.15.3 py37hda2845c_1 bioconda
python 3.7.16 h7a1cb2a_0
ragtag 2.1.0 pyhb7b1952_0 bioconda
readline 8.2 h5eee18b_0
setuptools 65.6.3 py37h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sortedcontainers 2.4.0 pyhd3eb1b0_0
sqlite 3.40.1 h5082296_0
tk 8.6.12 h1ccaba5_0
wheel 0.38.4 py37h06a4308_0
xz 5.2.10 h5eee18b_1
zlib 1.2.13 h5eee18b_0
Thank you in advance for any and all help; I am a little stumped by what is going on here.
Ben
Okay, after more troubleshooting I have moved on to a different error. I renamed the object names in column 1 so that they no longer match the component IDs in column 6. With this change the NCBI validator reports 0 errors. Yay!
I am now having a similar issue to some other people:
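The renaming step can be sketched in Python. This is a minimal illustration, not necessarily the exact edit made here: it assumes tab-separated AGP lines and appends a hypothetical `_scaf` suffix to any object name that is also used as a component ID (any suffix that keeps names unique would do). Comment lines pass through unchanged.

```python
from typing import Iterable, List

GAP_TYPES = ("N", "U")  # AGP gap component types; all other types are sequences


def rename_clashing_objects(lines: Iterable[str], suffix: str = "_scaf") -> List[str]:
    """Append `suffix` to any object name (column 1) that is also used as a
    component_id (column 6) on a sequence line. The suffix is a hypothetical choice."""
    rows = [line.rstrip("\n") for line in lines]
    # First pass: collect component IDs from sequence (non-gap) lines.
    component_ids = set()
    for row in rows:
        if row.startswith("#") or not row.strip():
            continue
        fields = row.split("\t")
        if fields[4] not in GAP_TYPES:
            component_ids.add(fields[5])
    # Second pass: rename column 1 wherever it clashes with a component ID.
    renamed = []
    for row in rows:
        if row.startswith("#") or not row.strip():
            renamed.append(row)
            continue
        fields = row.split("\t")
        if fields[0] in component_ids:
            fields[0] += suffix
        renamed.append("\t".join(fields))
    return renamed


# Two lines from the ntLink AGP above: only the single-contig object clashes.
example = [
    "ntLink_0\t1\t3452964\t1\tW\tptg000012l\t1\t3452964\t-",
    "ptg000001l\t1\t32039455\t1\tW\tptg000001l\t1\t32039455\t+",
]
for row in rename_clashing_objects(example):
    print(row)
```

Only column 1 changes, so the component IDs that merge compares across AGPs stay the same.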
ragtag.py merge ../../hifi_assemblay/all_contam_rem/Ofav_hifiasm_allcontrem.fa ../../longstitch_new/longstitch.agp ../../ragtag/ofav_scaffold/ragtag.scaffold.agp
Sat Mar 11 14:59:18 2023 --- VERSION: RagTag v2.1.0
Sat Mar 11 14:59:18 2023 --- WARNING: This is a beta version of `ragtag merge`
Sat Mar 11 14:59:18 2023 --- CMD: ragtag.py merge ../../hifi_assemblay/all_contam_rem/Ofav_hifiasm_allcontrem.fa ../../longstitch_new/longstitch.agp ../../ragtag/ofav_scaffold/ragtag.scaffold.agp
Sat Mar 11 14:59:18 2023 --- WARNING: Without '-u' invoked, some component/object AGP pairs might share the same ID. Some external programs/databases don't like this. To ensure valid AGP format, use '-u'.
Sat Mar 11 14:59:18 2023 --- INFO: Building the scaffold graph from the AGP files
Traceback (most recent call last):
File "/nethome/bdy8/miniconda3/envs/ragtag_env/bin/ragtag_merge.py", line 430, in <module>
main()
File "/nethome/bdy8/miniconda3/envs/ragtag_env/bin/ragtag_merge.py", line 362, in main
agp_multi_sg.add_agps(agp_list, in_weights=weight_list, exclusion_set=comp_exclusion_set)
File "/nethome/bdy8/miniconda3/envs/ragtag_env/lib/python3.7/site-packages/ragtag_utilities/ScaffoldGraph.py", line 606, in add_agps
for ap in self._get_assembly_points(agp, weight):
File "/nethome/bdy8/miniconda3/envs/ragtag_env/lib/python3.7/site-packages/ragtag_utilities/ScaffoldGraph.py", line 578, in _get_assembly_points
raise ValueError("Input AGPs do not have the same set of components.")
ValueError: Input AGPs do not have the same set of components.
I could not find a fix for this in the other issues, or work out why it is occurring. The same input assembly was used for both scaffolding attempts.
Any and all help would be amazing and I am happy to provide more info if you need it :).
Ben
Hi there,
Sorry about the delays. I would use standard command line tools to check if both AGP files contain identical sets of AGP components. I understand that it's the same input assembly, but perhaps some of the contigs are left out depending on the scaffolding solution.
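A Python equivalent of that check is sketched below, assuming tab-separated AGP lines. It collects the component IDs (column 6) from the sequence lines of each AGP, skipping N/U gap lines, and reports IDs that appear in only one file. The names and toy lines are hypothetical.

```python
from typing import Dict, Iterable, Set

GAP_TYPES = ("N", "U")  # gap lines carry a length in column 6, not a component ID


def component_ids(lines: Iterable[str]) -> Set[str]:
    """Return the component IDs (column 6) used on sequence lines of an AGP."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[4] not in GAP_TYPES:
            ids.add(fields[5])
    return ids


def compare_component_sets(agp_a: Iterable[str], agp_b: Iterable[str]) -> Dict[str, Set[str]]:
    """Report component IDs present in one AGP but not the other."""
    a, b = component_ids(agp_a), component_ids(agp_b)
    return {"only_in_first": a - b, "only_in_second": b - a}


first = ["x\t1\t10\t1\tW\tc1\t1\t10\t+", "y\t1\t5\t1\tW\tc2\t1\t5\t+"]
second = [
    "z\t1\t10\t1\tW\tc1\t1\t10\t+",
    "z\t11\t110\t2\tU\t100\tscaffold\tyes\talign_genus",
    "z\t111\t120\t3\tW\tc3\t1\t10\t+",
]
print(compare_component_sets(first, second))
# {'only_in_first': {'c2'}, 'only_in_second': {'c3'}}
```

For real files, pass open file handles instead of lists. Two empty sets mean both AGPs use the same contig names. This compares IDs only, not the spans in columns 7 and 8.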
Hi @malonge
First of all, apologies: I was on holiday for the past two weeks.
I am going to be jumping back into this so will post updates and fixes here if I find them :).
Ben
Good morning @malonge et al 🙂
Apologies for the delay, but I finally got back to this.
So I have successfully troubleshot this following @malonge's suggestion. The problem in the AGP files is in column 6. All contigs from the primary assembly are present and correct; the actual problem was the different gap sizes inserted by ntLink (default 20) and RagTag (default 100).
I am going to rerun ntLink with the gap size set to 100 and then see whether the merge works. This should remove the discrepancy, and merge should hopefully work :).
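One way to see the gap-size difference is to tally the gap lines of each AGP. This is a minimal sketch, assuming tab-separated AGPs: it counts (gap type, length) pairs from column 5 and column 6 of N/U lines. It only summarizes gaps. Whether matching gap sizes is enough for merge to succeed is what the rerun will test.

```python
from collections import Counter
from typing import Iterable


def gap_lengths(lines: Iterable[str]) -> Counter:
    """Count (gap type, gap length) pairs over the N/U lines of an AGP."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[4] in ("N", "U"):
            # On gap lines, column 6 is the gap length rather than a component ID.
            counts[(fields[4], int(fields[5]))] += 1
    return counts


# Gap lines copied from the ntLink and RagTag AGPs earlier in this thread.
ntlink = ["ntLink_0\t3452965\t3452984\t2\tN\t20\tscaffold\tyes\tpaired-ends"]
ragtag = ["NW_018148600.1_RagTag\t36462\t36561\t2\tU\t100\tscaffold\tyes\talign_genus"]
print(gap_lengths(ntlink))  # Counter({('N', 20): 1})
print(gap_lengths(ragtag))  # Counter({('U', 100): 1})
```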
I will post whether this works and then close the issue after that.
Thank you for the help
Ben
Hi,
Thank you for this easy-to-use and well-documented tool!
I am struggling with a similar issue and cannot find a way around it.
I want to use ragtag merge to merge a few reference-based AGPs produced by ragtag scaffold with one Hi-C-scaffolded AGP.
When I include the Hi-C AGP in ragtag merge, I get this error:
Wed Aug 2 16:04:32 2023 --- VERSION: RagTag v2.1.0
Wed Aug 2 16:04:32 2023 --- WARNING: This is a beta version of `ragtag merge`
Wed Aug 2 16:04:32 2023 --- INFO: Building the scaffold graph from the AGP files
Traceback (most recent call last):
File "/users/timg/.conda/envs/ragtag/bin/ragtag_merge.py", line 430, in <module>
main()
File "/users/timg/.conda/envs/ragtag/bin/ragtag_merge.py", line 362, in main
agp_multi_sg.add_agps(agp_list, in_weights=weight_list, exclusion_set=comp_exclusion_set)
File "/users/timg/.conda/envs/ragtag/lib/python3.9/site-packages/ragtag_utilities/ScaffoldGraph.py", line 606, in add_agps
for ap in self._get_assembly_points(agp, weight):
File "/users/timg/.conda/envs/ragtag/lib/python3.9/site-packages/ragtag_utilities/ScaffoldGraph.py", line 518, in _get_assembly_points
raise RuntimeError("only complete components can be added to the graph.")
RuntimeError: only complete components can be added to the graph.
Hi-C scaffolding was done with YaHS, followed by some manual curation in Juicebox. Both the YaHS scaffolding AGP and the AGP converted from the Juicebox .assembly file give the same RagTag error.
I validated both AGPs (from YaHS and Juicebox) with your tool, and it reports no errors. The NCBI validator reports one type of error and several warnings:
invalid value for linkage_evidence (column 9): proximity_ligation
This is the only error. I think the validator has not been updated for AGP v2.1, where proximity_ligation was added as a valid value, so I think this is OK.
Warnings (one example per type):
same component_id found on different scaffolds; previous occurance at line 815, in another object
If I understand correctly, this is caused by the assembly error-correction step in YaHS and by some splitting in Juicebox: some original contigs were split and allocated to different scaffolds. Is this OK for RagTag?
component span appears out of order; preceding span: 1..100000 at line 265
I don't understand this one. These are the lines with the above warning:
line_256 scaffold_1 13902924 14002923 265 W contig_2705 1 100000 +
line_1243 scaffold_1 81152522 81177646 1243 W contig_2705 100001 125125 -
duplicate component with non-draft type; preceding span: 438001..544668 at line 897
I also don't understand this one; these are the corresponding lines:
line_897 scaffold_1 55840816 55947483 897 W contig_1439 438001 544668 -
line_1503 scaffold_1 103099110 103348109 1503 W contig_1439 1 249000 -
line_5954 scaffold_8 31810598 31999597 399 W contig_1439 249001 438000 +
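These warnings concern contigs that appear on more than one AGP line. A minimal sketch, assuming tab-separated AGP lines, groups sequence lines by component ID and lists each piece in contig order. Applied to the three contig_1439 lines above, it shows three pieces covering 1..249000, 249001..438000 and 438001..544668. Two are placed on scaffold_1 and one on scaffold_8, so none of the three lines uses the whole contig.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (object, object_beg, component_beg, component_end, orientation, line number in input)
Piece = Tuple[str, int, int, int, str, int]


def split_components(lines: Iterable[str]) -> Dict[str, List[Piece]]:
    """Return components used on more than one sequence line, with their pieces
    sorted by position along the original contig."""
    pieces: Dict[str, List[Piece]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if f[4] in ("N", "U"):
            continue  # gap line, no component
        pieces[f[5]].append((f[0], int(f[1]), int(f[6]), int(f[7]), f[8], line_no))
    return {c: sorted(p, key=lambda x: x[2]) for c, p in pieces.items() if len(p) > 1}


# The three contig_1439 lines quoted above (line labels dropped, same order).
agp = [
    "scaffold_1\t55840816\t55947483\t897\tW\tcontig_1439\t438001\t544668\t-",
    "scaffold_1\t103099110\t103348109\t1503\tW\tcontig_1439\t1\t249000\t-",
    "scaffold_8\t31810598\t31999597\t399\tW\tcontig_1439\t249001\t438000\t+",
]
for piece in split_components(agp)["contig_1439"]:
    print(piece)
```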
Do you have any ideas what could be wrong? I don't know where else to look and what to try.
I read in your RagTag paper that you used an AGP converted from a .assembly file with a custom script. Did the manual curation involve breaking some contigs? Could you provide the script mentioned in the paper?
I am using the juicer post command (bundled with the YaHS distribution) to make an AGP from the .assembly file, as described in the YaHS tutorial. I also tried juicebox_assembly_converter.py from https://github.com/phasegenomics/juicebox_scripts, but it does not work with my .assembly and .fasta files, so something may be wrong there in the first place...
Let me know if I should post some more info.
Thank you in advance for any help.
Tim
Hi,
This error message is not very clear, but what it apparently means is the following (line 517 of ScaffoldGraph.py):
"if comp_len < self.get_component_len(agp_line.comp):"
So ragtag.py merge cannot handle broken components, i.e. contig breaks! Many people are likely to run into this. If any scaffolder breaks contigs (e.g. at an identified misassembly), the resulting AGP cannot be used with ragtag merge. Quite disappointing, I think.
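Based on that reading of line 517, one way to find the offending lines before running merge is to compare each sequence line's span with the full contig length. This is a minimal sketch. It assumes tab-separated AGP lines and contig lengths taken from a samtools faidx .fai index. It mirrors the quoted condition only if comp_len there is the span in columns 7-8, and the contigs below are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def read_fai(lines: Iterable[str]) -> Dict[str, int]:
    """Parse contig lengths from a samtools faidx index (.fai): name, length, ..."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def partial_components(lines: Iterable[str], contig_len: Dict[str, int]) -> List[Tuple[str, int, int, int]]:
    """Return (component, beg, end, contig length) for every sequence line whose
    span (columns 7-8) is shorter than the full contig."""
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if f[4] in ("N", "U"):
            continue  # gap lines have no component span
        beg, end = int(f[6]), int(f[7])
        if end - beg + 1 < contig_len[f[5]]:
            hits.append((f[5], beg, end, contig_len[f[5]]))
    return hits


# Hypothetical contigs: ctg1 is placed whole, ctg2 has been broken in two.
fai = ["ctg1\t500\t6\t60\t61", "ctg2\t1000\t521\t60\t61"]
agp = [
    "scf1\t1\t500\t1\tW\tctg1\t1\t500\t+",
    "scf1\t501\t600\t2\tN\t100\tscaffold\tyes\tproximity_ligation",
    "scf1\t601\t1200\t3\tW\tctg2\t1\t600\t+",
    "scf2\t1\t400\t1\tW\tctg2\t601\t1000\t-",
]
print(partial_components(agp, read_fai(fai)))
# [('ctg2', 1, 600, 1000), ('ctg2', 601, 1000, 1000)]
```

By the same logic, lines in the ntLink trimmed AGP earlier in this thread whose span does not start at 1 (e.g. ptg000066l 1068 40101) cannot cover their whole contig.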
One workaround I can see is to first break both AGPs at these breakpoints and give the resulting pieces new component names. Perhaps another RagTag module can do that?
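That workaround could look roughly like the sketch below. It is a minimal illustration, not an existing RagTag feature, and it makes several assumptions. The AGPs are tab-separated. Every span boundary from every input AGP is used as a cut point. Each sequence line is split at those cuts. Pieces get a hypothetical `<contig>_<beg>_<end>` name so that each piece is used whole.

```python
from typing import Dict, List, Set

GAP_TYPES = ("N", "U")


def collect_cuts(agps: List[List[str]]) -> Dict[str, Set[int]]:
    """Collect 0-based cut positions per contig from the sequence lines of every AGP."""
    cuts: Dict[str, Set[int]] = {}
    for agp in agps:
        for line in agp:
            if line.startswith("#") or not line.strip():
                continue
            f = line.rstrip("\n").split("\t")
            if f[4] in GAP_TYPES:
                continue
            # A span beg..end (1-based, inclusive) is the half-open interval [beg-1, end).
            cuts.setdefault(f[5], set()).update((int(f[6]) - 1, int(f[7])))
    return cuts


def break_agp(agp: List[str], cuts: Dict[str, Set[int]]) -> List[str]:
    """Split every sequence line of one AGP at the shared cut positions. Pieces get
    hypothetical '<contig>_<beg>_<end>' names. `cuts` must include this AGP's cuts."""
    out: List[str] = []
    part: Dict[str, int] = {}  # running part number per object
    for line in agp:
        if line.startswith("#") or not line.strip():
            out.append(line.rstrip("\n"))
            continue
        f = line.rstrip("\n").split("\t")
        obj = f[0]
        if f[4] in GAP_TYPES:
            part[obj] = part.get(obj, 0) + 1
            f[3] = str(part[obj])  # gaps keep their coordinates, only renumber
            out.append("\t".join(f))
            continue
        beg0, end = int(f[6]) - 1, int(f[7])
        bounds = sorted(c for c in cuts[f[5]] if beg0 <= c <= end)
        pieces = list(zip(bounds, bounds[1:]))
        if f[8] == "-":
            pieces.reverse()  # reverse strand: the contig's last piece comes first
        pos = int(f[1])
        for p_beg0, p_end in pieces:
            length = p_end - p_beg0
            part[obj] = part.get(obj, 0) + 1
            name = f"{f[5]}_{p_beg0 + 1}_{p_end}"
            out.append("\t".join([obj, str(pos), str(pos + length - 1), str(part[obj]),
                                  f[4], name, "1", str(length), f[8]]))
            pos += length
    return out


# Hypothetical example: one AGP places contig c whole, the other breaks it at 40.
agp1 = ["s1\t1\t100\t1\tW\tc\t1\t100\t+"]
agp2 = ["a\t1\t40\t1\tW\tc\t1\t40\t+", "b\t1\t60\t1\tW\tc\t41\t100\t-"]
cuts = collect_cuts([agp1, agp2])
for agp in (agp1, agp2):
    print("\n".join(break_agp(agp, cuts)))
```

The FASTA would also need to be split into matching pieces (e.g. c_41_100 is bases 41-100 of c), which this sketch does not do. Also, if one AGP leaves a region out entirely, such as a trimmed contig end, the two AGPs would still not share the same component set.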