Knowledge-Graph-Hub/kg-covid-19

Merge fails due to error tokenizing data

Closed this issue · 2 comments

Describe the bug

During the merge phase of the Jenkins build, this error occurs:

[2023-05-02T04:19:47.961Z] Traceback (most recent call last):
[2023-05-02T04:19:47.961Z]   File "run.py", line 202, in <module>
[2023-05-02T04:19:47.961Z]     cli()
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
[2023-05-02T04:19:47.961Z]     return self.main(*args, **kwargs)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
[2023-05-02T04:19:47.961Z]     rv = self.invoke(ctx)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[2023-05-02T04:19:47.961Z]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[2023-05-02T04:19:47.961Z]     return ctx.invoke(self.callback, **ctx.params)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
[2023-05-02T04:19:47.961Z]     return __callback(*args, **kwargs)
[2023-05-02T04:19:47.961Z]   File "run.py", line 94, in merge
[2023-05-02T04:19:47.961Z]     load_and_merge(yaml, processes)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/merge_utils/merge_kg.py", line 33, in load_and_merge
[2023-05-02T04:19:47.961Z]     merged_graph = merge(yaml_file, processes=processes)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 658, in merge
[2023-05-02T04:19:47.961Z]     stores = [r.get() for r in results]
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 658, in <listcomp>
[2023-05-02T04:19:47.961Z]     stores = [r.get() for r in results]
[2023-05-02T04:19:47.961Z]   File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
[2023-05-02T04:19:47.961Z]     raise self._value
[2023-05-02T04:19:47.961Z] pandas.errors.ParserError: Error tokenizing data. C error: Expected 10 fields in line 144676, saw 15

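The last input processed before this error is the GO-CAMs source, but that may not be where the problem is: the exception is re-raised from a multiprocessing pool, so the most recently logged input is not necessarily the one that failed.
Either way, one of the inputs is not parsing as expected, and that breaks the merge.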
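
Since the traceback only shows the exception being re-raised by `r.get()`, one way to make the failing input visible is to wrap whatever function loads a source so that any error carries the source name. This is only a sketch: `run_labelled`, `source_name`, and `loader` are hypothetical names for illustration, not part of KGX or of this repo.

```python
def run_labelled(source_name, loader, *args, **kwargs):
    """Call loader(*args, **kwargs); on failure, re-raise with the source name.

    `source_name` and `loader` are placeholders: in practice they would be
    the key of an input in the merge YAML and the function that parses it.
    """
    try:
        return loader(*args, **kwargs)
    except Exception as exc:
        # Keep the original exception type and message in the new message so
        # the information survives even if the chained cause is not shown.
        raise RuntimeError(
            f"{source_name}: {type(exc).__name__}: {exc}"
        ) from exc
```

With something like this in place, the error at the end of the log would start with the name of the source that failed rather than leaving it to be guessed from the log order.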

OK, this isn't the GO-CAMs: their files don't get anywhere near line 144676 (or 10 fields, for that matter).
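
To narrow down which input it actually is, a quick scan of the TSVs for rows with more fields than their header should find it. The sketch below assumes tab-delimited files with a header row and does a naive split, so it ignores quoting and may not count fields exactly the way the pandas C parser does. The function names and the `*.tsv` pattern are my own choices, not anything from KGX.

```python
from pathlib import Path


def find_wide_lines(path, delimiter="\t"):
    """Yield (line_number, field_count, expected) for rows wider than the header.

    Only rows with MORE fields than the header are reported, since those are
    the rows that produce an "Expected N fields ... saw M" error.
    """
    with open(path, encoding="utf-8", errors="replace", newline="") as handle:
        header = handle.readline().rstrip("\r\n")
        expected = len(header.split(delimiter))
        # The header is line 1, so data rows start at line 2.
        for line_number, line in enumerate(handle, start=2):
            count = len(line.rstrip("\r\n").split(delimiter))
            if count > expected:
                yield line_number, count, expected


def scan_directory(root, pattern="*.tsv"):
    """Return {file path: [(line_number, field_count, expected), ...]}."""
    hits = {}
    for path in sorted(Path(root).rglob(pattern)):
        rows = list(find_wide_lines(path))
        if rows:
            hits[str(path)] = rows
    return hits
```

Running `scan_directory` on the directory that holds the transformed inputs and looking for a file with 10 header fields and a hit at or near line 144676 should point at the culprit.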

I don't know if it's related, but TCRD also changed its links again, so I will fix that in the attached PR.