Merge fails due to error tokenizing data
Closed this issue · 2 comments
caufieldjh commented
Describe the bug
During the merge phase of the Jenkins build, this error occurs:
[2023-05-02T04:19:47.961Z] Traceback (most recent call last):
[2023-05-02T04:19:47.961Z] File "run.py", line 202, in <module>
[2023-05-02T04:19:47.961Z] cli()
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
[2023-05-02T04:19:47.961Z] return self.main(*args, **kwargs)
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
[2023-05-02T04:19:47.961Z] rv = self.invoke(ctx)
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[2023-05-02T04:19:47.961Z] return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[2023-05-02T04:19:47.961Z] return ctx.invoke(self.callback, **ctx.params)
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
[2023-05-02T04:19:47.961Z] return __callback(*args, **kwargs)
[2023-05-02T04:19:47.961Z] File "run.py", line 94, in merge
[2023-05-02T04:19:47.961Z] load_and_merge(yaml, processes)
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/merge_utils/merge_kg.py", line 33, in load_and_merge
[2023-05-02T04:19:47.961Z] merged_graph = merge(yaml_file, processes=processes)
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 658, in merge
[2023-05-02T04:19:47.961Z] stores = [r.get() for r in results]
[2023-05-02T04:19:47.961Z] File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 658, in <listcomp>
[2023-05-02T04:19:47.961Z] stores = [r.get() for r in results]
[2023-05-02T04:19:47.961Z] File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
[2023-05-02T04:19:47.961Z] raise self._value
[2023-05-02T04:19:47.961Z] pandas.errors.ParserError: Error tokenizing data. C error: Expected 10 fields in line 144676, saw 15
The last input processed before this error was the GO-CAMs, but the merge runs across multiple processes, so that may not correlate with the actual source of the problem.
Either way, one of the input files has a row with 15 fields where pandas expects 10, and that breaks the merge.
caufieldjh commented
OK, this isn't the GO-CAMs: their files don't get anywhere near line 144676 (or 10 fields, for that matter).
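One way to narrow down which input is responsible is to scan the TSV files going into the merge and report rows whose field count differs from the header's. The following is a minimal sketch rather than anything from the repo: the function names `find_bad_rows` and `scan_tree` are hypothetical, and it assumes tab-delimited files with a header row and no quoted fields containing tabs or newlines. Line numbers are counted from 1 including the header, which may differ slightly from how pandas counts them.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_bad_rows(path: str, delimiter: str = "\t") -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, expected_fields, seen_fields) for mismatched rows.

    Hypothetical helper: the expected field count is taken from the first line.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        expected = None
        for line_number, line in enumerate(handle, start=1):
            seen = len(line.rstrip("\r\n").split(delimiter))
            if expected is None:
                # The first line is treated as the header.
                expected = seen
                continue
            if seen != expected:
                yield line_number, expected, seen


def scan_tree(root: str, pattern: str = "*.tsv") -> Iterator[Tuple[str, int, int, int]]:
    """Run find_bad_rows over every matching file under root (hypothetical helper)."""
    for tsv_path in sorted(Path(root).rglob(pattern)):
        for line_number, expected, seen in find_bad_rows(str(tsv_path)):
            yield str(tsv_path), line_number, expected, seen
```

Calling `scan_tree` on the directory that holds the transformed node and edge files, then looking for a hit with 10 expected and 15 seen fields near line 144676, should point at the offending source. Rows with fewer fields than the header are also reported here, although pandas usually pads those instead of raising an error.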
caufieldjh commented
Don't know if it's related, but TCRD also changed its links again, so I will fix that in the attached PR.