Parsing in cases where there are blank intermediate AS/n columns
In the Kidney table (1.1), we have some cases where there is no AS/2:
I'm not quite sure why this is allowed (@emquardokus - please comment). In this case, should we report an error back to the table editors, or should the parsers skip the missing column to link terms?
It makes biological sense to make the link in this case. The ASCT+B reporter does this:
The JSON from the API (https://asctb-api.herokuapp.com) implies some link, but the row does not seem to parse properly:
{
  "anatomical_structures": [
    {
      "name": "kidney",
      "id": "UBERON:0002113",
      "rdfs_label": "kidney"
    },
    {
      "name": "Vessels",
      "id": "",
      "rdfs_label": ""
    },
    {
      "name": "Endothelium (non glomerular)",
      "id": "",
      "rdfs_label": ""
    }
  ],
sheet = 'Kidney': '2137043090', # Kidney_v1.1
@dosumis @anitacaron @bherr2
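A minimal sketch of how the unmapped entries in a row like the excerpt above could be listed. It assumes only the fields visible in that excerpt (`anatomical_structures`, `name`, `id`, `rdfs_label`); the function name `unmapped_structures` is hypothetical and not part of the API or the parsers.

```python
import json

# A row shaped like the excerpt above; only the fields shown there are used.
row = json.loads("""
{
  "anatomical_structures": [
    {"name": "kidney", "id": "UBERON:0002113", "rdfs_label": "kidney"},
    {"name": "Vessels", "id": "", "rdfs_label": ""},
    {"name": "Endothelium (non glomerular)", "id": "", "rdfs_label": ""}
  ]
}
""")


def unmapped_structures(row):
    """Return the names of anatomical structures that carry no ontology ID.

    Hypothetical helper for illustration only.
    """
    structures = row.get("anatomical_structures", [])
    return [s["name"] for s in structures if not s.get("id", "").strip()]


print(unmapped_structures(row))
```

For this row the helper reports "Vessels" and "Endothelium (non glomerular)", the two entries with an empty `id`.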
This is a mistake when this happens. I've corrected this now. Given that it's for "vessels" an incredibly ambiguous term---blood, lymph, artery/vein? Anyhow, that issue has plagued these tables in general and needs to be addressed in the next version.
Also fixed the author-provided label "kidney macrophage" to the correct CL label "kidney resident macrophage":
M2-Macrophage | kidney resident macrophage | CL:1000698 |
---|---|---|
> Given that it's for "vessels" an incredibly ambiguous term---blood, lymph, artery/vein?

For this - it might make more sense to skip the intermediate (unmapped) term. This would be consistent with Uberon for all of these:
AS/2/LABEL | AS/2/ID | AS/3 | AS/3/LABEL | AS/3/ID | AS/4 | AS/4/LABEL | AS/4/ID |
---|---|---|---|---|---|---|---|
kidney vasculature | UBERON:0006544 | Endothelium/Juxtaglomerular apparatus | | | Afferent Arteriole | renal afferent arteriole | UBERON:0004639 |
kidney vasculature | UBERON:0006544 | Endothelium/Juxtaglomerular apparatus | | | Efferent Arteriole | renal efferent arteriole | UBERON:0004640 |
kidney vasculature | UBERON:0006544 | Endothelium (non glomerular) | | | Descending Vasa Recta | vasa recta descending limb | UBERON:0009202 |
kidney vasculature | UBERON:0006544 | Endothelium (non glomerular) | | | Ascending Vasa Recta | vasa recta ascending limb | UBERON:0009091 |
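This one is overly broad & needs a new Uberon term:
AS/2/LABEL | AS/2/ID | AS/3 | AS/3/LABEL | AS/3/ID | AS/4 | AS/4/LABEL | AS/4/ID |
---|---|---|---|---|---|---|---|
kidney vasculature | UBERON:0006544 | Endothelium (non glomerular) | | | lymphatics | lymphatic vessel endothelium | UBERON:0002042 |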
> This is a mistake when this happens.

Then I think the action item (AI) for this repo is that we should get some kind of warning/feedback when this happens. (The comments on content should be moved to CC_tools / the review board.)
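One way such a warning could be produced is sketched below. This is an illustration under stated assumptions, not the repo's actual implementation: rows are assumed to be dicts keyed by column headers such as `AS/1`, `AS/2`, and the names `find_blank_intermediates` and `check_table` are hypothetical.

```python
import re


def find_blank_intermediates(row):
    """Return the AS levels that are blank although a deeper level is filled."""
    levels = {}
    for key, value in row.items():
        match = re.fullmatch(r"AS/(\d+)", key)  # only the AS/n name columns
        if match:
            levels[int(match.group(1))] = (value or "").strip()
    filled = [n for n, value in levels.items() if value]
    if not filled:
        return []
    deepest = max(filled)
    return [n for n in sorted(levels) if n < deepest and not levels[n]]


def check_table(rows):
    """Collect one feedback message per blank intermediate column."""
    messages = []
    for index, row in enumerate(rows, start=1):
        for level in find_blank_intermediates(row):
            messages.append(
                f"row {index}: AS/{level} is blank but a deeper AS column is filled"
            )
    return messages


# Illustrative rows only, not copied from the Kidney table.
example_rows = [
    {"AS/1": "kidney", "AS/2": "", "AS/3": "Endothelium (non glomerular)"},
    {"AS/1": "kidney", "AS/2": "kidney vasculature", "AS/3": ""},
]
for message in check_table(example_rows):
    print(message)
```

Trailing blanks (second row) are not reported, since a row that simply stops at a shallower level is not a gap.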
> Given that it's for "vessels" an incredibly ambiguous term---blood, lymph, artery/vein?
>
> For this - it might make more sense to skip the intermediate (unmapped) term. This would be consistent with Uberon for all of these:
>
> AS/2/LABEL | AS/2/ID | AS/3 | AS/3/LABEL | AS/3/ID | AS/4 | AS/4/LABEL | AS/4/ID |
> ---|---|---|---|---|---|---|---|
> kidney vasculature | UBERON:0006544 | Endothelium/Juxtaglomerular apparatus | | | Afferent Arteriole | renal afferent arteriole | UBERON:0004639 |
> kidney vasculature | UBERON:0006544 | Endothelium/Juxtaglomerular apparatus | | | Efferent Arteriole | renal efferent arteriole | UBERON:0004640 |
> kidney vasculature | UBERON:0006544 | Endothelium (non glomerular) | | | Descending Vasa Recta | vasa recta descending limb | UBERON:0009202 |
> kidney vasculature | UBERON:0006544 | Endothelium (non glomerular) | | | Ascending Vasa Recta | vasa recta ascending limb | UBERON:0009091 |

For these, shouldn't the AS/4 come before the AS/3? That is, the arteriole before the tissue that makes it up...
Closing as we are now handling the 'skips' 'properly'. Reopen or let me know if it's still not working for you.
This is currently handled by skipping the blank AS, on the assumption that the table author just skipped those columns for whatever reason. This works and doesn't break anything, but it could produce wrong links if the author intended the blanks to be filled with the value from the row above. It looks like they intend the blanks to be skipped, as we're doing now.
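The difference between the two readings can be sketched as follows. This is a simplified illustration, not the parser's actual code: rows are assumed to be lists of AS names in column order, and `link_by_skipping` and `fill_down` are hypothetical names.

```python
def link_by_skipping(row):
    """Link each non-blank term to the next non-blank term in the same row."""
    terms = [cell.strip() for cell in row if cell and cell.strip()]
    return list(zip(terms, terms[1:]))


def fill_down(rows):
    """Alternative reading: a blank cell inherits the value from the row above."""
    filled, previous = [], []
    for row in rows:
        current = []
        for i, cell in enumerate(row):
            if cell and cell.strip():
                current.append(cell.strip())
            else:
                # Inherit from the previous (already filled) row, if any.
                current.append(previous[i] if i < len(previous) else "")
        filled.append(current)
        previous = current
    return filled


# Illustrative rows only, not copied from the Kidney table.
rows = [
    ["kidney", "kidney vasculature", "Endothelium (non glomerular)"],
    ["kidney", "", "Descending Vasa Recta"],
]

# Skipping: the second row links kidney directly to the AS/3 term.
print(link_by_skipping(rows[1]))

# Fill-down: the blank AS/2 becomes "kidney vasculature" before linking.
print(link_by_skipping(fill_down(rows)[1]))
```

The two readings give different parent terms for the same row, which is why a wrong guess about the author's intent would silently change the links.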