merge Geo nodes
VladimirAlexiev commented
Currently each Institution has its own Geo node, so there are a lot:
PREFIX soa: <https://semopenalex.org/ontology/>
select (count(*) as ?c) {
?x a soa:Geo
} # 106956
Some queries will be more convenient if you merge the equivalent nodes,
eg "which city has the most publications by institutions located in that city"
If you do #76 and enable owl:sameAs reasoning, the merging will be done automatically because:
<https://semopenalex.org/geo/I200650556> owl:sameAs <https://sws.geonames.org/3149318/>.
<https://semopenalex.org/geo/I1234567890> owl:sameAs <https://sws.geonames.org/3149318/>.
will make the two Geo nodes sameAs each other (owl:sameAs is symmetric and transitive).
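To see how many Geo nodes would collapse this way, one could group them by GeoNames ID. This is only a sketch: it assumes that gn:geonamesID (used in the query further down) is set on every Geo node that has a GeoNames link, and the limit of 20 is arbitrary.

```sparql
PREFIX soa: <https://semopenalex.org/ontology/>
PREFIX gn: <http://www.geonames.org/ontology#>
# GeoNames IDs shared by more than one Geo node,
# i.e. the groups that sameAs reasoning would merge
select ?id (count(?x) as ?nodes) {
  ?x a soa:Geo ;
     gn:geonamesID ?id
}
group by ?id
having (count(?x) > 1)
order by desc(?nodes)
limit 20
```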
But there are a couple of problems.
1: Not all Geo nodes have a GeoNames link:
PREFIX soa: <https://semopenalex.org/ontology/>
PREFIX gn: <http://www.geonames.org/ontology#>
select (count(*) as ?c) {
?x a soa:Geo
filter not exists {?x gn:geonamesID ?id}
} # 4593
2: If two names for the same city (eg "Washington DC" vs "Washington, D.C.") are in two Geo nodes,
then the merged node will obtain two labels, which is not ideal.
It is even worse with wgs:lat and wgs:long, which are expected to differ by some small amount between nodes.
So: a more thorough data fusion procedure will be needed.
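To gauge how much fusion work is involved, a query along these lines could list the GeoNames IDs whose Geo nodes carry more than one distinct latitude. It is a sketch under assumptions: wgs:lat is taken to sit directly on the soa:Geo node, and the wgs: prefix is taken to be the usual W3C WGS84 namespace, which this issue does not spell out.

```sparql
PREFIX soa: <https://semopenalex.org/ontology/>
PREFIX gn: <http://www.geonames.org/ontology#>
# assumption: wgs: is the W3C Basic Geo (WGS84) vocabulary
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
# GeoNames IDs whose Geo nodes disagree on latitude
select ?id (count(distinct ?lat) as ?lats) (min(?lat) as ?minLat) (max(?lat) as ?maxLat) {
  ?x a soa:Geo ;
     gn:geonamesID ?id ;
     wgs:lat ?lat
}
group by ?id
having (count(distinct ?lat) > 1)
order by desc(?lats)
limit 20
```

The same pattern with wgs:long, or with whichever label property the Geo nodes use, would show the spread for the other conflicting values.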