topology of PWN
vcvpaiva opened this issue · 3 comments
Could anyone tell me how many of the 117659 synsets have glosses? I don't think all of them do.
Can we add the corpus of glosses somewhere in the repo, in an inspectable form?
https://wordnet.princeton.edu/glosstag.shtml
@vcvpaiva AFAIK all PWN synsets have glosses. For example, if we take the Prolog output of PWN and count the distinct synset IDs in the gloss file, we get 117659 entries.
$ cd prolog
$ cat wn_g.pl | awk -F, '{print $1}' | sort | uniq | wc -l
117659
Also, if we look at the tagged glosses, it seems that all of them have tagged glosses too:
$ cd glosstag/standoff
$ cat index.byid.tab | awk -F'\t' '{print $1}' | sort | uniq | wc -l
117659
thanks @fcbr!
this is odd, as I am sure that many times I have had the impression of a synset not having a gloss.
maybe it's when the gloss is very short, like
05893261-n sine_qua_non, essential_condition | sine qua non
(a prerequisite)
what is a tagged gloss, please?
and questions on the topology of PWN:
- how many synsets go directly to Entity? do all synsets go to Entity?
- how many have two hops?
- how many have a long hierarchy like kitty < domestic_cat < cat < feline < carnivore < placental_mammal < mammal < vertebrate < chordate < animal < organism < living_thing < entity?
- I seem to remember that you were calculating isolated nodes vs hierarchies? where is that data now?
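A rough way to get numbers for the hop questions above is a breadth-first search over the hypernym relation in the Prolog files. This is only a sketch under assumptions: it assumes `wn_hyp.pl` has lines of the form `hyp(100001930,100001740).` where the second ID is the direct hypernym of the first, and that `100001740` is the ID of entity. The function name `depth_histogram` is made up for illustration.

```shell
# Sketch: histogram of the minimum number of hops from each synset up to a root.
# Assumed input format (one fact per line): hyp(HYPONYM_ID,HYPERNYM_ID).
depth_histogram() {
  # $1 = path to wn_hyp.pl, $2 = root synset ID (100001740 assumed to be entity)
  awk -F'[(),]' -v root="$2" '
    $1 == "hyp" {
      n = ++nkids[$3]            # $2 is the hyponym, $3 is its hypernym
      kid[$3, n] = $2
    }
    END {
      # breadth-first search downward from the root
      depth[root] = 0; queue[1] = root; head = 1; tail = 1
      while (head <= tail) {
        cur = queue[head++]
        for (i = 1; i <= nkids[cur]; i++) {
          k = kid[cur, i]
          if (!(k in depth)) { depth[k] = depth[cur] + 1; queue[++tail] = k }
        }
      }
      # one output line per depth: hops, then number of synsets
      for (s in depth) count[depth[s]]++
      for (d = 0; d in count; d++) print d, count[d]
    }
  ' "$1"
}
```

Running `depth_histogram wn_hyp.pl 100001740` would print one line per hop count, so the line starting with `1` is the number of synsets that go directly to the root and the line starting with `2` is the two-hop count. Synsets that never reach the root (verbs, for instance) are simply not counted, and instance hypernyms, which I believe are stored separately in `wn_ins.pl`, are ignored here.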
Yes, we do have glosses with only 1-2 words. The tagged gloss corpus is not complete: not all glosses are entirely tagged. I talked with Christiane Fellbaum about it. This is excellent work still waiting to be done.
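To inspect the very short glosses, something like the following could list them from the Prolog gloss file. It is a sketch, not a tested recipe against the real data: it assumes each line of `wn_g.pl` looks like `g(105893261,'a prerequisite').`, and it counts example sentences as part of the gloss. The name `short_glosses` is hypothetical.

```shell
# Sketch: print synset ID, word count and gloss for glosses of at most N words.
# Assumed input format (one fact per line): g(SYNSET_ID,<quote>gloss text<quote>).
short_glosses() {
  # $1 = path to wn_g.pl, $2 = maximum number of words
  awk -v max="$2" '
    /^g\(/ {
      id = $0
      sub(/^g\(/, "", id)                     # drop the leading g(
      sub(/,.*/, "", id)                      # keep only the synset ID
      gloss = $0
      sub(/^g\([0-9]+,\047/, "", gloss)       # drop prefix up to the opening quote
      sub(/\047\)\.[[:space:]]*$/, "", gloss) # drop closing quote, paren and dot
      n = split(gloss, words, " ")            # whitespace-separated word count
      if (n <= max) print id "\t" n "\t" gloss
    }
  ' "$1"
}
```

For example, `short_glosses wn_g.pl 2 | wc -l` would count the glosses with at most two words, and a word count of 0 in the second column would point at a synset whose gloss is actually empty.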
corpus of glosses = tagged corpus