own-pt/openWordnet-PT

topology of PWN

vcvpaiva opened this issue · 3 comments

Could anyone tell me how many of the 117659 synsets have glosses? Not all of them do.

Can we add the corpus of glosses somewhere in the repo, in an inspectable form?
https://wordnet.princeton.edu/glosstag.shtml

fcbr commented

@vcvpaiva AFAIK all PWN synsets have glosses. For example, if we use the Prolog output of PWN and remove the duplicates from the generated Prolog, we get 117659 entries:

$ cd prolog
$ awk -F, '{print $1}' wn_g.pl | sort | uniq | wc -l
117659

Also, looking at the tagged gloss corpus, it seems that all synsets have tagged glosses too:

$ cd glosstag/standoff
$ awk -F'\t' '{print $1}' index.byid.tab | sort | uniq | wc -l
117659

Thanks, @fcbr!
This is odd, as I am sure that many times I have had the impression of a synset not having a gloss.
Maybe it is when the gloss is a single word, like

05893261-n sine_qua_non, essential_condition | sine qua non
(a prerequisite)
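
One way to check that impression is a small filter over `wn_g.pl` that lists the synsets whose gloss is empty or very short. This is only a minimal sketch, not something from the repo: the name `short_glosses` is made up, and it assumes each fact sits on one line in the form `g(synset_id,'gloss').`, with any example sentences counted as part of the gloss.

```shell
# Hypothetical helper: list synsets whose gloss has at most MAX words (default 2).
# Assumes one fact per line, shaped like: g(105893261,'(a prerequisite)').
# Usage: short_glosses 2 < wn_g.pl
short_glosses() {
  awk -v max="${1:-2}" '
    /^g\([0-9]+,/ {
      id = $0
      sub(/^g\(/, "", id)                 # drop the leading "g("
      sub(/,.*$/, "", id)                 # keep only the synset id
      gloss = $0
      sub(/^g\([0-9]+,/, "", gloss)       # drop "g(id,"
      sub(/\)\.[ \t\r]*$/, "", gloss)     # drop the closing ")."
      gsub(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/, "", gloss)  # trim quotes and brackets
      n = split(gloss, words, /[ \t]+/)   # n is 0 for an empty gloss
      if (n <= max) print id "\t" gloss
    }
  '
}
```

Running `short_glosses 2 < wn_g.pl | wc -l` in the `prolog` directory would then count the glosses with at most two words; a line with an id and nothing after the tab would be a synset with no gloss text at all.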

What is a tagged gloss, please?

And some questions on the topology of PWN:

  1. How many synsets go directly to Entity? Do all synsets go to Entity?

  2. How many have two hops?

  3. How many have a long hierarchy, like kitty < domestic_cat < cat < feline < carnivore < placental_mammal < mammal < vertebrate < chordate < animal < organism < living_thing < entity?

  4. I seem to remember that you were calculating isolated nodes vs. hierarchies. Where is that data now?
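
The hop-count questions could be approximated with a breadth-first walk over the hypernym facts in the Prolog files. What follows is a minimal sketch under several assumptions: facts look like `hyp(child_id,parent_id).` (and `ins(...)` for instance hypernyms), one per line; 100001740 is the Prolog id of entity; and `hop_histogram` is a hypothetical helper name. For each hop count it prints how many synsets reach the given root in that many hops along their shortest path.

```shell
# Hypothetical helper: histogram of shortest hop counts to a root synset.
# Assumes one fact per line, shaped like: hyp(child_id,parent_id).
# Usage: cat wn_hyp.pl wn_ins.pl | hop_histogram 100001740
hop_histogram() {
  awk -v root="$1" '
    /^(hyp|ins)\([0-9]+,[0-9]+\)\./ {
      gsub(/[^0-9,]/, "")                  # leaves "child,parent"
      split($0, ids, ",")
      kids[ids[2]] = kids[ids[2]] " " ids[1]
    }
    END {
      # Breadth-first search downward from the root.
      depth[root] = 0; queue[1] = root; head = 1; tail = 1; max = 0
      while (head <= tail) {
        node = queue[head++]
        n = split(kids[node], c, " ")
        for (i = 1; i <= n; i++) {
          if (!(c[i] in depth)) {
            depth[c[i]] = depth[node] + 1
            queue[++tail] = c[i]
            count[depth[c[i]]]++
            if (depth[c[i]] > max) max = depth[c[i]]
          }
        }
      }
      for (d = 1; d <= max; d++) print d, count[d]
    }
  '
}
```

The first output line would answer question 1 and the second question 2. Summing the second column and comparing it with the number of noun synsets would show whether every noun reaches entity; synsets that never show up are either in another part of speech or disconnected from that root.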

Yes, we do have glosses with only 1-2 words. The tagged gloss corpus is not complete: not all glosses are entirely tagged. I talked with Christiane Fellbaum about it. Actually, this is excellent work still waiting to be done.

corpus of glosses = tagged corpus