
Language Models as a Knowledge base - experimentation for our datasets


It seems that they aligned fact triples with Wikipedia text to obtain a sentence containing the specific subject or object entity. Then, to make a prediction over an object entity, they replaced it with [MASK]. For ConceptNet, they used sentences from its own base dataset. They then created query templates for relations (in our task I should create templates for entity types as well) to query the LMs.

According to their work, for us the input should consist of the alignment text plus the query template. For WordNet, for example, we can obtain an example sentence for any synset: for the entity cover we can get the sentence "cover the child with a blanket", and adding the template at the end it becomes: "cover the child with a blanket. cover word type is a [MASK]" (or any other template that looks like this -- this is just an example), where the [MASK] is 'verb'. This is only an idea, but the first step is to test the paper's approach.
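To sanity-check this idea quickly, a minimal probe could look like the sketch below (the model choice, the template wording, and the use of the `fill-mask` pipeline are my assumptions, not the paper's exact setup):

```python
# Minimal cloze probe: alignment sentence + type-query template, then rank
# the model's candidates for the [MASK] position.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical template appended to the WordNet example sentence.
prompt = "cover the child with a blanket. cover word type is a [MASK]."

for candidate in fill_mask(prompt, top_k=5):
    print(f"{candidate['token_str']:>12}  {candidate['score']:.4f}")
```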

Most of the datasets that did not come with sentences from their own sources relied on Wikipedia! I also had a quick look at their code, and from what I can tell they only used the embeddings and vocabulary obtained from each LM to compute a probability for every token; they then picked the top-ranked tokens and evaluated the results with ranking (search-engine-style) metrics.
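As far as I can tell, that scoring step boils down to something like the following sketch (the candidate list here is a toy stand-in for their common vocabulary, and all names are mine):

```python
# Score the [MASK] position directly from the LM head, restrict the ranking
# to a fixed candidate vocabulary, and take the top tokens.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Toy shared vocab; each entry should be a single wordpiece in the model vocab.
candidates = ["noun", "verb", "adjective", "adverb"]
candidate_ids = tokenizer.convert_tokens_to_ids(candidates)

inputs = tokenizer(
    "cover the child with a blanket. cover word type is a [MASK].",
    return_tensors="pt",
)
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

probs = logits[candidate_ids].softmax(dim=-1)
for token, p in sorted(zip(candidates, probs.tolist()), key=lambda x: -x[1]):
    print(f"{token:>10}  {p:.4f}")
```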

Now, the task for entity type detection is the following (a possible record layout is sketched after this list):

  • Create sample sets for the WordNet dataset.
  • Create sample sets for the GeoNames dataset - let's consider level 1 for now.
  • Create sample sets for the UMLS dataset - let's consider NCI for now.
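One possible record layout for these sample sets, mirroring the paper's masked-sentence format (all field values and templates here are hypothetical):

```python
# Hypothetical sample records; the gold type goes in obj_label and must be a
# single token for the first experiments.
wordnet_sample = {
    "sub_label": "cover",
    "obj_label": "verb",
    "masked_sentences": ["cover the child with a blanket. cover word type is a [MASK]."],
}

geonames_sample = {  # level 1 feature classes only
    "sub_label": "Aachen",
    "obj_label": "city",
    "masked_sentences": ["Aachen geographically is a [MASK]."],
}
```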

Short summary of the paper:

1. Chosen models:

  • Unidirectional LMs: fairseq-fconv, Transformer-XL
  • Bidirectional LMs: ELMo (original), ELMo 5.5B, BERT-base, BERT-large

2. Knowledge Sources: They transform fact triples from each knowledge source into cloze templates; for some sources there are also aligned texts in Wikipedia that are known to express a particular fact.

According to the code, they ONLY considered these cloze templates for the Google-RE dataset:

  • Google-RE corpus: three relations were considered, with the following templates:

```json
{"relation": "place_of_birth",
 "template": "[X] was born in [Y] .",
 "template_negated": "[X] was not born in [Y] ."}

{"relation": "date_of_birth",
 "template": "[X] (born [Y]).",
 "template_negated": "[X] (not born [Y])."}

{"relation": "place_of_death",
 "template": "[X] died in [Y] .",
 "template_negated": "[X] did not die in [Y] ."}
```

Samples from this dataset are aligned with Wikipedia text supporting the fact:

{"pred": "/people/person/date_of_birth", "sub": "/m/09gb0bw", "obj": "1941", "evidences": [{"url": "http://en.wikipedia.org/wiki/Peter_F._Martin", "snippet": "Peter F. Martin (born 1941) is an American politician who is a Democratic member of the Rhode Island House of Representatives. He has represented the 75th District Newport since 6 January 2009. He is currently serves on the House Committees on Judiciary, Municipal Government, and Veteran's Affairs. During his first term of office he served on the House Committees on Small Business and Separation of Powers & Government Oversight. In August 2010, Representative Martin was appointed as a Commissioner on the Atlantic States Marine Fisheries Commission", "considered_sentences": ["Peter F Martin (born 1941) is an American politician who is a Democratic member of the Rhode Island House of Representatives ."]}], "judgments": [{"rater": "18349444711114572460", "judgment": "yes"}, {"rater": "17595829233063766365", "judgment": "yes"}, {"rater": "4593294093459651288", "judgment": "yes"}, {"rater": "7387074196865291426", "judgment": "yes"}, {"rater": "17154471385681223613", "judgment": "yes"}], "sub_w": null, "sub_label": "Peter F. Martin", "sub_aliases": [], "obj_w": null, "obj_label": "1941", "obj_aliases": [], "uuid": "18af2dac-21d3-4c42-aff5-c247f245e203", "masked_sentences": ["Peter F Martin (born [MASK]) is an American politician who is a Democratic member of the Rhode Island House of Representatives ."]}

  • T-REx: a knowledge source that is a subset of Wikidata triples, aligned with Wikipedia text. They manually defined templates for the relations; 41 relations were considered. A sample of this dataset:

{"uuid": "75e6e7c3-9697-4ad1-b805-5f79f52e8255", "obj_uri": "Q183", "obj_label": "Germany", "sub_uri": "Q57881", "sub_label": "Eibenstock", "predicate_id": "P17", "evidences": [{"sub_surface": "Eibenstock", "obj_surface": "Germany", "masked_sentence": "Eibenstock is a town in the western Ore Mountains, in the Erzgebirgskreis, Saxony, [MASK]."}, {"sub_surface": "Eibenstock", "obj_surface": "Germany", "masked_sentence": "Eibenstock is a town in the western Ore Mountains, in the Erzgebirgskreis, Saxony, [MASK]."}]}

  • ConceptNet: since it is built on top of the Open Mind Common Sense (OMCS) dataset, they used those sentences and did not consider an explicit alignment of facts to Wikipedia sentences (NO templates at inference time, according to the code). A sample of this dataset:

{"sub": "alive", "obj": "think", "pred": "HasSubevent", "masked_sentences": ["One of the things you do when you are alive is [MASK]."], "obj_label": "think", "uuid": "d4f11631dde8a43beda613ec845ff7d1"}

  • SQuAD: they manually created cloze-style questions from QA pairs, e.g. rewriting "Who developed the theory of relativity?" as "The theory of relativity was developed by ____". For each QA pair the corresponding fact is expressed in Wikipedia, since that is how SQuAD was created (NO extra templates at inference time, according to the code). A sample of this dataset:

{"masked_sentences": ["To emphasize the 50th anniversary of the Super Bowl the [MASK] color was used."], "obj_label": "gold", "id": "56be4db0acb8001400a502f0_0", "sub_label": "Squad"}


3. Evaluation Metric: Mean Precision at K (P@K). Here K=1.
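For reference, mean P@K reduces to the following (a minimal sketch; all names are mine):

```python
# P@k per query: 1 if the gold token is among the model's top-k predictions
# for the [MASK], else 0; the reported score is the mean over all queries.
def precision_at_k(ranked_tokens: list[str], gold: str, k: int = 1) -> float:
    return float(gold in ranked_tokens[:k])

queries = [(["germany", "austria"], "germany"), (["paris", "london"], "london")]
mean_p_at_1 = sum(precision_at_k(r, g) for r, g in queries) / len(queries)
print(mean_p_at_1)  # 0.5
```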


4. Considerations:

  1. Manually defined templates: this means they are measuring a lower bound on what the LMs know.
  2. Single token: only single-token objects are considered, because multi-token decoding would add a number of additional tunable parameters.
  3. Object slots: using the reverse relation they can query for subjects as well. They do not query the relation slot because (1) relation names are usually multi-token phrases, and (2) it is unclear what the gold-standard pattern for a relation would be.
  4. Intersection of vocabularies: ELMo's vocabulary has ~800k tokens, BERT's ~30k. The larger the vocabulary, the harder it is to rank the gold token at the top, so they restrict all models to a common vocabulary of ~21k tokens (see the sketch after this list).
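Computing such a common vocabulary is straightforward; a sketch with two HuggingFace tokenizers as stand-ins (ELMo is not available through `transformers`, so the second model here is only illustrative):

```python
# Intersect two models' vocabularies so every model is ranked over the same
# candidate set, analogous to the paper's ~21k shared tokens.
from transformers import AutoTokenizer

tok_a = AutoTokenizer.from_pretrained("bert-base-uncased")
tok_b = AutoTokenizer.from_pretrained("bert-base-cased")  # illustrative second LM

shared = set(tok_a.get_vocab()) & set(tok_b.get_vocab())
print(len(shared), "tokens in the common vocabulary")
```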

5. Results:

  • BERT-Large is doing well!
  • Baseline models: Freq, and DrQA (for the SQuAD dataset only).
  • The KB model is a relation-extraction model trained on Wikipedia data (LSTM + attention):
    * They provided it with the aligned sentences for the facts in the test set.
    * The output of this model is triples, which they then used for KG construction.
    * At query time, given a subject entity, they rank all objects for the queried relation. RE_n is the naive entity-linking solution; RE_o uses an oracle for entity linking.

The weakness of this approach for us: for some of our entities we would have to mask more than one token. For this reason, in order to test the paper's idea, we move forward with level 1 in the GeoNames and UMLS datasets for now.

To solve the multi-token issue we will then move forward with BART!
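A first sanity check of the BART idea could look like this sketch (assumption: BART's denoising pre-training lets a single `<mask>` expand to a multi-token span at generation time; the prompt and model choice are mine):

```python
# Fill a <mask> with a possibly multi-token span via seq2seq generation.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

prompt = "Aachen is a <mask> in Germany."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(inputs.input_ids, num_beams=5, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```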