Paper link: https://arxiv.org/abs/2310.14820v1
Everyone can get our dataset from here.
Each line in dataset/meta_data.jsonl corresponds to the metadata of one artificial entity:
- artificial_entity: the information of the artificial entity.
  - name: the name of the artificial entity.
  - id: the negative of the parent entity's id.
  - rank: the rank in the biological taxonomy.
  - property: a list of property structures (described below).
- parent_entity: the information of the parent entity.
- difference: the differences made when generating the properties of the artificial entity.
  - extension: a list of property structures of the artificial entity taken from other entities.
  - variation: a list of tuples (old_property, new_property), where old_property is the source from which the artificial entity's new property (new_property) was created.
  - heredity: a list of property structures of the artificial entity inherited from the parent entity.
  - dropout: a list of property structures of the parent entity not inherited by the artificial entity.
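As a rough illustration, the following Python sketch parses meta_data.jsonl one line at a time and pulls out a few of the fields above. It assumes the keys nest exactly as listed (name, id, rank and property under artificial_entity; the four difference lists under difference). The helper names iter_entities and summarize are hypothetical and not part of the ALCUNA repository.

```python
import json
from typing import Iterable, Iterator

# The four difference types listed above.
DIFFERENCE_KINDS = ("extension", "variation", "heredity", "dropout")


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one metadata record per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


def summarize(record: dict) -> dict:
    """Pull a few fields out of one record.

    Assumes name/id/rank/property sit under "artificial_entity" and the
    four difference lists sit under "difference" (hypothetical nesting
    inferred from the field descriptions).
    """
    entity = record["artificial_entity"]
    diff = record["difference"]
    return {
        "name": entity["name"],
        "id": entity["id"],
        "rank": entity["rank"],
        "num_properties": len(entity["property"]),
        # Number of entries of each difference type (0 if the key is absent).
        "diff_counts": {k: len(diff.get(k, [])) for k in DIFFERENCE_KINDS},
    }
```

Because iter_entities accepts any iterable of strings, an open file handle (for example one returned by open("dataset/meta_data.jsonl", encoding="utf-8")) can be passed to it directly, and the file is streamed rather than loaded into memory at once.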
The property structure is as follows:
- name: the name of the property.
- type: the type of the property, one of ['attribute', 'relation'].
- values: a list of valid values of the property.
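A property structure maps naturally onto a small Python dataclass. The sketch below is a hypothetical helper (Property, split_by_type and parse_variation are illustrative names, not the repository's code). It assumes values is a plain JSON list whose element type is left open, and that each variation tuple is serialized as a two-element JSON list, since JSON has no tuple type.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

VALID_TYPES = ("attribute", "relation")


@dataclass
class Property:
    """One property structure: a name, a type, and its valid values."""
    name: str
    type: str          # 'attribute' or 'relation'
    values: List[Any]  # element type is not specified in this README

    @classmethod
    def from_dict(cls, d: dict) -> "Property":
        # Reject anything outside the two documented property types.
        if d["type"] not in VALID_TYPES:
            raise ValueError(f"unexpected property type: {d['type']!r}")
        return cls(name=d["name"], type=d["type"], values=list(d["values"]))


def split_by_type(props: List[Property]) -> Tuple[List[Property], List[Property]]:
    """Separate attribute properties from relation properties."""
    attrs = [p for p in props if p.type == "attribute"]
    rels = [p for p in props if p.type == "relation"]
    return attrs, rels


def parse_variation(pairs: List[list]) -> List[Tuple[Property, Property]]:
    """Turn variation entries into (old_property, new_property) pairs.

    Assumes each pair arrives as a two-element JSON list.
    """
    return [(Property.from_dict(old), Property.from_dict(new)) for old, new in pairs]
```

Validating type at load time is a design choice: it surfaces unexpected values early instead of letting them fall through split_by_type silently.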
dataset/id2question.json contains a dict mapping each artificial entity id to its corresponding questions. Each question is structured as follows:
- question: the question text.
- answers: a list of all valid answers.
- form: the form of the question, one of ['boolean', 'fill-in-blank', 'multi-choice'].
- type: the subset to which the question belongs, one of ['Knowledge Understanding', 'Knowledge Differentiation', 'Knowledge Association'].
- meta_data:
  - related_property: the property of the artificial entity related to the question. If difference is 'variation', it is a tuple (old_property, new_property); otherwise, it is a property structure.
  - difference: the difference type of related_property, one of ['extension', 'variation', 'heredity', 'dropout'].
  - hop_triplets (optional): the chain of relation triplets corresponding to a multi-hop question. Only available for the Knowledge Association dataset.
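The question file can be queried with a few lines of Python. This is a minimal sketch, assuming the nesting above (related_property and difference inside meta_data) and that each dict value is a list of question objects; all function names are hypothetical. JSON object keys are always strings, so entity ids read from id2question.json come back as str; if id is stored as a number in meta_data.jsonl, convert it with str() before joining the two files.

```python
import json
from collections import Counter
from typing import Dict, List, Optional, Tuple

FORMS = {"boolean", "fill-in-blank", "multi-choice"}
SUBSETS = {"Knowledge Understanding", "Knowledge Differentiation", "Knowledge Association"}


def load_id2question(path: str) -> Dict[str, List[dict]]:
    """Load the entity-id -> questions mapping (keys are strings in JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def select(id2q: Dict[str, List[dict]], subset: Optional[str] = None,
           form: Optional[str] = None) -> List[Tuple[str, dict]]:
    """Return (entity_id, question) pairs matching the given subset and form."""
    if subset is not None and subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if form is not None and form not in FORMS:
        raise ValueError(f"unknown form: {form!r}")
    out = []
    for entity_id, questions in id2q.items():
        for q in questions:
            if subset is not None and q["type"] != subset:
                continue
            if form is not None and q["form"] != form:
                continue
            out.append((entity_id, q))
    return out


def related_property_names(q: dict) -> List[str]:
    """Names of the properties a question is about.

    A 'variation' question carries an (old, new) pair; every other
    difference type carries a single property structure.
    """
    meta = q["meta_data"]
    rp = meta["related_property"]
    if meta["difference"] == "variation":
        old, new = rp
        return [old["name"], new["name"]]
    return [rp["name"]]


def count_by_subset_and_difference(id2q: Dict[str, List[dict]]) -> Counter:
    """Tally questions per (subset, difference type)."""
    return Counter((q["type"], q["meta_data"]["difference"])
                   for questions in id2q.values() for q in questions)
```

For example, select(load_id2question("dataset/id2question.json"), subset="Knowledge Association") would keep only questions from the Knowledge Association subset, paired with their entity ids.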
Please cite the paper and star this repo if you use ALCUNA (or KnowGen) and find it interesting/useful, thanks!
@misc{yin2023alcuna,
title={ALCUNA: Large Language Models Meet New Knowledge},
author={Xunjian Yin and Baizhou Huang and Xiaojun Wan},
year={2023},
eprint={2310.14820},
archivePrefix={arXiv},
primaryClass={cs.CL}
}