
Code and dataset for the paper "LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities".

Primary language: Python · License: MIT

AutoKG



🌄Overview


Overview of our work. There are three main components: 1) Basic Evaluation: we assess large models (text-davinci-003, ChatGPT, and GPT-4) in both zero-shot and one-shot settings, using the performance of fully supervised state-of-the-art models as benchmarks; 2) Virtual Knowledge Extraction: we examine the virtual knowledge extraction capabilities of large models on the VINE dataset we constructed; and 3) Automatic KG (AutoKG): we propose using multiple agents to facilitate the construction and reasoning of KGs.
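The Basic Evaluation compares model outputs with gold annotations. As a minimal illustration, the sketch below assumes exact-match, micro-averaged precision/recall/F1 over (subject, relation, object) triples; `triple_f1` is a hypothetical name, and the paper's actual scoring may normalize or match triples differently.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall and F1 over triples (duplicates collapsed)."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("Paris", "capital_of", "France"), ("Berlin", "capital_of", "Germany")]
    pred = [("Paris", "capital_of", "France"), ("Berlin", "located_in", "Europe")]
    print(triple_f1(pred, gold))  # (0.5, 0.5, 0.5)
```

Collapsing duplicates into sets means a model cannot inflate its score by repeating a correct triple.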

🌟 Evaluation

Data Preprocessing

The datasets that we used in our experiments are as follows:

  • KG Construction

    • DuIE2.0
    • MAVEN
    • RE-TACRED
    • SciERC

    You can download each dataset from its original release, or use the exact data from our experiments, which is included in the corresponding "datas" folder (e.g., KG Construction/DuIE2.0/datas).

  • KG Reasoning

    • FB15k-237
    • ATOMIC2020

  • Question Answering

    • FreebaseQA
    • MetaQA
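To inspect the KG Construction data, a loader along these lines may help. It is a sketch assuming the DuIE2.0 JSON-lines layout: each line has a `text` field and an `spo_list` whose entries carry `subject`, `predicate`, and an `object` dict holding its main value under `@value`. The name `load_duie_triples` is hypothetical, and the files in `datas` may have been reshaped by `duie_processor.py`.

```python
import json
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def load_duie_triples(lines: Iterable[str]) -> Iterator[Tuple[str, List[Triple]]]:
    """Yield (text, triples) pairs from DuIE2.0-style JSON lines.

    Assumes each record has "text" and "spo_list"; each object is a dict whose
    main value sits under "@value" (other slots, e.g. dates, are ignored here).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        triples = [
            (spo["subject"], spo["predicate"], spo["object"]["@value"])
            for spo in record.get("spo_list", [])
        ]
        yield record["text"], triples


if __name__ == "__main__":
    # Illustrative record in the assumed format (not taken from the dataset).
    sample = json.dumps({
        "text": "Marie Curie was born in Warsaw.",
        "spo_list": [{"subject": "Marie Curie", "predicate": "birthplace",
                      "object": {"@value": "Warsaw"}}],
    })
    for text, triples in load_duie_triples([sample]):
        print(text, triples)
```

To read a real file, pass an open handle instead of a list, e.g. `load_duie_triples(open(path, encoding="utf-8"))`, where `path` points at one of the JSON-lines files in `datas`.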

The expected file structure is:

AutoKG
 |-- KG Construction
 |    |-- DuIE2.0
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- duie_processor.py        #preprocess data
 |    |    |-- duie_prompts.py          #generate prompts
 |    |-- MAVEN
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- maven_processor.py       #preprocess data
 |    |    |-- maven_prompts.py         #generate prompts
 |    |-- RE-TACRED
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- retacred_processor.py    #preprocess data
 |    |    |-- retacred_prompts.py      #generate prompts
 |    |-- SciERC
 |    |    |-- datas                    #dataset
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- scierc_processor.py      #preprocess data
 |    |    |-- scierc_prompts.py        #generate prompts
 |-- KG Reasoning (Link Prediction)
 |    |-- FB15k-237
 |    |    |-- data                     #sample data
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |-- ATOMIC2020
 |    |    |-- data                     #sample data
 |    |    |-- prompts                  #0-shot/1-shot prompts
 |    |    |-- system_eval              #eval for ATOMIC2020
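A quick way to confirm that a checkout matches the tree above is to test for each listed folder. This sketch assumes every leaf entry other than the `.py` scripts (including ATOMIC2020's `system_eval`) is a directory; `EXPECTED`, `missing_paths`, and the script name in the usage comment are hypothetical.

```python
import sys
from pathlib import Path
from typing import Dict, List

# Folders transcribed from the expected tree; assumes each listed entry except
# the .py scripts is a directory (including ATOMIC2020's system_eval).
EXPECTED: Dict[str, List[str]] = {
    "KG Construction/DuIE2.0": ["datas", "prompts"],
    "KG Construction/MAVEN": ["datas", "prompts"],
    "KG Construction/RE-TACRED": ["datas", "prompts"],
    "KG Construction/SciERC": ["datas", "prompts"],
    "KG Reasoning (Link Prediction)/FB15k-237": ["data", "prompts"],
    "KG Reasoning (Link Prediction)/ATOMIC2020": ["data", "prompts", "system_eval"],
}


def missing_paths(root: Path) -> List[str]:
    """Return expected folders (relative to root, POSIX-style) that are not directories."""
    missing = []
    for base, children in EXPECTED.items():
        for child in children:
            path = root / base / child
            if not path.is_dir():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Usage: python check_layout.py [path/to/AutoKG]  (script name is hypothetical)
    repo_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    gaps = missing_paths(repo_root)
    for gap in gaps:
        print("missing:", gap)
    print("layout OK" if not gaps else f"{len(gaps)} folder(s) missing")
```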
 

How to Run

  • KG Construction (using DuIE2.0 as an example)

    cd "KG Construction/DuIE2.0"
    python duie_processor.py
    python duie_prompts.py

    This writes the 0-shot/1-shot prompts to the prompts folder.

  • KG Reasoning

  • Question Answering
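The difference between the 0-shot and 1-shot prompts is whether one worked demonstration precedes the query. The sketch below illustrates that idea for triple extraction; the instruction wording, `build_prompt`, and the output format are hypothetical and are not the templates produced by `duie_prompts.py`.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical instruction wording; the real templates live in each dataset's prompts folder.
INSTRUCTION = (
    "Extract all (subject, relation, object) triples from the text. "
    "Use only these relations: {relations}."
)


def format_triples(triples: List[Triple]) -> str:
    """Render triples as '(s, r, o); (s, r, o)'."""
    return "; ".join(f"({s}, {r}, {o})" for s, r, o in triples)


def build_prompt(text: str, relations: List[str],
                 demo: Optional[Tuple[str, List[Triple]]] = None) -> str:
    """0-shot when demo is None; 1-shot when one (text, triples) demonstration is given."""
    parts = [INSTRUCTION.format(relations=", ".join(relations))]
    if demo is not None:
        demo_text, demo_triples = demo
        parts.append(f"Text: {demo_text}\nTriples: {format_triples(demo_triples)}")
    parts.append(f"Text: {text}\nTriples:")  # the model completes after this line
    return "\n\n".join(parts)


if __name__ == "__main__":
    relations = ["birthplace", "spouse"]
    demo = ("Pierre Curie married Marie Curie.", [("Pierre Curie", "spouse", "Marie Curie")])
    print(build_prompt("Marie Curie was born in Warsaw.", relations))  # 0-shot
    print("-" * 40)
    print(build_prompt("Marie Curie was born in Warsaw.", relations, demo=demo))  # 1-shot
```

Keeping the demonstration in exactly the same "Text:/Triples:" shape as the query is what lets the model imitate the output format.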

🕵️Virtual Knowledge Extraction

The VINE dataset we built is in the folder “Virtual Knowledge Extraction/datas”.

Run the following commands to generate the prompts:

cd "Virtual Knowledge Extraction"
python VINE_processor.py
python VINE_prompts.py

🤖AutoKG

Our AutoKG code is based on CAMEL (Communicative Agents for “Mind” Exploration of Large Scale Language Model Society) and on a LangChain implementation of that paper; refer to that implementation for more details.
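CAMEL-style role-playing pairs an instruction-giving "user" agent with a solution-giving "assistant" agent that take turns until the task is marked complete. The loop below is a minimal, offline sketch of that pattern: the agents are plain Python callables standing in for LLM calls, and `role_play`, `Agent`, and the `<TASK_DONE>` marker are hypothetical names, not the CAMEL or LangChain API or the code in Autokg.py.

```python
from typing import Callable, List, Tuple

# An "agent" maps the conversation so far to its next message.
# In AutoKG these would be LLM-backed; the demo stubs below are hypothetical.
Agent = Callable[[List[Tuple[str, str]]], str]
DONE = "<TASK_DONE>"  # hypothetical termination marker


def role_play(user: Agent, assistant: Agent, max_turns: int = 10) -> List[Tuple[str, str]]:
    """Alternate user instructions and assistant replies until the user emits DONE."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        instruction = user(history)
        history.append(("user", instruction))
        if DONE in instruction:
            break  # the user agent judges the task complete
        history.append(("assistant", assistant(history)))
    return history


if __name__ == "__main__":
    script = iter([
        "Identify the entities in: 'Marie Curie was born in Warsaw.'",
        "State the relation between them as a triple.",
        DONE,
    ])

    def demo_user(history):
        return next(script)

    def demo_assistant(history):
        return f"(stub LLM reply to: {history[-1][1]})"

    for role, message in role_play(demo_user, demo_assistant):
        print(f"{role}: {message}")
```

The `max_turns` cap guards against two agents that never agree the task is finished.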

  • Set OPENAI_API_KEY in Autokg.py to your OpenAI API key.
  • Set SERPAPI_API_KEY in RE_CAMEL.py to your SerpApi key (see the SerpApi website for more information).
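Instead of pasting keys into the source files, you could read them from environment variables of the same names. This is a sketch of an optional modification, not how Autokg.py or RE_CAMEL.py currently work; `require_env` is a hypothetical helper.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name` (whitespace-stripped), failing loudly if unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable.")
    return value


# Hypothetical replacements for the hard-coded assignments in Autokg.py / RE_CAMEL.py:
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# SERPAPI_API_KEY = require_env("SERPAPI_API_KEY")
```

With that change, export OPENAI_API_KEY and SERPAPI_API_KEY in your shell before running python Autokg.py, and the keys never end up in version control.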

Run the Autokg.py script.

cd AutoKG
python Autokg.py

Citation

If you use the code or data, please cite the following paper:

@article{zhu2023llms,
  title={LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities},
  author={Zhu, Yuqi and Wang, Xiaohan and Chen, Jing and Qiao, Shuofei and Ou, Yixin and Yao, Yunzhi and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2305.13168},
  year={2023}
}