kg-lab-ws23-task3

Knowledge Graph Lab 💡 - winter semester 2023 - task 3

This repository implements an approach for identifying scientific events, such as conferences and workshops, from proceedings.com and assigning each one to its corresponding entry on Wikidata. Missing properties are added to the existing Wikidata entry, or a new entry is created if the event does not exist on Wikidata yet.


Use Case Scenario

Situation:

We want to add the data of a conference from proceedings.com to Wikidata, but we don't know whether it already has an entry in Wikidata. Furthermore, the format of this data is not always suitable for property generation.

Action:

  • Identify potential entries in Wikidata corresponding to our conference.
  • Convert our entry and its potential candidates into a uniform format using ChatGPT.
  • Decide whether any of the candidates corresponds to our conference using Large Language Models.
  • Update or generate the Wikidata entry.
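
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the `Event` class, `candidate_search_url`, `plan_action`, and the `naive_match` stand-in are all hypothetical names. The only external detail assumed is Wikidata's public `wbsearchentities` API action for the candidate lookup. In the project the matching decision is made by a Large Language Model, so the matcher is passed in as a function and the simple title/year comparison below is only a placeholder for it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


@dataclass(frozen=True)
class Event:
    """Uniform representation of an event (hypothetical fields)."""
    title: str
    year: Optional[int] = None
    qid: Optional[str] = None  # Wikidata item ID; None for a proceedings.com entry


def candidate_search_url(title: str, limit: int = 5) -> str:
    """Step 1: build a wbsearchentities request URL (no network call is made here)."""
    params = {
        "action": "wbsearchentities",
        "search": title,
        "language": "en",
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def plan_action(
    entry: Event,
    candidates: Sequence[Event],
    is_same_event: Callable[[Event, Event], bool],
) -> Tuple[str, Optional[str]]:
    """Steps 3 and 4: update the first matching candidate, otherwise create a new item."""
    for candidate in candidates:
        if is_same_event(entry, candidate):
            return ("update", candidate.qid)
    return ("create", None)


def naive_match(a: Event, b: Event) -> bool:
    """Placeholder for the LLM decision: same year and case-insensitive title."""
    return a.year == b.year and a.title.strip().lower() == b.title.strip().lower()


if __name__ == "__main__":
    entry = Event(title="Example Workshop on Knowledge Graphs", year=2023)
    # "Q_EXAMPLE" is a made-up identifier used only for illustration.
    candidates = [Event(title="example workshop on knowledge graphs", year=2023, qid="Q_EXAMPLE")]
    print(candidate_search_url(entry.title))
    print(plan_action(entry, candidates, naive_match))
```

Passing the matcher as a parameter keeps the update-or-create decision independent of how the match is obtained, so an LLM-backed function can replace `naive_match` without changing the rest of the flow.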

Expected Result:

Our conference receives a Wikidata entry which contains all its relevant properties.

Current status of the project:

  • Assessment of current Wikidata status ✓
  • Preprocessing of queries ✓
  • (Method for encoding established) ✓
  • Validation of LLM (check for accuracy and precision) ✓
  • (Further methods of assignments established) ✓
  • Method for data transfer to Wikidata ✓
  • Establishment of a semi-automated method ✓
  • Validation of the method ✓

Deadlines

  • 2024-01-19: Midterm coordination ✓
  • 2024-03-22: Project result delivery ✓
  • 2024-03-29: Final presentation

Usage

First setup:

  • Clone repository
git clone https://github.com/olafbombach/kg-lab-ws23-task3
  • Run script
. scripts/run.sh

General usage:

esc -h

Why?

This task is part of the practical lab (KG Lab) presented by the Chair of Databases and Information Systems (i5) of RWTH Aachen.

How to contribute to this project:

Thank you for your interest in contributing to this project! We welcome all kinds of contributions, big or small: adding new features, fixing bugs, improving documentation, or suggesting new ideas. Please follow our Code of Conduct.

Considered pipeline

Code workflow (DataClass handling)