RWTH2023WS-KG-LAB-Task1-Paper-Semantification

How to run it

Prerequisite: Install Docker and docker-compose on your local machine so that you can execute the commands below. https://docs.docker.com/get-docker/

git clone https://github.com/HaigeWang1/Paper-Semantification.git
- paper_semantification includes a parser that relies on the public OpenAI endpoints, so an API key is required.
  - Create a .env file in the same folder as docker-compose.yaml
  - In that file, set the environment variable OPENAI_API_KEY="sk-..."
docker build -t paper_semantification .   # build the Docker image for the Python service paper_semantification
docker-compose up -d   # run the whole application
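
Before starting the containers, it can help to confirm that the .env file described above actually defines the key. The following is a minimal sketch, not part of the repository: the helper names parse_env_text and has_openai_key are hypothetical, and the parser only handles plain KEY=VALUE lines with optional quotes.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_openai_key(path=".env"):
    """Return True if the file exists and sets a non-empty OPENAI_API_KEY."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    return bool(values.get("OPENAI_API_KEY"))


if __name__ == "__main__":
    # Assumes the script is run from the folder containing docker-compose.yaml.
    print("OPENAI_API_KEY configured:", has_openai_key(".env"))
```

The check only looks at the file; it does not verify that the key is accepted by OpenAI.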

The docker-compose setup contains two services:

The Neo4j database can be accessed locally at http://localhost:7474, using the connect URL bolt://localhost:7687.
- Authentication is disabled, so you can ignore the authentication fields
Our Python service exposes its API through a FastAPI server; the interactive documentation is at http://localhost:8000/docs
- You can call the different endpoints that our service exposes
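
To check that both services came up after docker-compose up -d, a small reachability probe can be used. This is a sketch under the assumption that the default ports listed above are unchanged; the names SERVICES and is_reachable are illustrative and not part of the project.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the service list above; adjust if the ports were remapped.
SERVICES = {
    "neo4j-browser": "http://localhost:7474",
    "fastapi-docs": "http://localhost:8000/docs",
}


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_reachable(url) else "not reachable"
        print(f"{name}: {url} is {status}")
```

The probe only covers the two HTTP endpoints; the bolt://localhost:7687 connection is not tested here because it would need a Neo4j driver.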

The purpose of this task is to process scholarly papers by extracting their metadata with services such as the CERMINE and GROBID APIs.

Available APIs
- ceurspt provides CERMINE and GROBID
- http://ceurspt.wikidata.dbis.rwth-aachen.de/Vol-2462/paper1.html
- http://ceurspt.wikidata.dbis.rwth-aachen.de/Vol-2462/paper1.grobid
- http://ceurspt.wikidata.dbis.rwth-aachen.de/Vol-2462/paper1.cermine
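
The three example URLs above follow a common pattern, which can be captured in a small helper. The sketch below assumes that this pattern (Vol-<volume>/paper<number>.<format>) holds for other volumes and papers as well, which the examples suggest but do not guarantee; ceurspt_url and fetch_text are hypothetical names.

```python
import urllib.request

BASE_URL = "http://ceurspt.wikidata.dbis.rwth-aachen.de"
# Formats seen in the example URLs above.
FORMATS = ("html", "grobid", "cermine")


def ceurspt_url(volume, paper, fmt):
    """Build a ceurspt URL, e.g. volume=2462, paper=1, fmt='grobid'."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}, expected one of {FORMATS}")
    return f"{BASE_URL}/Vol-{volume}/paper{paper}.{fmt}"


def fetch_text(url, timeout=30.0):
    """Download a document and decode it as UTF-8 (assumed encoding)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Print the three URLs for the example paper without fetching them.
    for fmt in FORMATS:
        print(ceurspt_url(2462, 1, fmt))
```

Calling fetch_text(ceurspt_url(2462, 1, "grobid")) would retrieve the GROBID output for the example paper, provided the server is reachable.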
[OPTIONAL] The CEUR-WS template introduces a structure into the PDFs and recommends providing at least an e-mail address or another identifier
- Optimization: use the template structure to extract author information
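
As one possible starting point for the optional step, e-mail addresses can be pulled out of the extracted text with a regular expression. This is only an illustrative sketch, not the project's method: the pattern and the function extract_emails are assumptions, and grouped notations such as {a,b}@domain are not handled.

```python
import re

# Deliberately simple pattern; it does not cover every valid address form.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique e-mail addresses in text, in order of appearance."""
    matches = EMAIL_PATTERN.findall(text)
    # dict.fromkeys removes duplicates while keeping the first occurrence order.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    # Made-up header text for illustration only.
    sample = "Jane Doe (jane.doe@example.org), John Roe <john@example.com>."
    print(extract_emails(sample))
```

Matching the found addresses back to author names would still need the structure provided by the template or by the GROBID and CERMINE output.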