ISE-Linking-Entities-from-Images-to-Knowledge-Graphs

This is the project repository for the course Information Service Engineering.


Linking Entities from Images to Knowledge Graphs

In this project we propose methods to link the 1000 ImageNet classes to 768 DBpedia classes.
We do this using string-matching methods and language models (word2vec, wikipedia2vec, SBERT).
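
As a rough, self-contained illustration of the string-matching side (not the exact code from our notebooks; the label lists are made-up placeholders), DBpedia class names can be ranked for an ImageNet label by combining Jaccard and edit-distance-style similarity:

```python
# Toy sketch of string matching between class labels (placeholder label lists,
# not the real ImageNet/DBpedia class files).
from difflib import SequenceMatcher

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def edit_ratio(a: str, b: str) -> float:
    """Normalized edit similarity (difflib stands in for a true Levenshtein ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

imagenet_labels = ["tiger shark", "sports car"]          # placeholder
dbpedia_classes = ["Shark", "SportsCar", "Automobile"]   # placeholder

for label in imagenet_labels:
    # Rank DBpedia classes by a simple sum of both similarity scores.
    ranked = sorted(dbpedia_classes,
                    key=lambda c: jaccard(label, c) + edit_ratio(label, c),
                    reverse=True)
    print(label, "->", ranked[:3])   # top-3 candidates per ImageNet class
```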

Authors: Alpino, Davide (https://github.com/DavideAl/); Wei, Tianran

Notes

*Sometimes the relative path to the .csv input files does not resolve; in that case, use the absolute path.

*Since GitHub only allows 1 GB of data, the models for word2vec and wikipedia2vec need to be downloaded separately (a loading sketch follows these notes):

Word2Vec

Wikipedia2vec

*We used an Anaconda virtual environment. For wikipedia2vec this does not work, but executing the same code inside Google Colab does.
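
A minimal sketch of how the separately downloaded models can be loaded (the file names below are placeholders; replace them with the absolute paths of the files you downloaded):

```python
# Sketch: load the separately downloaded embedding models.
# File names below are placeholders -- replace them with your local (absolute) paths.
from pathlib import Path
from gensim.models import KeyedVectors
from wikipedia2vec import Wikipedia2Vec

word2vec_path = Path("/data/GoogleNews-vectors-negative300.bin")  # placeholder path
wiki2vec_path = Path("/data/enwiki_20180420_300d.pkl")            # placeholder path

# The Google News word2vec vectors are distributed in the binary word2vec format.
w2v = KeyedVectors.load_word2vec_format(str(word2vec_path), binary=True)

# Pre-trained wikipedia2vec models are loaded with Wikipedia2Vec.load().
wiki2vec = Wikipedia2Vec.load(str(wiki2vec_path))

print(w2v.most_similar("car", topn=3))
print(wiki2vec.most_similar(wiki2vec.get_word("car"), 3))
```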

Folder Documentation

--evaluation
   --files -> contains all top3 files
   --processed_files -> contains all top3 files; if there is no linkage between two classes, we added an empty value for them
   evaluation.ipynb -> This notebook is used to generate the top3 and top5 files

--notebooks -> This folder contains all notebooks we used to investigate problems, create algorithms, and find solutions.
  --evaluation -> our final evaluation
  --files -> all files that we ever used or generated
  --word_embeddings -> contains all notebooks for the word embedding approaches
  Bert.ipynb -> approach with SBERT (a rough sketch of the idea follows this listing)
  Jaccard_Levenshtein.ipynb -> Jaccard and Levenshtein distance
  jaro_similarity.ipynb -> Jaro similarity
  mapping_results_bert.csv -> results of bert
  matching.ipynb -> exact matching using the class from src\data\mapping.py
  Query_imageNet_class_in_wikipedia__2.ipynb -> second approach for the query
  Query_imageNet_class_in_wikipedia.ipynb -> first approach for the query
  string_matching.ipynb -> implementation of fuzzy string matching; can be ignored, since this is done again in the Jaccard_Levenshtein notebook
  transitive_baseline.csv -> our baseline
  transitive_mapping.ipynb -> transitive mapping via WordNet, Wikidata, and DBpedia
  wikipedia_test.py -> a simple file to test something; can be ignored
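
The following is only a sketch of the general SBERT idea behind Bert.ipynb, not the notebook's actual code; it assumes the sentence-transformers package and a standard pre-trained model, and the label lists are placeholders:

```python
# Sketch of the SBERT idea (not the exact code of Bert.ipynb).
# Assumes `pip install sentence-transformers`; all labels are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any pre-trained SBERT model works

imagenet_labels = ["tiger shark", "sports car"]          # placeholder labels
dbpedia_classes = ["Shark", "SportsCar", "Automobile"]   # placeholder labels

img_emb = model.encode(imagenet_labels, convert_to_tensor=True)
dbp_emb = model.encode(dbpedia_classes, convert_to_tensor=True)

# Cosine similarity between every ImageNet label and every DBpedia class.
scores = util.cos_sim(img_emb, dbp_emb)

for i, label in enumerate(imagenet_labels):
    top = scores[i].topk(k=min(3, len(dbpedia_classes)))  # top-3 candidate classes
    print(label, "->", [dbpedia_classes[int(j)] for j in top.indices])
```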

--src\data
   mapping.py -> this file contains the class for doing the exact string matching.
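
For illustration, a hypothetical minimal version of such an exact-matching class could look like the sketch below; the actual implementation is the class in src\data\mapping.py and may differ.

```python
# Hypothetical minimal version of an exact-matching class -- the actual
# implementation lives in src\data\mapping.py and may differ.
from typing import Dict, Iterable, Optional


class ExactMapper:
    """Maps ImageNet labels to DBpedia classes by exact, case-insensitive comparison."""

    def __init__(self, dbpedia_classes: Iterable[str]) -> None:
        # Normalize once so each lookup is a single dictionary access.
        self._index: Dict[str, str] = {c.lower(): c for c in dbpedia_classes}

    def match(self, imagenet_label: str) -> Optional[str]:
        """Return the identically named DBpedia class, or None if there is no exact match."""
        return self._index.get(imagenet_label.lower())


# Placeholder usage
mapper = ExactMapper(["Shark", "Automobile"])
print(mapper.match("shark"))       # -> "Shark"
print(mapper.match("sports car"))  # -> None (no exact match)
```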