In this project we propose methods to link 1000 ImageNet classes to 768 DBpedia classes.
We do this using string matching methods and language models (word2vec, wikipedia2vec, SBERT).
Authors: Alpino, Davide (https://github.com/DavideAl/); Wei, Tianran
*Sometimes the relative path to the .csv input files doesn't resolve; in that case, use the absolute path instead.
*Since GitHub only allows 1 GB of data, the models for word2vec and wikipedia2vec need to be downloaded separately.
*We used an Anaconda virtual environment. wikipedia2vec does not work in that environment, but the same code runs in Google Colab.
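The path issue in the first note usually comes from the notebook's working directory differing from the folder the relative path assumes. A minimal sketch of one way to build an absolute path, assuming a hypothetical helper `resolve_input` and a hypothetical file name `files/classes.csv` (neither is part of this repository):

```python
from pathlib import Path


def resolve_input(relative_path, base_dir=None):
    """Return an absolute path for an input file.

    base_dir is a hypothetical anchor directory; when omitted, the
    current working directory is used.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return (base / relative_path).resolve()


# Hypothetical usage: the file name below is only an illustration.
csv_path = resolve_input("files/classes.csv")
print(csv_path.is_absolute())
```

Passing the resulting path to the CSV reader avoids depending on where the notebook was started.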
--evaluation
--files -> contains all top3 files
--processed_files -> contains all top3 files; where there is no linkage between two classes, we added an empty value
evaluation.ipynb -> notebook used to generate the top3 and top5 files
--notebooks -> this folder contains all the notebooks we used to investigate problems, create algorithms and find solutions.
--evaluation -> our final evaluation with
--files -> all files that we ever used/generated
--word_embeddings -> contains all notebooks for the word embedding approaches
Bert.ipynb -> approach with SBERT
Jaccard_Levenshtein.ipynb -> Jaccard and Levenshtein distance
jaro_similarity.ipynb -> Jaro similarity
mapping_results_bert.csv -> results of bert
matching.ipynb -> exact matching using the class from src\data\mapping.py
Query_imageNet_class_in_wikipedia__2.ipynb -> second approach for the query
Query_imageNet_class_in_wikipedia.ipynb -> first approach for the query
string_matching.ipynb -> implementation of the fuzzy string matching; can be ignored, since this is done again in the Jaccard_Levenshtein notebook
transitive_baseline.csv -> our baseline
transitive_mapping.ipynb -> transitive mapping via WordNet, Wikidata and DBpedia
wikipedia_test.py -> a simple test file, can be ignored
--src\data
mapping.py -> this file contains the class for exact string matching.
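
To illustrate the string matching side of the project, here is a minimal sketch of Jaccard similarity, Levenshtein distance and a top-k ranking. It is not the code from the notebooks: the function names `jaccard`, `levenshtein` and `top_k`, the word-level tokenisation and the normalisation by the longer string are all assumptions made for this example, and the candidate labels are only illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two labels."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def top_k(label, candidates, k=3):
    """Rank candidates by normalised Levenshtein similarity, best first."""
    def similarity(candidate):
        longest = max(len(label), len(candidate)) or 1
        return 1 - levenshtein(label.lower(), candidate.lower()) / longest
    return sorted(candidates, key=similarity, reverse=True)[:k]


print(levenshtein("kitten", "sitting"))   # 3
print(jaccard("tiger shark", "shark"))    # 0.5
print(top_k("goldfish", ["Fish", "Bird", "Goldfish"], k=3))
```

Returning the k best candidates rather than a single match mirrors the top3 and top5 files described above, where each ImageNet class is paired with several candidate DBpedia classes.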