taxonomy-extraction

Code for automatically extracting expressive and non-expressive taxonomies from knowledge graphs (research project at the Lama-West lab, Polytechnique Montréal, Canada).

Non-expressive : Starting from a knowledge graph KG and a set of entity-type pairs, (1) entities are embedded into a d-dimensional vector space (2) then they’re hierarchically clustered; (3) each type in the original dataset is then mapped to one of the cluster, (4) the taxonomy is extracted by removing non-selected clusters.

Two methods for mapping types to clusters: Hard Mapping and Soft Mapping. Hard Mapping computes an optimal, one-for-one injective mapping between types and clusters by solving a linear sum assignment problem (right now I'm using the Kuhn-Munkres algorithm but I plan to implement a more efficient heuristic, such as Asymmetric Greedy Search). Soft Mapping defines a mapping score between types and clusters, recursively computes a probability for each subsumption axiom, and performs a transitive reduction on the resulting DAG.

Preliminary Results

		direct			transitive
Method	Model	cos	euc	eucw	cos	euc	eucw
HardMapping	ComplEx	0,52	0,49	0,47	0,74	0,67	0,65
	DistMult	0,5	0,39	0,39	0,64	0,59	0,63
	RDF2Vec	0,50	0,31	0,56	0,63	0,46	0,74
	TransE	0,81	0,63	0,7	0,93	0,76	0,8
SoftMapping	ComplEx	0,5	0,48	0,47	0,79	0,74	0,7
	DistMult	0,47	0,45	0,45	0,74	0,74	0,74
	RDF2Vec	0,82
	TransE	0,82	0,69	0,77	0,93	0,84	0,92
TIEmb	ComplEx	0,38	0,4		0,37	0,48
	DistMult	0,26	0,39		0,27	0,47
	RDF2Vec*	0,81	0,73		0,57	0,42
	TransE	0,74	0,76		0,86	0,79

Expressive : for expressive taxonomy extraction, the algorithm starts from an axiom A, sample n entities verifying this axiom, and run a hierarchical clustering over them. The clusters are then labelled by expressive axioms using statistics on linked data, and a taxonomic tree T(A) is extracted. Then, T(A) is iteratively expanded by sampling new entities from the axioms in T(A) and adding the extracted subtrees to T(A).

The core code is contained in libs:

libs.graph: load and query knowledge graphs.
libs.dataset: create datasets for the taxonomy extraction task, or load existing datasets from file.
libs.axiom: provide useful classes for axiom handling and boolean logic.
libs.models: train and use knowledge graph embedding models. For now, we mostly use OpenKE, so libs.models only contains an implementation of RDF2Vec.
libs.extraction: extract taxonomies from a Dataset and an embedding matrix.
libs.axiom_extraction: extract expressive taxonomies from an embedding matrix and a knowledge graph

Papers

What is the schema of your knowledge graph?, A. Zouaq and F. Martel, Proceedings of The International Workshop on Semantic Big Data, June 2020
Taxonomy extraction using knowledge graph embeddings and hierarchical clustering, F. Martel and A. Zouaq, Proceedings of the 36th Annual ACM Symposium on Applied Computing, March 2021
(in French) Extraction de taxonomie par plongement hiérarchique de plongements vectoriels de graphes de connaissance, F. Martel, masters thesis, Polytechnique Montréal, October 2021

felix-martel/taxonomy-extraction

taxonomy-extraction

Papers