Node classification in knowledge graphs aids in the discovery of new drugs, the identification of risky users in social networks, and the completion of missing type information in knowledge graphs. A crucial requirement for many stakeholders is to understand the models’ predictions in these applications. To this end, concept learners have been proposed to learn concepts in description logic from positive and negative nodes in knowledge graphs in an interpretable way. However, learning concepts on large datasets is computationally expensive. Manually selecting important subgraphs and features to reduce runtime is tedious. While many feature selection approaches have been proposed to simplify the process for tabular data, they are not directly applicable to knowledge graphs and concept learning. In addition, current concept learners have a large number of hyperparameters that need to be optimized to achieve high predictive performance. In this paper, we propose AutoCL—an AutoML approach that is tailored to concept learning. AutoCL comprises methods for automatic feature selection and hyperparameter optimization for concept learners. We demonstrate its effectiveness with SML-Bench, a benchmarking framework for structured machine learning. Our approach leads to better predictive performance across concept learners, in terms of F1-measure and Accuracy while reducing runtime on 6 of 8 datasets from SML-Bench.
git clone https://github.com/dice-group/AutoCL
cd AutoCL
conda create --name temp python=3.8
conda activate temp
conda install -c conda-forge optuna=3.0.3
conda install -c conda-forge pandas=1.2.3
conda install -c conda-forge owlapy=0.41
conda install -c conda-forge owlready2=0.41
conda install scikit-learn=1.0.2
conda env update --name temp
python -c 'from setuptools import setup; setup()' develop
python -c "import ontolearn"
Our Dataset from SML-Bench (Structured Machine Learning Benchmark) is a benchmark for machine learning from structured data. It provides datasets, which contain structured knowledge (beyond plain feature vectors) in languages such as the Web Ontology Language (OWL) or the logic programming language Prolog.
We proposed one approach for hyperparameter optimization of concept learners based one Covariance Matrix Adaptation Evolution Strategy(CMA-ES) sampler from Optuna framework.
Our HPO source code shown in hyperparameter optimization approach
from AutoCL
folder.
For each concept learner, We ran CMA-ES sampler via get_best_optimization_result_for_cmes_sampler
function in order to obtain the best hyperparameters of our concepet learner.
We provides methods for automatic feature selection in knowledge graphs.
Our two Feature Selection appraoches source code shown in table-based feature selection
and graph-based feature selection
from AutoCL\feature_selection
folder.
Our first idea is to use a table-based wrapper method for feature selection: We employeed OwlReady2 to extra the features from KGs. Functions get_data_properties
, get_object_properties
are used to get DatatypeProperty features and ObjectProperty features.Then transform_data_properties
and transform_object_properties
are used to covert the features into tabular format.
Finaly, search the best features from select_k_best_features
function based on chi square test scores.
Our second idea is to use a graph-based wrapper method for feature selection: We run EvoLearner, which directly operates on the graph structure, to obtain the relevant top features from concepts via the function get_prominent_properties_occurring_in_top_k_hypothesis
@inproceedings{XAI2024,
author = {Jiayi Li, Sheetal Satheesh, Stefan Heindorf, Diego Moussallem, René Speck and Axel-Cyrille Ngonga Ngomo},
title = {AutoCL: AutoML for Concept Learning},
year = 2024
}
If you have any questions or feedbacks, feel free to contact us at jiayili@mail.uni-paderborn.de