Project: From meta-features to new algorithms
I am broadly interested in the question: given this problem, which algorithm should I use? For example, if you have a dataset and you want to build a classification model, you could choose between an SVM, a Random Forest, a Linear Discriminant, just to mention a few. Each algorithm would perform differently, with some being really good and others pretty bad. However, this information is usually not known until we test all (or many) candidate algorithms, as in the sketch below.
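As an illustration of that try-them-all step, here is a minimal sketch using scikit-learn, with the iris dataset standing in for a benchmark problem and three off-the-shelf candidates (the dataset and candidate list are purely illustrative):

```python
# Minimal sketch: compare a few candidate classifiers on one problem with
# cross-validation. Dataset and candidates are placeholders, not a benchmark.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

candidates = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Linear Discriminant": LinearDiscriminantAnalysis(),
}

# The relative performance is only known after actually running each one.
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```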
To try to answer this question, we take a number of benchmark problems and measure both the performance of each candidate algorithm and some characteristics of the problem. For example, in classification, a characteristic may be the number of classes, the average distance between observations, etc. These characteristics are called meta-features.
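As a small, hedged example of what computing meta-features might look like (again with iris as a placeholder problem; real studies compute many more features, and dedicated packages such as pymfe exist for this):

```python
# Minimal sketch: a few simple meta-features of a classification dataset.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

meta_features = {
    "n_observations": X.shape[0],
    "n_features": X.shape[1],
    "n_classes": len(np.unique(y)),             # number of classes
    "mean_pairwise_distance": pdist(X).mean(),  # average distance between observations
}
print(meta_features)
```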
Once the meta-features are calculated, we are able to model the relationship between problems and algorithms, and even visualize the "space of problems" and the areas in which a particular algorithm works well. While this is useful for algorithm selection, in some cases no known (or tested) algorithm works well on a new problem. Therefore the question arises: how can we use the knowledge acquired through the meta-features to adapt, or perhaps create, an algorithm that is fit for purpose?
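To make the "space of problems" idea concrete, here is a rough sketch under strong simplifying assumptions: synthetic problems stand in for real benchmarks, three simple meta-features stand in for a full feature set, and PCA stands in for the tailored projection used in the papers listed below.

```python
# Rough sketch of an "instance space": meta-features of many (synthetic)
# problems projected to 2D and colored by which candidate algorithm won.
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
candidates = {"SVM": SVC(), "Random Forest": RandomForestClassifier(random_state=0)}

meta, winners = [], []
for i in range(30):
    # A synthetic classification problem with randomly varied characteristics.
    X, y = make_classification(
        n_samples=200,
        n_features=rng.randint(5, 20),
        n_informative=3,
        n_classes=rng.randint(2, 4),
        random_state=i,
    )
    # Meta-features describing this problem.
    meta.append([len(np.unique(y)), X.shape[1], pdist(X).mean()])
    # Which candidate algorithm performs best on this problem.
    scores = {name: cross_val_score(clf, X, y, cv=3).mean() for name, clf in candidates.items()}
    winners.append(max(scores, key=scores.get))

# Project the (standardized) meta-feature vectors to 2D.
coords = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(np.array(meta)))
winners = np.array(winners)
for name in candidates:
    mask = winners == name
    plt.scatter(coords[mask, 0], coords[mask, 1], label=name)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.title("Toy instance space: problems colored by best algorithm")
plt.show()
```

In the actual papers the projection is constructed so that regions of good and bad performance separate as cleanly as possible, rather than by plain PCA.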
The tools we use are mostly from statistical machine learning, but I am looking to integrate ideas from new theoretical areas. In particular, I am interested in ideas from topology and information geometry (such as statistical manifolds), but my theoretical background is not the strongest. Moreover, it is possible that there are better ways to do what we are currently doing. Perhaps other points of view would be useful.
Some of our recent papers published on this topic are:
doi:10.1162/EVCO_a_00194
https://www.researchgate.net/publication/315835025_Instance_Spaces_for_Machine_Learning_Classification
A kind of proof-of-concept of a new idea:
doi:10.1145/3067695.3075971
Some earlier papers are:
doi:10.1016/j.ins.2015.05.010
doi:10.1109/TEVC.2014.2302006