![banner](https://private-user-images.githubusercontent.com/56029596/281065201-dc1956c7-da1c-4591-8118-014e690f5bc4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTYxNTQ4MzIsIm5iZiI6MTcxNjE1NDUzMiwicGF0aCI6Ii81NjAyOTU5Ni8yODEwNjUyMDEtZGMxOTU2YzctZGExYy00NTkxLTgxMTgtMDE0ZTY5MGY1YmM0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA1MTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNTE5VDIxMzUzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJlZWQ2ZmFlOWE5YTIxN2M1ZDg0NjQ1ZjIxYjdmY2VkNGE1ODI0MzU4MTk4YTFmYmY0MDFlODY1OTE5NzEyMjAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.JsJMZ5vaOK1arp1VS9dxhQgDVKCrfdSpapGUVCTKCC0)
Paper link. Created by Uzu Lim, Harald Oberhauser, and Vidit Nanda.
HADES (Hypothesis-testing Algorithm for Detection and Exploration of Singularities) is a fast singularity detection algorithm. Singularities are points in data where the Manifold Hypothesis fails, such as cusps and self-intersections. Rather than using topological methods, HADES works by (1) locally applying dimensionality reduction and then (2) performing a kernel goodness-of-fit test.
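The two-step idea can be pictured with a toy stand-in (this is an illustrative sketch, not the library's implementation; the function name, radius, and test statistic here are made up for illustration). For curve data, a small neighborhood of a smooth point is almost perfectly fit by a line, while a neighborhood of a self-intersection is not, so the unexplained variance after local PCA already separates the two:

```python
import numpy as np

def line_fit_residual(X, center, r=0.3):
    """Fraction of local variance NOT explained by the best-fit line
    through the neighborhood of `center`. For 1-dimensional data a
    smooth point is nearly linear at small scales; a self-intersection
    has two crossing branches and leaves a large residual."""
    nbhd = X[np.linalg.norm(X - center, axis=1) < r]
    nbhd = nbhd - nbhd.mean(axis=0)
    s2 = np.linalg.svd(nbhd, compute_uv=False) ** 2  # variance along PCs
    return 1.0 - s2[0] / s2.sum()

# Two unit circles whose boundaries cross transversally.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circ = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([circ + [0.5, 0.0], circ - [0.5, 0.0]])

smooth = line_fit_residual(X, np.array([1.5, 0.0]))            # ordinary point
crossing = line_fit_residual(X, np.array([0.0, 0.75 ** 0.5]))  # intersection
```

Here `smooth` comes out near zero while `crossing` is much larger; HADES turns this kind of local discrepancy into a proper hypothesis test via a kernel goodness-of-fit statistic.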
To install, clone the repository and use Poetry:

```shell
$ poetry install
```
Given a NumPy array `X` where each row represents a data point, run the following:

```python
from hades import judge

verdict = judge(X)
```
Below is a convenient starting point for generating sample data, detecting singularities, and plotting them:
```python
from hades import judge
from hades.misc import plot, plot_filt
from hades.gen import two_circles

X = two_circles(5000, noise=0.01)
verdict = judge(X)
plot(X, c=verdict['score'], show=True)
plot_filt(X, verdict['label'], show=True)
```
The following hyperparameters are used by `hades`:

- `r`: radius
- `k`: number of nearest neighbors
- `t`: threshold for PCA ($0 < t < 1$)
- `a`: kernel parameter ($0 < a < 1$)

(Only one of `r` and `k` is used at a time, so only 3 hyperparameters are relevant in each run.)
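One common reading of the PCA threshold `t` (a plausible convention for illustration; how `hades` applies `t` internally may differ) is that the local dimension is the smallest number of principal components whose cumulative explained-variance ratio reaches `t`:

```python
import numpy as np

def dim_from_threshold(variances, t=0.9):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches t (one common convention)."""
    ratios = np.cumsum(variances) / np.sum(variances)
    return int(np.searchsorted(ratios, t)) + 1

print(dim_from_threshold([5.0, 3.0, 1.5, 0.5], t=0.9))  # → 3
```

With these variances the first two components explain only 80% of the variance, so a third is needed to clear the 90% threshold.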
There are three modes of hyperparameter search. By default, `judge` performs a fully automatic search over the radius parameter only.
```python
# Mode 1. Fully automatic search
verdict = judge(X, search_auto=['r', 't'],
                search_res={'r': 5, 't': 3})

# Mode 2. Search over a specified grid of hyperparameters
verdict = judge(X, search_range={'r': (0.05, 0.15), 't': (0.7, 0.9)},
                search_res={'r': 5, 't': 3})

# Mode 3. Search over a specified list of hyperparameters
verdict = judge(X, search_list=[{'a': 0.1, 'k': 50, 't': 0.9},
                                {'a': 0.5, 'k': 50, 't': 0.9},
                                {'a': 0.9, 'k': 50, 't': 0.9}])
```
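The grid that Mode 2 searches can be pictured as the Cartesian product of evenly spaced candidate values, `search_res` many per hyperparameter (a sketch of the presumed semantics, not the package's internal code):

```python
import itertools
import numpy as np

search_range = {'r': (0.05, 0.15), 't': (0.7, 0.9)}
search_res = {'r': 5, 't': 3}

# Evenly spaced candidate values for each hyperparameter...
axes = {name: np.linspace(lo, hi, search_res[name])
        for name, (lo, hi) in search_range.items()}
# ...combined into a grid of candidate settings.
grid = [dict(zip(axes, map(float, combo)))
        for combo in itertools.product(*axes.values())]

print(len(grid))  # 5 * 3 = 15 candidate settings
print(grid[0])    # {'r': 0.05, 't': 0.7}
```

Under this reading, Mode 2 with the arguments above evaluates 15 hyperparameter combinations, while Mode 3 evaluates exactly the settings you list.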
The `notebooks` folder contains Jupyter notebooks that reproduce the computational experiments in the paper.