outgraph is a simple outlier detection tool for graph datasets. Given a list of graphs, it uses the Mahalanobis distance to detect which graphs are outliers based on their topology, their node attributes, or both.
Note: outgraph only works for datasets where each graph has an equal number of nodes.
You can install outgraph with pip:
$ pip install outgraph
Unlike most approaches to graph outlier detection, outgraph does not use machine learning. Instead, each graph is converted into a vector representation using one of three available methods:
1. Averaging the node feature/attribute vectors
2. Flattening the adjacency matrix
3. Concatenating the vectors produced by methods 1 and 2
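The three vectorization methods listed above can be sketched with plain NumPy. This is only an illustration, not outgraph's actual code: the `vectorize` helper and its signature are hypothetical, and it assumes `node_attrs` has one row per node.

```python
import numpy as np

def vectorize(node_attrs, adj_matrix, method=1):
    """Hypothetical helper: turn one graph into a 1-D vector."""
    node_attrs = np.asarray(node_attrs, dtype=float)
    adj_matrix = np.asarray(adj_matrix, dtype=float)
    if method == 1:
        # Method 1: average the node attribute vectors (one value per feature)
        return node_attrs.mean(axis=0)
    if method == 2:
        # Method 2: flatten the adjacency matrix row by row
        return adj_matrix.ravel()
    if method == 3:
        # Method 3: concatenate the vectors from methods 1 and 2
        return np.concatenate([node_attrs.mean(axis=0), adj_matrix.ravel()])
    raise ValueError("method must be 1, 2, or 3")

node_attrs = np.array([[-1], [0], [1]])
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])

for m in (1, 2, 3):
    print(m, vectorize(node_attrs, adj_matrix, method=m).shape)
```

Because methods 2 and 3 flatten the adjacency matrix, the vector length depends on the node count, which is presumably why every graph in a dataset needs the same number of nodes.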
Then, the Mahalanobis distance between each vector and the distribution of all vectors in the dataset is calculated. Lastly, a chi-squared distribution is used to model the distances and flag those beyond a cutoff threshold (e.g. p < 0.05).
This approach is based on this article.
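To make the distance and cutoff steps concrete, here is a minimal sketch using NumPy and SciPy. It is not taken from outgraph: the `mahalanobis_outliers` name is hypothetical, the pseudo-inverse is an assumption added to tolerate singular covariance matrices, and the chi-squared degrees of freedom are assumed to equal the vector length.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(vectors, p_value=0.05):
    """Hypothetical helper: flag rows whose squared distance is unlikely."""
    X = np.asarray(vectors, dtype=float)
    mean = X.mean(axis=0)
    # Sample covariance of the dataset (rows are observations)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Assumption: pseudo-inverse, so a singular covariance does not fail
    cov_inv = np.linalg.pinv(cov)
    diff = X - mean
    # Squared Mahalanobis distance of every row from the mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Upper-tail probability under a chi-squared model of the distances
    p = chi2.sf(d2, df=X.shape[1])
    return np.flatnonzero(p < p_value), d2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [10.0, 10.0, 10.0]  # plant one obvious outlier

indices, d2 = mahalanobis_outliers(X, p_value=0.05)
print(0 in indices)
```

Note that a cutoff of 0.05 will also flag roughly 5% of ordinary samples when the data really is normally distributed, so a smaller threshold may suit larger datasets.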
Each graph in your dataset needs to be an instance of outgraph.Graph. This object has two parameters, node_attrs and adjacency_matrix, both NumPy arrays where the indices correspond to nodes. Example:
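import numpy as np
from outgraph import Graph
# One row of attributes per node (three nodes, one feature each)
node_attrs = np.array([[-1], [0], [1]])
# Entry [i, j] is 1 when nodes i and j are connected
adj_matrix = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])
graph = Graph(node_attrs, adj_matrix)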
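Since the arrays must line up node by node and every graph needs the same node count, it can help to check inputs before building Graph objects. The helpers below are hypothetical and not part of outgraph; they assume `node_attrs` has one row per node and that the adjacency matrix is square.

```python
import numpy as np

def check_graph_inputs(node_attrs, adj_matrix):
    """Hypothetical helper: return the node count if the arrays agree."""
    node_attrs = np.asarray(node_attrs)
    adj_matrix = np.asarray(adj_matrix)
    if adj_matrix.ndim != 2 or adj_matrix.shape[0] != adj_matrix.shape[1]:
        raise ValueError("adjacency matrix must be square")
    if node_attrs.shape[0] != adj_matrix.shape[0]:
        raise ValueError("node_attrs needs one row per node")
    return adj_matrix.shape[0]

def check_dataset(pairs):
    """Hypothetical helper: ensure all graphs share one node count."""
    counts = {check_graph_inputs(attrs, adj) for attrs, adj in pairs}
    if len(counts) != 1:
        raise ValueError("all graphs must have the same number of nodes")
    return counts.pop()

pairs = [
    (np.array([[-1], [0], [1]]), np.eye(3)),
    (np.array([[2], [2], [2]]), np.ones((3, 3))),
]
print(check_dataset(pairs))
```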
Once you have a list of Graph objects, pass them to outgraph.detect_outliers:
from outgraph import Graph, detect_outliers
# Placeholder: one Graph per sample, all with the same number of nodes
graphs = [Graph(node_attrs, adj_matrix), ...]
outliers, indices = detect_outliers(graphs, method=1, p_value=0.05)
Notice the method and p_value parameters. The method parameter is an integer between 1 and 3 that corresponds to one of the three graph vectorization methods described in the How it Works section. p_value is the outlier cutoff threshold.