Graph Database
Synthetic graph database generation. Each class is generated from a prototype, to which distortions are then applied. To run the default example:
$ pip install -r requirements.txt
$ python generate_dataset.py
Usage
Usage of the generate_dataset.py script:
usage: generate_dataset.py [-h] [--dirPrototypes DIRPROTOTYPES]
[--nodeThreshold NODETHRESHOLD]
[--dirDataset DIRDATASET] [--division DIVISION]
[--unbalanced] [--nodeDisplace NODEDISPLACE]
[--nodeAdd NODEADD] [--edgeMaximum EDGEMAXIMUM]
[--addEdge ADDEDGE] [--rmEdge RMEDGE]
[--edgeConnection EDGECONNECTION]
Generate a dataset from a given prototype folder.
optional arguments:
-h, --help show this help message and exit
--dirPrototypes DIRPROTOTYPES
prototype folder
--nodeThreshold NODETHRESHOLD
prototypes node threshold
--dirDataset DIRDATASET
dataset folder
--division DIVISION division (tr, val, te)
--unbalanced Unbalanced database
--nodeDisplace NODEDISPLACE
node std for distort its position
--nodeAdd NODEADD node std for adding a node in a source neighbourhood
--edgeMaximum EDGEMAXIMUM
maximum number of new edges that can be added
--addEdge ADDEDGE probability to add new edge
--rmEdge RMEDGE probability to remove an edge
--edgeConnection EDGECONNECTION
probability new edge is connected to an existing node
Prototypes
The prototypes folder contains the prototypes used to generate the different datasets. Several prototype folders can also be combined:
$ --dirPrototypes ['./prototypes/Letters/', './prototypes/Digits/']
The proposed prototypes can be found here.
Parameter discussion
Evaluation of the effect of the proposed parameters.
Add nodes
Controlled by the --nodeThreshold parameter. This step increases the number of nodes in each prototype before any deformation is applied. It tries to add nodes along each edge, equispaced at the specified distance.
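The node-adding step can be sketched as an edge subdivision. This is a minimal illustration and not the code in generate_dataset.py: the function name `subdivide_edges`, the integer node ids, and the rule of rounding each edge length to the nearest whole number of threshold-sized segments are all assumptions.

```python
import math


def subdivide_edges(coords, edges, threshold):
    """Hypothetical helper: split each edge into roughly threshold-long pieces.

    coords: dict mapping integer node id -> (x, y)
    edges:  list of (u, v) node-id pairs
    """
    coords = dict(coords)          # work on a copy
    new_edges = []
    next_id = max(coords) + 1      # assumes integer node ids
    for u, v in edges:
        (x0, y0), (x1, y1) = coords[u], coords[v]
        length = math.hypot(x1 - x0, y1 - y0)
        # Assumed rule: number of equal segments closest to length / threshold.
        n_seg = max(1, round(length / threshold))
        prev = u
        for k in range(1, n_seg):
            t = k / n_seg
            coords[next_id] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            new_edges.append((prev, next_id))
            prev = next_id
            next_id += 1
        new_edges.append((prev, v))
    return coords, new_edges
```

Under these assumptions, a unit-length edge with a threshold of 0.25 gains three intermediate nodes, while an edge shorter than half the threshold is left untouched.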
Some examples with graph A normalized before and after adding the nodes:
Original graph |
---|
--nodeThreshold | Image | --nodeThreshold | Image |
---|---|---|---|
0.10 | | 0.20 | |
0.30 | | 0.40 | |
Node distortion
Controlled by the --nodeDisplace parameter. This step adds random noise to each node position, drawn from a normal distribution centered at the node with standard deviation --nodeDisplace.
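A minimal sketch of this distortion, assuming 2D node positions and independent noise on each coordinate. The function name `displace_nodes` and the optional `rng` argument are hypothetical and are not taken from generate_dataset.py.

```python
import random


def displace_nodes(positions, node_displace, rng=None):
    """Hypothetical helper: add zero-mean Gaussian noise to every (x, y)."""
    rng = rng or random.Random()
    return [
        (x + rng.gauss(0.0, node_displace), y + rng.gauss(0.0, node_displace))
        for x, y in positions
    ]
```

Passing a seeded `random.Random` makes the distortion reproducible, which is convenient when comparing several values of the standard deviation on the same graph.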
Some examples with graph A where --nodeThreshold has been set to 0.40.
Original graph |
---|
--nodeDisplace | Image | --nodeDisplace | Image |
---|---|---|---|
0.01 | | 0.05 | |
0.10 | | 0.20 | |
Insert edges
Controlled by the --edgeMaximum, --addEdge, --edgeConnection and --nodeAdd parameters. This step adds at most --edgeMaximum edges with probability --addEdge. The source node is always an existing node in the graph; the target node is an existing node with probability --edgeConnection. If a new node is added instead, it is created in a neighbourhood of the source node with standard deviation --nodeAdd.
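One way to read this procedure is sketched below. It is an illustration under stated assumptions rather than the script's actual logic: the name `insert_edges` is hypothetical, each of the --edgeMaximum attempts is assumed to succeed independently with probability --addEdge, the source node is assumed to be chosen uniformly, and a new node is assumed to be drawn from a Gaussian centered on the source.

```python
import random


def insert_edges(positions, edges, edge_maximum, add_edge,
                 edge_connection, node_add, rng=None):
    """Hypothetical helper; expects a non-empty list of (x, y) positions."""
    rng = rng or random.Random()
    positions = [tuple(p) for p in positions]
    # Undirected edges stored as sorted pairs, so duplicates collapse.
    edge_set = {tuple(sorted(e)) for e in edges}
    for _ in range(edge_maximum):
        if rng.random() >= add_edge:
            continue  # this attempt adds nothing
        src = rng.randrange(len(positions))
        if len(positions) > 1 and rng.random() < edge_connection:
            # Target is another existing node (never the source itself).
            tgt = rng.randrange(len(positions) - 1)
            if tgt >= src:
                tgt += 1
        else:
            # Target is a new node placed near the source.
            x, y = positions[src]
            positions.append((x + rng.gauss(0.0, node_add),
                              y + rng.gauss(0.0, node_add)))
            tgt = len(positions) - 1
        edge_set.add((min(src, tgt), max(src, tgt)))
    return positions, sorted(edge_set)
```

In this sketch an attempt that picks an already connected pair leaves the graph unchanged, so the number of new edges can be lower than the number of successful attempts.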
Some examples with graph A where --nodeThreshold has been set to 0.40, --nodeDisplace 0.10, --edgeMaximum 10 and --nodeAdd 0.8.
Original graph |
---|
--addEdge | --edgeConnection | Image | --addEdge | --edgeConnection | Image |
---|---|---|---|---|---|
0.05 | 0.75 | | 0.05 | 0.50 | |
0.10 | 0.75 | | 0.10 | 0.50 | |
0.25 | 0.75 | | 0.25 | 0.50 | |
0.50 | 0.75 | | 0.50 | 0.50 | |
Remove edge
Controlled by the --rmEdge parameter. This step randomly removes edges with probability --rmEdge; at least one edge is always kept.
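A minimal sketch of the removal step, assuming each edge is dropped independently and that, if every edge happens to be dropped, one randomly chosen edge is restored. The function name `remove_edges` and the restore rule are assumptions; the script may enforce the "at least one edge" constraint differently.

```python
import random


def remove_edges(edges, rm_edge, rng=None):
    """Hypothetical helper: drop each edge with probability rm_edge."""
    rng = rng or random.Random()
    edges = list(edges)
    keep = [rng.random() >= rm_edge for _ in edges]
    if edges and not any(keep):
        # Assumed rule: never return an empty edge list.
        keep[rng.randrange(len(edges))] = True
    return [e for e, k in zip(edges, keep) if k]
```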
Some examples with graph A where --nodeThreshold has been set to 0.40, --nodeDisplace 0.10, --edgeMaximum 10, --nodeAdd 0.8, --addEdge 0.1 and --edgeConnection 0.75.
Original graph |
---|
--rmEdge | Image | --rmEdge | Image |
---|---|---|---|
0.01 | | 0.05 | |
0.10 | | 0.20 | |
Some Examples
Different levels of distortion for graph A with --nodeThreshold 0.4.
LOW
- --nodeDisplace 0.05
- --nodeAdd 0.4
- --edgeMaximum 8
- --addEdge 0.1
- --rmEdge 0.05
- --edgeConnection 0.75
Image | Image | Image |
---|---|---|
MEDIUM
- --nodeDisplace 0.1
- --nodeAdd 0.5
- --edgeMaximum 10
- --addEdge 0.1
- --rmEdge 0.05
- --edgeConnection 0.6
Image | Image | Image |
---|---|---|
HIGH
- --nodeDisplace 0.2
- --nodeAdd 0.8
- --edgeMaximum 10
- --addEdge 0.25
- --rmEdge 0.05
- --edgeConnection 0.6
Image | Image | Image |
---|---|---|
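
The three distortion levels above can be kept as presets and turned into command lines. The sketch below only assembles the arguments listed in this section. The names `PRESETS` and `build_command` are hypothetical, and the script is assumed to be invoked as `python generate_dataset.py` from the repository root.

```python
# Preset values copied from the LOW / MEDIUM / HIGH lists above.
PRESETS = {
    "LOW": {"nodeDisplace": 0.05, "nodeAdd": 0.4, "edgeMaximum": 8,
            "addEdge": 0.1, "rmEdge": 0.05, "edgeConnection": 0.75},
    "MEDIUM": {"nodeDisplace": 0.1, "nodeAdd": 0.5, "edgeMaximum": 10,
               "addEdge": 0.1, "rmEdge": 0.05, "edgeConnection": 0.6},
    "HIGH": {"nodeDisplace": 0.2, "nodeAdd": 0.8, "edgeMaximum": 10,
             "addEdge": 0.25, "rmEdge": 0.05, "edgeConnection": 0.6},
}


def build_command(level, node_threshold=0.4):
    """Hypothetical helper: return the argument list for one preset."""
    args = ["python", "generate_dataset.py",
            "--nodeThreshold", str(node_threshold)]
    for flag, value in PRESETS[level].items():
        args += ["--" + flag, str(value)]
    return args


if __name__ == "__main__":
    for level in PRESETS:
        print(" ".join(build_command(level)))
```

The returned list can be passed to `subprocess.run` if the dataset should be generated directly from Python.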