We propose a novel technique of vertex encoding for edge nonexistence determination(VEND, for short) to avoid no-result edge query executions , which will certainly improve performance of graph database and accelerate related graph searches.
Linux
graph storage is based on a database, so firstly make sure that the environment supports
-
create a static library of rocksdb
$ git clone https://github.com/facebook/rocksdb $ make static_lib
-
modify the dataset information
the basic properties of the testing dataset are available on /src/include/common/config.h
// src/include/common/config.h static constexpr int K_SIZE = 5; static constexpr uint32_t VERTEX_SIZE = 1791489; static constexpr uint32_t DEGREE = 28; make sure to correct the dataset details before testing
-
build and run
Then run the following commands to build
$ mkdir build $ cd build $ cmake .. $ make
example scripts are available on /scripts/execute.sh
the following is properties of the scripts, you can modify them as you wish
param_k=5 data_path="../resource/as-skitter.txt" vend_prefix="" // where the encode is stored output_path="" // result path pair_path_prefix="" // query and insert edges location
Next, run the script to start the vend
db engine
Provide the database interface, you can rewrite it to build the graph based on another database rather than rocksdb.
dbengine/dbengin.h dbengine/rocksdb.h
encode
Constructing the Vertex encoding for the graph, which contains four algorithms,
- range based encoding /encode/single/range_encode.h
- hash based encoding /encode/single/hash_encode.h
- bloom filter bit encoding /encode/bfilter/bfilter_bit_encode.h
- bloom filter int encoding /encode/bfilter/bfilter_int_encode.h
- hybrid encoding /encode/single/hybrida_encode.h
vend
build the encoding and test one query edge is whether a ne-pair(edge nonexistence) or not
graph
build the whole graph, and provides three operations: query, insertion, and deletion
execution
1. score: return the number of certain ne-pairs under each encoding algorithm when querying.
2. query : return the query cost time
3. delete : return the time taken for encoding deletion
4. insert : return the time taken for encoding insertion
utils
contains two major classes: bitset and timer
utils/bitset.h // the basic element for encoding
utils/timer // provides timer function for each execution
We provide 3 datasets in our experiment
Summary of Datasets
Datasets | Type | Vertexes | Edges | Average Degree |
---|---|---|---|---|
As-Skitter | Internet topology | 1,696,415 | 11,095,298 | 13 |
Wiki-topcats | Wikipedia hyperlink | 1,791,489 | 25,444,207 | 28 |
Orkut | Social network | 3,072,441 | 117,185,083 | 76 |
======= |
-
As-Skitter dataset [link]: http://snap.stanford.edu/data/wiki-topcats.html
-
Wiki-topcats dataset [link]:http://snap.stanford.edu/data/wiki-topcats.html
-
Orkut dataset [link]: http://snap.stanford.edu/data/wiki-topcats.html