This is a Python script designed to analyze the output of the MinHash Signature Estimation (MHSE) algorithm.
- Description
- MinHash Signature Estimation Algorithm
- Measurements from the Collision Table
- How to analyze data
- License
This script allows you to analyze and fully reproduce our MHSE experiments.
MHSE is an algorithm that efficiently estimates, on very large graphs, the effective diameter and other distance metrics based on the neighborhood function, such as the exact diameter, the (effective) radius, or the average distance (more details). Currently, we have published two versions of the algorithm: the original one (MHSE) and a space-efficient one (SE-MHSE), which produces the same results as MHSE with lower space complexity. SE-MHSE allows you to run the algorithm on machines with limited memory and to parallelize it easily with any map-reduce framework. You can find our algorithm at the following link.
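To make the role of the neighborhood function concrete, here is a minimal sketch (not the MHSE code itself) of how the average distance and the effective diameter can be derived from it. It assumes that `hop_table[h]` holds the cumulative number of node pairs within `h` hops, with `hop_table[0]` counting only the pairs `(v, v)`, and it uses the interpolated definition of the effective diameter that is common in the literature. The function names and these conventions are our assumptions; MHSE may use a slightly different definition.

```python
from typing import Sequence


def average_distance(hop_table: Sequence[float]) -> float:
    """Mean distance over reachable pairs at distance >= 1.

    Assumes hop_table[h] is the (estimated) cumulative number of node pairs
    within h hops, and hop_table[0] counts only the (v, v) pairs.
    """
    reachable = hop_table[-1] - hop_table[0]
    if reachable <= 0:
        return 0.0
    # Pairs at exactly distance h are hop_table[h] - hop_table[h - 1].
    weighted = sum(h * (hop_table[h] - hop_table[h - 1]) for h in range(1, len(hop_table)))
    return weighted / reachable


def effective_diameter(hop_table: Sequence[float], threshold: float = 0.9) -> float:
    """Interpolated effective diameter: the (fractional) number of hops within
    which a `threshold` fraction of all pairs counted at the last hop is reached."""
    target = threshold * hop_table[-1]
    for h, covered in enumerate(hop_table):
        if covered >= target:
            if h == 0:
                return 0.0
            prev = hop_table[h - 1]  # prev < target <= covered, so no division by zero
            # Linear interpolation between hop h - 1 and hop h.
            return (h - 1) + (target - prev) / (covered - prev)
    return float(len(hop_table) - 1)
```

For a 3-node path a-b-c counted over ordered pairs, `hop_table` is `[3, 7, 9]`, which gives an average distance of 4/3 and an interpolated effective diameter of about 1.55 at threshold 0.9.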
Each execution of the algorithm outputs a JSON object with the following fields:
{
"collisionsTable" :
"minHashNodeIDs" :
"numSeeds" :
"numNodes" :
"numArcs" :
"seedsTime" :
"lastHops" :
"time" :
"lowerBoundDiameter" :
"totalCouples" :
"totalCouplePercentage" :
"avgDistance" :
"effectiveDiameter" :
"algorithmName" :
"maxMemoryUsed" :
"seedsList" :
"threshold" :
"direction" :
"hopTable" :
}
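Several of these fields are related. The sketch below shows how a neighborhood function can be estimated from MinHash collision counts, under our own assumption (not documented here) that `collisionsTable` maps each hop to a list with one collision count per seed, where a node collides for a seed when its signature at that hop contains the global minimum for that seed (presumably the nodes listed in `minHashNodeIDs`). `hop_table_from_collisions` is a hypothetical helper, not part of the script.

```python
from collections.abc import Mapping


def hop_table_from_collisions(collisions, num_nodes):
    """Estimate the neighborhood function N(h) from per-seed collision counts.

    Assumed layout: hop -> list of per-seed counts, either as a JSON object
    (string keys such as "0", "1", ...) or as a plain list of lists.

    For seed i, node v collides at hop h when the globally minimal node for
    that seed lies within h hops of v, which happens with probability
    |B(v, h)| / n. Summing over v gives an expected count of N(h) / n, so
    N(h) is estimated as n times the average count over the seeds.
    """
    if isinstance(collisions, Mapping):
        # Sort numerically: JSON keys are strings, and "10" < "2" as text.
        rows = [collisions[k] for k in sorted(collisions, key=int)]
    else:
        rows = list(collisions)
    return [num_nodes * (sum(row) / len(row)) for row in rows]
```

For the path a-b-c with two seeds whose minimum nodes are b and a, the counts are `{"0": [1, 1], "1": [3, 2], "2": [3, 3]}`, and the estimate `[3.0, 7.5, 9.0]` is close to the exact neighborhood function `[3, 7, 9]`.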
If you execute the algorithm more than once, it outputs a list of JSON objects, one per execution:
[{
"collisionsTable" :
"minHashNodeIDs" :
...
"direction" :
"hopTable" :
},{
"collisionsTable" :
"minHashNodeIDs" :
...
"direction" :
"hopTable" :
},
...
]
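Because an output file can hold either a single object or a list of objects, a reader has to handle both shapes. A minimal sketch, with hypothetical function names, that normalizes both cases to a list:

```python
import json


def parse_runs(text):
    """Return the executions in an MHSE/SE-MHSE output as a list of dicts.

    The text may contain either a single JSON object (one execution) or a
    JSON list of objects (several executions).
    """
    data = json.loads(text)
    # Normalize the single-execution case to a one-element list.
    return data if isinstance(data, list) else [data]


def load_runs(path):
    """Read an output file from disk and parse it with parse_runs."""
    with open(path, encoding="utf-8") as f:
        return parse_runs(f.read())
```

If your experiments wrote to separate files, you could merge them by concatenating the lists returned by `load_runs` for each file and writing the result back with `json.dump`.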
You can use the same output file for all your executions of MHSE and/or SE-MHSE to obtain a single list of JSON objects covering every execution (alternatively, you can build this list as a separate step once all the experiments are done). Given this JSON, the script automatically detects the different parameter settings and groups the executions accordingly to compute the statistics.
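The grouping step can be sketched as follows. This is not the script's actual code: the set of fields treated as parameters (`PARAMETER_KEYS`) is our assumption, with `numNodes` and `numArcs` used only as a rough proxy for which graph was analyzed, and the statistics shown (mean and population standard deviation) are just an example.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical choice of the fields that identify one experimental setting.
PARAMETER_KEYS = ("algorithmName", "numNodes", "numArcs", "numSeeds", "threshold", "direction")


def summarize(runs, metric="effectiveDiameter", keys=PARAMETER_KEYS):
    """Group runs by their parameter values and return count, mean and std of `metric`."""
    groups = defaultdict(list)
    for run in runs:
        # Missing keys map to None so heterogeneous runs do not raise KeyError.
        setting = tuple(run.get(k) for k in keys)
        groups[setting].append(run[metric])
    return {
        setting: {"runs": len(values), "mean": mean(values), "std": pstdev(values)}
        for setting, values in groups.items()
    }
```

Passing `metric="avgDistance"` would summarize the average distance instead of the effective diameter.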