Graph compression.
Note that this is a full reimplementation of PowerGrASP, taking advantage of ASP and Python lifting and simplifications. For the version published in 2017, see this repository.
python -m powergrasp mygraph.gml -o compressed.bbl
python -m powergrasp --help
import powergrasp
with open('compressed.bbl', 'w') as fd:
for line in powergrasp.compress_by_cc('mygraph.gml'):
fd.write(line + '\n')
Sorry, no technical doc for the moment.
The bubble file is the main output of PowerGrASP, and describes a power graph.
The bubble is a format handled by the CyOog plugin for Cytoscape 2, allowing one to load a bubble formatted file and to visualize the corresponding graph.
As a consequence, you have to install Cytoscape 2, and put the CyOog.jar
file into path/to/cytoscape/install/dir/plugins/
.
Another way to go is to convert the bubble to other formats handling hierarchical graphs, like gexf, dot or the API of cytoscape.js. This is a task implemented by bubbletools.
PowerGrASP has some configuration values,
that can be overwritten by a powergrasp.cfg
file placed in the working directory.
The configuration may be printed in stdout using:
python -m powergrasp --config
The config file may be either in ini or json format, as shown in test/powergrasp.oneshot.cfg
and test/powergrasp.manyoptions.cfg
.
Configuration allow user to define how much text will be outputed by powergrasp, and to tune the core compression and related optimizations.
See complete list in the options section.
pip install powergrasp
On random error, try adding --no-cache-dir
at the end of the command,
and check that you are using python 3.
You must have clingo
in your path. Depending on your OS, it might be done with a system installation,
or through downloading and (compilation and) manual installation.
- 8.17
- support for recipes options, like
breakable
orlast
- support for recipes options, like
- 8.11
- slight bounds improvements for non-star-biclique motif
- statistics/timers now provides time needed to search for each motif
- support for recipes, that small file that allows to define the compressions to do
- 8.10
- option parallel cc compression to enable parallel compression of connected components
- option bubble embeds cc, to put each cc in a dedicated powernode
- option bubble with simple edges, to keep or discard the simple edges from output
- statistics on time and compression metrics are computed by cc and for all graph. See new options.
- option cc statistic file, to retrieve stats and metrics for connected components.
- option global statistics, to compute statistics/metrics over all connected components.
- option bubble with statistics, to include stats and metrics in output bubble file
- motif type order option, allowing user to tune the order in which motifs are searched.
- parallel motif search option, making motifs search in different threads, and the overhaul compression much slower.
- CLI: --config flag to print the full configuration in stdout
- handling spaces instead of _ in fields names for INI format
- do not filter out nodes alone in their connected component (cf keep-single-nodes option)
- perf gain: the first motif to reach the best score is compressed, instead of the last
- improve handling of statistic files, that now contains cc idx and motif bounds
- expose
__version__
attribute in the package - add a list of available options in the README
- graph filtering: optimization leading to general performance gain when enabled
- improvements on {low,upp}erbounds computations
- 8.9
- many bugfixes
- CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
- 8.8
- bugfix: lowerbound can't go under 2 (3 for cliques)
- clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
- use new versions of clyngor and bubbletools
- 8.7
- raise error when an invalid key is found in config file
- StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
- specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
- arbitrary options for clingo using CLINGO_OPTIONS config field
- bugfix on string options given in ini file, when unecessary quotes are kept
- 8.6
- bugfixes for INI format
- 8.5
- config may now be given in INI format
- improve logging
- config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
- bugfix about edges yielded by ASP and multithreading parameter
- no statistics file to write by default
- timer for output writing
- config allow user to define the number of CPU for clingo to use
- simplify ASP code by removing useless arg in block/4 atoms
- 8.4
- replace ASP code constraint to achieve much more efficient grounding
- configuration allow for improved lowerbound computation of bicliques
- generated bubble allow client to give heading comments
- handle keyboard interruption during search with grace
- implement timers and statistics recording
- enable multishot motif search by default
- various bugfixes
A recipe is a description of the motifs to compress. For instance, the data/recipe-test.txt:
biclique c g h i
biclique a b d e
biclique a b f
biclique c d e
Using the --recipe
option in CLI, user can provide a recipe for its own graph,
allowing to specify prior motifs to search.
The order of second and third columns does not have any influence.
The first column usually contains the motif types separated by comma (altough it is not necessary), and allows to provide some options (separated by comma):
primer
: allows the program to extend the given motif if possiblebreakable
: allows the program to compress the line with multiple motifsoptional
: if not found, ignore instead of stoppinglast
: stop compression here (useful to avoid the compression process to take over)
The recipe file will be read line by line.
Once all lines have been applied, the normal compression compression takes over
(unless last
option is used).
For a living example, see data/recipe-option-test.txt.
Description of all powergrasp options. All values example are given for INI format.
Run some integrity tests during runtime to ensure that compression is working well. May slow the compression a lot, and is mainly a debugging tool.
Default value:
test_integrity = true
Print in stdout the main steps of compression. Default value:
show_story = true
Print in stdout the motif transformation. Default value:
show_motif_handling = false
Maintain and print in stdout some timers. Default value:
timers = true
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if timers option is enabled). Default value:
statistics_file = None
Example value:
statistics_file = statistics.csv
Filename in which the statistics about each connected component will be written. Data will contain one row for each connected component, with the following data:
- connected component index
- the convertion rate
- the edge reduction
- the compression rate
- time needed to compress the connected component (if TIMERS is enabled)
Default value stands for no file:
cc_statistic_file = None
Example value:
cc_statistic_file = statistics-cc.csv
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression. Default value:
global_statistics = True
When enabled, output bubble will contain statistics about each connected component and the overall graph (if global_statistics is enabled). Default value:
bubble_with_statistics = True
Generate and save a bubble representation of the graph at each step. Mainly used for debugging. Default value:
bubble_for_each_step = false
Prefix to add to all (power)nodes names in output. Default value:
output_node_prefix = ''
Show full trace of the compression. Useful for debugging. Default value:
show_debug = False
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker. Default value:
covered_edges_from_asp = False
Nodes and sets are optional in the output bubble. Default value:
bubble_with_nodes = True
bubble_with_sets = True
Edges in bubble are associated to a factor. Default value:
bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'
If enabled, put each connected component in a dedicated powernode. Default value:
bubble_embeds_cc = no
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision. Default value:
bubble_simplify_quotes = True
If disabled, will discard simple (i.e. non-power) edges of output. Default value:
bubble_with_simple_edges = True
Load options found in given config file, if it exists. Default value:
config_file = 'powergrasp.cfg'
Search for multiple motif in a single search. Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable. Default value:
multishot_motif_search = True
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered. Default value:
biclique_lowerbound_maxnei = 2
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options). Default value:
clingo_options = {}
Give a particular configuration for bicliques search:
clingo_options = {'no-star-biclique': '--configuration=handy'}
Number of CPU available to clingo, or 0 for autodetect number of CPU. Default value:
clingo_multithreading = 1
Use as many thread there are available CPU:
clingo_multithreading = 0
Specify a competing search between 4 threads:
clingo_multithreading = '4,compete'
To use to compress connected components in different processes. Default value (optimized in memory):
parallel_cc_compression = 1
Compress with 4 processes :
parallel_cc_compression = 4
Compress with one process for each connected component :
parallel_cc_compression = 0
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs. Default value:
use_star_motif = True
When possible, prefer memory over CPU. Default value:
optimize_for_memory = False
Ignore edges dynamically determined as impossible to compress. Default value:
graph_filtering = True
If a node is found to be alone in its connected component, it will be kept nonetheless. Default value:
keep_single_nodes = True
Use multithreading to search for all motifs in the same time, instead of sequentially. Interestingly, this yields very poor results, and it's even worse with multiprocessing.
parallel_motif_search = False
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.
motif_type_order = star,clique,non-star-biclique,biclique
Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:
motif_type_order = greatest-upperbound-first
Or any variation with worst
, lowerbound
and last
.