mfbn: A Python repository from alanvalejo

References

[1] Valejo, Alan and Faleiros, T. P. and Oliveira, Maria C. F. and Lopes, A. A., A coarsening method for bipartite networks via weight-constrained label propagation, in Knowledge-based systems, p. 105678, vol. 195, 2019, doi: https://doi.org/10.1016/j.knosys.2020.105678

[2] Valejo, Alan and Oliveira, Maria C. F. and Filho, Geraldo P. R. and Lopes, A. A., Multilevel approach for combinatorial optimization in bipartite network, in Knowledge-based systems, p. 45--61, vol. 151, 2018, doi: https://doi.org/10.1016/j.knosys.2018.03.021

[3] Valejo, Alan and Ferreira, V. and Oliveira, Maria C. F. and Lopes, A. A., Community detection in bipartite network: a modified coarsening approach, in International symposium on information management and big data (SIMBig), track on social network and media analysis and mining (SNMAN). Part of the Communications in Computer and Information Science book series (CCIS, volume 795), p. 123--136, 2017, doi: https://doi.org/10.1007/978-3-319-90596-9_9

[4] Valejo, Alan and Filho, Geraldo P. R. and Oliveira, Maria C. F. and Ferreira, V. and Lopes, A. A., One-mode projection-based multilevel approach for community detection in bipartite networks, in International symposium on information management and big data (SIMBig), track on social network and media analysis and mining (SNMAN), p. 101-108, 2017 , doi: http://ceur-ws.org/Vol-2029/paper8.pdf

[5] Valejo, Alan and Ferreira, V. and Fabbri, R. and Oliveira, Maria C. F. and Lopes, A. A., A critical survey of the multilevel method in complex networks, in ACM Computing Surveys, accepted paper, 2019

Warning
-------
The original implementations (i.e. paper versions [1,2,3,4]) are deprecated.
There may be divergences between this version and the original algorithms.
If you looking for the original version used in the paper don't hesitate to contact the authors.

This software is a new version, more robust and fast.
It is a beta version and has some bugs and inconsistencies.
The final version of this tool will be released is coming soon.
For now, you can use this version without guarantee of the results.

MFBN: Multilevel framework for bipartite networks

About

This is a framework that compiles several coarsening algorithms for bipartite networks, specifically: OPM [3,4], RGMB [2], GMB [2] and MLPb [1].

A multilevel method is a scalable strategy to solve optimization problems in large bipartite networks, which operates in three stages. Initially the input network is iteratively coarsened into a hierarchy of gradually smaller networks. Coarsening implies in collapsing vertices into so-called super-vertices which inherit properties of their originating vertices. An initial solution is obtained executing the target algorithm in the coarsest network. Finally, this solution is successively projected back over the inverse sequence of coarsened networks, up to the initial one, yielding an approximate final solution.

An extensive critical survey of the multilevel method in complex networks is presented in [5].

Usage

MFBN may operate in two modes: 1. using explicit command line parameters (or options) or 2. using a JSON config file (JavaScript Object Notation).

Command line parameters

$ python mfbn.py [options]

Option	Domain	Default	Description	Algorithms
-dir --output_directory	str [DIR]	'.'	directory of output file	All
-out --output	str [FILE]	'out'	filename	All
-cnf --conf	str [FILE]	None	Input parameters in .json format	All
-o --output_object	boolean	False	return python objects dictionary and don't write files	All
-v --vertices	int array [L1,L2]	None	number of vertices for each layer	All
-r --reduction_factor	int array [L1,L2]	[0.5, 0.5]	reduction factor for each layer	All
-m --max_levels	int array [L1,L2]	[3, 3]	max levels for each layer	OPM, RGMB and GMB
-gmv --global_min_vertices	int array [L1,L2]	[100, 100]	minimum number of vertices for each layer in the last level	MLPb
-t --tolerance	int array [L1,L2]	[0.1]	tolerance in for each layer	MLPb
-i --itr	int array [L1,L2]	[10, 10]	number of iterations for each layer	MLPb
-ub --upper_bound	int array [L1,L2]	[0.2, 0.2]	upper bound for each layer	MLPb
-c --matching	str array [L1,L2]	["gmb", "gmb"]	matching method for each layer	Algorithm selection
-s --similarity	str array [L1,L2]	["jaccard", "jaccard"]	similarity measure for each layer	All
-sd --seed_priority	str array [L1,L2]	["degree", "degree"]	seed priority to start the algorithms	All
-scnf --save_conf	boolean	false	save config file	All
-sgml --save_gml	boolean	false	save gml file	All
-sn --save_ncol	boolean	false	save ncol file	All
-ssrc --save_source	boolean	false	save source file	All
-smbs --save_membership	boolean	false	save membership file	All
-swgh --save_weight	boolean	false	save weight file	All
-sprd --save_predecessor	boolean	false	save predecessor file	All
-sscc --save_successor	boolean	false	save successor file	All
-shrr --save_hierarchy	boolean	false	save hierarchy of networks	All
-sc --show_conf	boolean	false	show conf file	All
-st --show_timing	boolean	False	show timing	All
-tcsv --save_timing_csv	boolean	False	save timing in csv	All
-tjson --save_timing_json	boolean	False	save timing in json	All
--unique_key	boolean	False	output date and time as unique_key	All

JSON option

$ python mfbn.py -cnf options.json

JSON format: Data is in name/value pairs, separated by commas, curly braces hold objects and square brackets hold arrays.

{
    "option": "value"
}

The matching strategy selects the best pairs of vertices for matching. In this software it is possible use six matching methods:

OPM_hem [3,4]: hem key
OPM_lem [3,4]: lem key
OPM_rm [3,4]: rm key
RGMb [2]: rgmb key
GMb [2]: gmb key
MLPb [1]: mlpb key

In this software it is possible use eleven similarity measures:

Common Neighbors: common_neighbors key
Weighted Common Neighbors: weighted_common_neighbors key
Preferential Attachment: preferential_attachment key
Jaccard: jaccard key
Salton: salton key
Adamic Adar: adamic_adar key
Resource Allocation: resource_allocation key
Sorensen: sorensen key
Hub Promoted: hub_promoted key
Hub Depressed: hub_depressed key
Leicht Holme Newman: leicht_holme_newman key

In this software it is possible use three seeds priorities:

Degree: degree key
Random: random key
Strength: strength key

It is possible use different parameters algorithms in each layer, e.g.:

{
    "reduction_factor": [0.3, 0.5],
    "matching": ["gmb", "rgmb"],
    "similarity": ["common_neighbors", "jaccard"]
    ...
}

Example

To help you visualize the networks generated by the coarsening process you can use the PyNetViewer software. In the network images, line widths reflect the corresponding edge weights; vertex widths depict the super-vertex weights. Bright red markers denote high-degree vertices.

We tested 3 real networks: 1. Moreno Crime is a social network with scale-free behavior relating 754 crime suspects or victims with 509 crimes (Figure 1a); 2. N-reactome is a biological network with modular topology with 8,788 proteins and 15,433 interactions in the species Homo Sapiens (Figure 2a); 3. Rajat is a nonsymmetric circuit simulation matrix, in which 6,765 individual resistors and connection points are defined as edges and vertices, respectively (Figure 3a). The two first networks are available in the Koblenz Network Collection (KONECT) and the last is available from the SuitSparse Matrix Collection.

We use the MLPb algorithm [1] to create a coarsened versions of the previous mentioned networks. Particularly, we generated coarsened version with relatively balanced super-vertices (between 0.2 and 0.4 upper bound) and we used weighted common neighbor similarity measure.

Moreno Crime network (original size: V1=754; V2=509)

# Reduce the network to V1=450; V2=350
$ python coarsening.py -cnf input/moreno-1.json

# Reduce the network to V1=250; V2=150
$ python coarsening.py -cnf input/moreno-2.json

# Reduce the network to V1=50; V2=100
$ python coarsening.py -cnf input/moreno-3.json

The result is showed below, Figures 1b, 1c and 1d.

1a) V1=754; V2=509	1b) V1=450; V2=350	1c) V1=250; V2=150	1d) V1=50; V2=100

N-reactome network (original size: V1=8,788; V2=15,433)

# Reduce the network to V1=1000; V2=4000
$ python coarsening.py -cnf input/n-reactome-1.json

# Reduce the network to V1=500; V2=1000
$ python coarsening.py -cnf input/n-reactome-2.json

# Reduce the network to V1=200; V2=500
$ python coarsening.py -cnf input/n-reactome-3.json

The result is showed below, Figures 2b, 2c and 2d.

2a) V1=8788; V2=15433	2b) V1=1000; V2=4000	2c) V1=500; V2=1000	2d) V1=200; V2=500

Rajat network (original size: V1=6,765; V2=6,765)

# Reduce the network to V1=V2=3000
$ python coarsening.py -cnf input/rajat-1.json

# Reduce the network to V1=V2=500
$ python coarsening.py -cnf input/rajat-2.json

# Reduce the network to V1=V2=150
$ python coarsening.py -cnf input/rajat-3.json

The result is showed below, Figures 3b, 3c and 3d.

3a) V1=V2=6765	3b) V1=V2=3000	3c) V1=V2=500	3d) V1=V2=150

MLPb algorithm can generate balanced and unbalanced super-vertices using an upper bound parameter:

4a) upper_bound=0.1	4b) upper_bound=0.8

$ python coarsening.py -cnf input/rajat-1.json

Timing

To track the execution time of the algorithms you can use the fallowing parameters:

{
    "show_timing": true,
    "save_timing_csv": false,
    "save_timing_json": false
}

E.g., show_timing track (and show in bash) the execution time of each algorithm phase:

$ python coarsening.py -cnf input/n-reactome-1.json

       Snippet       Time [m]       Time [s]
Pre-processing            0.0         0.0428
          Load            0.0         0.2623
    Coarsening            0.0         4.9760
          Save            0.0         0.1249

Instal

Pip

$ pip install -r /path/to/requirements.txt

Anaconda env

$ conda env create -f environment.yml
$ conda activate mfbn

Anaconda create

$ conda create --name mfbn python=3.7.2
$ conda activate mfbn
$ conda install -c anaconda numpy
$ conda install -c conda-forge python-igraph
$ conda install -c anaconda pyyaml
$ conda install -c conda-forge pypdf2
$ conda install -c anaconda scipy
$ conda install -c anaconda networkx
$ conda install -c bioconda sharedmem

Release History

0.1.0
- The first proper release
0.0.1
- Work in progress

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.

Fork it (https://github.com/alanvalejo/mfbn/fork)
Create your feature branch (git checkout -b feature/fooBar)
Commit your changes (git commit -am 'Add some fooBar')
Push to the branch (git push origin feature/fooBar)
Create a new Pull Request

Known Bugs

Please contact the author for problems and bug report.

Contact

Alan Valejo
alanvalejo@gmail.com
www.alanvalejo.com.br
Postdoctoral research fellow at the University of São Paulo (USP), Brazil

License and credits

Giving credit to the author by citing the papers [1,2,3,4]
The GNU General Public License v3.0
This program comes with ABSOLUTELY NO WARRANTY. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
Owner or contributors are not liable for any direct, indirect, incidental, special, exemplary, or consequential damages, (such as loss of data or profits, and others) arising in any way out of the use of this software, even if advised of the possibility of such damage.
This program is free software and distributed in the hope that it will be useful: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

To-do list

Implements the output of the objects

alanvalejo/mfbn

MFBN: Multilevel framework for bipartite networks