Mdr (Multilevel dimensionality reduction)

About

This is an Python implementation of multilevel dimensionality reduction, published by Alan et. al. [1]. The implementation is based on multilevel approach for combinatorial optimization that can perform in bipartite context.

Download

Usage

Coarsening by levels. The user defines the number of levels and the reduction factor at each level.

$ python mdr-mob.py [options]
$ python mdr-opm.py [options]
Option Domain Default Description
-in --input string [FILE] None name of the input file to be loaded
-dir --directory string [DIR] '.' directory of output file
-out --output string [FILE] filename name of the file to be save
-cnf --conf string [FILE] None name of the config file to be loaded
-v --vertices int None number of vertices for each layer
-r --rf (0 0.5] [0.5 0.5] reduction factor for each layer
-m --ml [0 n] [3 3] max levels for each layer
-c --matching string [hem hem] matching method for each layer
-s --similarity string None similarity measure for each layer
--save_conf boolean false save config file
--save_ncol boolean false save ncol format
--save_gml boolean false save gml format
--save_source boolean false save source reference
--save_predecessor boolean false save predecessor reference
--save_successor boolean false save successor reference
--save_hierarchy boolean false save all levels of hierarchy
--save_adjacency boolean false save weighted adjacency matrix
--save_timing_json boolean False save timing in json
--save_timing_csv boolean False save timing in csv
--show_timing boolean false show timing
--unique_key boolean false output date and time as unique_key

The matching strategy selects the best pairs of vertices for matching. Formally, a matching $M$ can be denoted by a set of pairwise non-adjacent edges, i.e., a set of edges with no common vertices. In this software it is possible use two matching methods:

For mdr-mob.py use:

  • Greed Rand Twohopes
  • Greed Twohopes

For mdr-opm.py use:

  • Hem
  • Lem
  • Rm

The matching strategy is, therefore, a key component of an effective multilevel optimization, as it leads to a good hierarchy of coarsened networks for supporting the local search algorithm. A poor choice of the matching impairs the multilevel process, hence, the performance of the local search algorithm. In this software it is possible use some similarity measures:

For both scripts, mdr-mob.py and mdr-opm.py you can use:

  • Common Neighbors
  • Weighted Common Neighbors
  • Preferential Attachment
  • Jaccard
  • Salton
  • Adamic Adar
  • Resource Allocation
  • Sorensen
  • Hub Promoted
  • Hub Depressed
  • Leicht Holme Newman

Quick benchmark results

We test a scientific collaboration network (Cond-Mat), available here, which is based on preprints posted in the Condensed Matter section (arXiv) between 1995 and 1999 and has 38.742 vertices (authors and papers) and 58.595 edges (authorship) among different types of vertices.

# Mdr using Greed Rand Twohopes coarsening algorithm (Execute more time to find the best result)
$ python mdr-mob.py -cnf input/iris-mdr-rgmb.json; python plot2Dmdr.py iris

# Mdr using Greed Twohopes coarsening algorithm
$ python mdr-mob.py -cnf input/wine-mdr-gmb.json; python plot2Dmdr.py wine
$ python mdr-mob.py -cnf input/breast-cancer-mdr-gmb.json; python plot2Dmdr.py breast-cancer

# Principal component analysis (PCA)
$ python plot2D.py pca iris
$ python plot2D.py pca wine
$ python plot2D.py pca breast-cancer

# Feature Agglomeration
$ python plot2D.py fa iris
$ python plot2D.py fa wine
$ python plot2D.py fa breast-cancer

# Feature Agglomeration
$ python plot2D.py lsa iris
$ python plot2D.py lsa wine
$ python plot2D.py lsa breast-cancer

Iris

Mdr Principal component analysis (PCA) Feature Agglomeration Truncated SVD (aka LSA)

Wine

Mdr Principal component analysis (PCA) Feature Agglomeration Truncated SVD (aka LSA)

Breast Cancer

Mdr Principal component analysis (PCA) Feature Agglomeration Truncated SVD (aka LSA)

Dependencies

Known Bugs

Please contact the author for problems and bug report.

Contact

License (COPYING.md)

  • Can be used for creating unlimited applications
  • Can be distributed in binary or object form only
  • Non-commercial use only
  • Can modify source-code and distribute modifications (derivative works)
  • Giving credit to the author by citing the papers [1,2]
  • License will expire in 2018, July, and will be renewed.

References

[1] Valejo, A.; Rocha Filho, G. P.; Oliveira, M. C. F.; and Lopes, A. A.: A Multilevel approach for combinatorial optimization in bipartite networks. Knowledge-Based Systems (2018)

@article{Alan_2014,
    author={Valejo, A.; Rocha Filho, G. P.; Oliveira, M. C. F.; and Lopes, A. A.},
    title={Multilevel approach for combinatorial optimization in bipartite networks},
    journal={Knowledge-Based Systems},
    year={2018},
}
© Copyright (C) 2017 Alan Valejo <alanvalejo@gmail.com> All rights reserved.