PyDCD: A Deep Learning-Based Community Detection Software in Python for Large-scale Networks

DCD (Deep learning-based Community Detection) applies deep learning techniques to identify communities in large-scale networks. Compared with existing community detection methods, DCD offers a unified solution for many variations of the community detection problem.

DCD provides implementations of four community detection algorithms, one evaluation, and two types of networked data:

| Function | Description | Input | Output |
| --- | --- | --- | --- |
| K-Means | Baseline (1) | Network node file; network edge file; performance evaluation flag; K | `<node id, community id>` |
| MM | Baseline (2) | Network node file; network edge file | `<node id, community id>` |
| DCD | DCD | Network node file; network edge file; performance evaluation flag; node attribute flag; K | `<node id, community id>` |
| Random network generation | Generate random network datasets | Network size; community size; probability of edges within communities; probability of edges between communities; directed network flag | `<node id, community id>`; network node file; network edge file |
| Load dataset | Load Facebook, citation, or user-provided datasets | Dataset name | Facebook dataset; citation dataset |

Requirements

Generally, the library is compatible with Python 3.6/3.7.
NetworkX >= 2.3

Installation

From PIP

pip3 install pydcd

Quick Start

Here is a quick-start example.

Python 3.7.3 (default, January 01 2020, 09:00:00) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from pydcd import DCD, KM, MM, RandNet # RandNet is used below; assumed to be importable from pydcd like the others
>>> kmeans_detector = KM(10)
>>> kmeans_detector.km_detect_community('fb_nodes.txt','fb_edges.txt','N') # N means no evaluation

>>> mm_detector = MM()
>>> mm_detector.mm_detect_community('fb_nodes.txt','fb_edges.txt','Y') # Y means show the evaluation

>>> dcd_detector = DCD() # using default setting for initialization, or
>>> dcd_detector = DCD(128,64,128,50) # set the neurons for three hidden layers and the output dimension
>>> dcd_detector.dcd_detect_community('fb_nodes_withattributes.txt','fb_edges.txt','Y','N') # the first flag Y means the nodes have attributes
>>> dcd_detector.dcd_detect_community('fb_nodes_noattributes.txt','fb_edges.txt','N','N') # the first flag N means the nodes have no attributes

>>> rn = RandNet() # to generate random networks
>>> rn.generate_random_networks(1000,100,0.2,0.05) # undirected network with 1000 nodes and 100 communities
>>> rn.generate_random_networks(1000,100,0.2,0.05,directed=True) # directed network with 1000 nodes and 100 communities
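
To make the random-network parameters concrete, here is a minimal standard-library sketch of a planted-partition generator: nodes in the same community are linked with one probability, nodes in different communities with another. It is not PyDCD's implementation, whose internals are not described here. The function name `planted_partition`, its parameter names, and the round-robin community assignment are assumptions made for illustration.

```python
import random
from itertools import combinations, permutations


def planted_partition(n_nodes, n_communities, p_in, p_out, directed=False, seed=None):
    """Return (membership, edges) for a random network with planted communities.

    Hypothetical helper for illustration; not part of the pydcd API.
    """
    rng = random.Random(seed)
    # Assumption: assign nodes round-robin so community sizes differ by at most one.
    membership = {node: node % n_communities for node in range(n_nodes)}
    # Directed networks consider both (u, v) and (v, u); undirected only u < v.
    if directed:
        pairs = permutations(range(n_nodes), 2)
    else:
        pairs = combinations(range(n_nodes), 2)
    edges = []
    for u, v in pairs:
        # Use the within-community probability when both endpoints share a community.
        p = p_in if membership[u] == membership[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return membership, edges


membership, edges = planted_partition(60, 3, 0.5, 0.02, seed=0)
```

The returned `membership` dictionary corresponds to the `<node id, community id>` output, and `edges` to the contents of an edge file. NetworkX, which is already a requirement, also provides `networkx.planted_partition_graph` for the same model.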

Input Examples

node file without attributes:

node_id_1
node_id_2
node_id_3
...
node_id_n

node file with attributes:

node_id_1 <tab> value_for_attribute_1 value_for_attribute_2 ... value_for_attribute_m
node_id_2 <tab> value_for_attribute_1 value_for_attribute_2 ... value_for_attribute_m
node_id_3 <tab> value_for_attribute_1 value_for_attribute_2 ... value_for_attribute_m
...
node_id_n <tab> value_for_attribute_1 value_for_attribute_2 ... value_for_attribute_m
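
The two node-file layouts above can be read with a single routine. The sketch below is one possible reader, not PyDCD's own loader: `parse_node_lines` is a hypothetical name, and it assumes that attribute values are separated by whitespace and should be kept as strings.

```python
def parse_node_lines(lines):
    """Map each node id to its list of attribute values (empty if it has none).

    Hypothetical helper for illustration; not part of the pydcd API.
    """
    nodes = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Everything before the first tab is the node id; the rest are attributes.
        node_id, _, rest = line.partition("\t")
        nodes[node_id.strip()] = rest.split()
    return nodes


plain = parse_node_lines(["1\n", "2\n", "3\n"])
with_attrs = parse_node_lines(["1\t0.5 1 0\n", "2\t0.1 0 1\n"])
```

Because a line without a tab yields an empty attribute list, the same function handles files with and without attributes. To read from disk, pass an open file object, for example `parse_node_lines(open('fb_nodes.txt'))`.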

edge file:

node_id_1 node_id_2
...
node_id_i node_id_j
...
node_id_m node_id_k
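
An edge file in this layout maps naturally onto an adjacency structure. The following sketch shows one way to build it. `parse_edge_lines` and its `directed` parameter are hypothetical names, and the code assumes the two node ids on each line are separated by whitespace.

```python
def parse_edge_lines(lines, directed=False):
    """Build a dict mapping each node id to the set of its neighbours.

    Hypothetical helper for illustration; not part of the pydcd API.
    """
    adjacency = {}
    for raw in lines:
        parts = raw.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = parts[0], parts[1]
        adjacency.setdefault(u, set()).add(v)
        # Register v as a node too; add the reverse edge only when undirected.
        neighbours_of_v = adjacency.setdefault(v, set())
        if not directed:
            neighbours_of_v.add(u)
    return adjacency


adj = parse_edge_lines(["1 2\n", "2 3\n", "\n"])
```

Using sets means a duplicated edge line is stored only once, and every node that appears in the file gets an entry even if it has no outgoing edges in the directed case.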

Development Team

PyDCD is developed by Prof. Kunpeng Zhang, Prof. Shaokun Fan, and Prof. Bruce Golden.

Citation

If you find this useful for your research or development, please cite our work.