This is the code repository for B-Link.
Python version: 2.7.11
Pandas version: 0.24.2
Networkx version: 1.11
The B-Link source data consists mainly of two parts: (1) a networkx gpickle file representing the graph structure, and (2) a csv file representing the mapping between node id and node text.
The networkx gpickle file can be downloaded here. It represents the main graph structure of B-Link.
To use the networkx graph:
>>> import networkx as nx
>>> G = nx.read_gpickle('addNodeEdgeDegree_R+rn+GMHM_undirected_alpha0.65_nodeD1.0_total_v3_csvneo4j.gpickle')
>>> G[1048581][343561]
{'weight': 2.0, 'R_r_HM': 85.00000000000001, 'maxAlpha': 0.8335867548824862, 'R_n_HM': 29120.999999999975, 'minAlpha': 0.7497914234110472, 'R_r_GM': 3.3576916931673404, 'R_n_GM': 10.279214842380336}
The code above checks the edge between node 1048581 and node 343561 and shows the attributes attached to that edge. Some of these attributes (R_n_HM, R_r_GM, R_n_GM) are used as edge distances in B-Link, which always tries to return the results with the shortest distance.
- "weight": the raw occurrence frequency of these two nodes in publications, counted by sentence as well as in the keywords section.
- "R_n_HM": a type of precomputed distance between these two nodes. It is used as the edge distance in the Explore-General function in B-Link.
- "R_r_GM": a type of precomputed distance between these two nodes. It is used as the edge distance in the Explore-Specific and Search Path - Professional functions in B-Link.
- "R_n_GM": a type of precomputed distance between these two nodes. It is used as the edge distance in the Search Path - Basic function in B-Link.
For how the distance between nodes is computed, please refer to paper [1].
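The B-Link search functions themselves are not shown in this README, so the following is only a minimal sketch of how one of the precomputed attributes could serve as the edge distance in a shortest-path query with networkx. The toy graph, its node ids (1, 2, 3) and all attribute values are made up for illustration; with the real data, G would be the graph loaded from the gpickle file.

```python
import networkx as nx

# Toy stand-in for the B-Link graph. Node ids and attribute values are
# hypothetical; only the attribute names come from the real edge data.
G = nx.Graph()
G.add_edge(1, 2, R_n_GM=1.0, R_r_GM=3.0)
G.add_edge(2, 3, R_n_GM=1.5, R_r_GM=3.0)
G.add_edge(1, 3, R_n_GM=4.0, R_r_GM=2.0)

# Use R_n_GM as the edge distance (the attribute named for Search Path - Basic).
path_n_gm = nx.shortest_path(G, source=1, target=3, weight='R_n_GM')
dist_n_gm = nx.shortest_path_length(G, source=1, target=3, weight='R_n_GM')

# Use R_r_GM instead; a different attribute can yield a different path.
path_r_gm = nx.shortest_path(G, source=1, target=3, weight='R_r_GM')
dist_r_gm = nx.shortest_path_length(G, source=1, target=3, weight='R_r_GM')

print(path_n_gm, dist_n_gm)
print(path_r_gm, dist_r_gm)
```

In this toy graph the two attributes give different routes between the same pair of nodes, which suggests why the choice of distance attribute matters for each function.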
The csv file represents the mapping between node id and node phrase/text. It can be downloaded here.
To check the mapping:
>>> import pandas as pd
>>> df = pd.read_csv('node_words.gz', compression='gzip')
>>> df
id word
0 2029856 !continuing medical education
1 1929261 !kung bushman(ju
2 1003006 #
3 3041529 #15μ10#δ6-desaturase
4 1133049 #60
5 2386300 #bis-hardness
6 1319240 #csp
7 2385232 #euromaidan
8 1800821 #fal
9 584050 #k-sat
10 580867 #p
11 1183304 #p complexity class
12 3054833 #p hard enumeration
...
The id column is the id referred to by the nodes in the networkx graph, and the word column is the phrase/text of the node.
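As a rough illustration of how the mapping might be used together with the graph, the sketch below builds lookup dictionaries in both directions. The small DataFrame stands in for the real node_words.gz (three rows copied from the listing above), and the helper label_nodes is a hypothetical name, not part of B-Link.

```python
import pandas as pd

# Toy stand-in for pd.read_csv('node_words.gz', compression='gzip');
# the rows are copied from the sample output above.
df = pd.DataFrame({
    'id': [2029856, 1929261, 584050],
    'word': ['!continuing medical education', '!kung bushman(ju', '#k-sat'],
})

# Lookup tables in both directions: node id -> text and text -> node id.
id_to_word = dict(zip(df['id'].tolist(), df['word'].tolist()))
word_to_id = dict(zip(df['word'].tolist(), df['id'].tolist()))


def label_nodes(node_ids):
    """Hypothetical helper: replace node ids with their text where known."""
    return [id_to_word.get(n, n) for n in node_ids]


print(id_to_word[584050])
print(word_to_id['#k-sat'])
print(label_nodes([2029856, 584050, 999]))
```

Ids without an entry in the mapping are left unchanged, so a list of node ids taken from the graph (for example a path) can be labelled without failing on unknown ids.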
[1] Shi, F., Chen, L., Han, J. and Childs, P., 2017. A data-driven text mining and semantic network analysis for design information retrieval. Journal of Mechanical Design, 139(11).