You can ignore "HSP/src"; it is not used.
The core code is in entity_profiling-master.
It contains two parts:
Part 1 (Java; the code includes detailed descriptions of the relevant modules and functions). The key steps are:
- Dataset processing: parse the RDF dataset and save it into SQLite tables according to certain rules;
- Interval discretization;
- Label filtering;
- Once the entity vectors have been generated, compute cosine similarity to estimate the labels;
- Sort the labels to obtain the final label results.
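
The discretization, cosine and sorting steps above can be sketched as follows. This is only an illustration written in Python for brevity (Part 1 itself is Java), not the repository's code. The function names `equal_width_bin`, `cosine` and `rank_labels` are hypothetical, equal-width binning is an assumed discretization scheme, and the sketch assumes that every candidate label already has a vector in the same space as the entity vector.

```python
import math


def equal_width_bin(value, lo, hi, n_bins):
    """Map a numeric value to an interval index (assumed equal-width scheme)."""
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    idx = int((value - lo) / (hi - lo) * n_bins)
    # Clamp so that value == hi (or out-of-range values) fall into a valid bin.
    return min(max(idx, 0), n_bins - 1)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_labels(entity_vec, label_vecs):
    """Score each candidate label against the entity and sort best-first.

    label_vecs is a hypothetical dict mapping label name -> vector.
    """
    scored = [(label, cosine(entity_vec, vec)) for label, vec in label_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_labels(entity_vec, label_vecs)` returns `(label, score)` pairs in descending score order, so the first few entries are the highest-scoring labels for that entity.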
Part 2 (Python). The key files are:
- H-path: generates paths based on homogeneity; the H method on its own also generates entity vectors;
- A-path: generates paths based on attribute similarity; the A method on its own also generates entity vectors;
- S-path: generates paths based on structural similarity; the S method on its own also generates entity vectors;
- HAS combined path (HAS_combine_seq.py): merges the paths of all entity nodes, then applies a language model to the merged paths;
- findneighbors.py: finds the neighbors of an entity in the cube;
- The remaining process files handle the data in the SQLite tables, for example splitting labels into string-type and numeric-type labels.
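
As a rough sketch of the path-merging idea, the paths from the three generators can be pooled into one corpus of token sequences before a language model is trained on them. This is not the actual content of HAS_combine_seq.py: the function name `combine_paths` and the input layout (one list of node-id paths per method) are assumptions made for illustration only.

```python
def combine_paths(h_paths, a_paths, s_paths):
    """Pool H-, A- and S-paths into a single corpus of token sequences.

    Each argument is assumed to be a list of paths, and each path a list
    of entity/node identifiers. Identifiers are converted to strings so
    that every path can be treated as a "sentence" of tokens.
    """
    corpus = []
    for paths in (h_paths, a_paths, s_paths):
        for path in paths:
            # A single-node path carries no context, so it is skipped here.
            if len(path) > 1:
                corpus.append([str(node) for node in path])
    return corpus
```

The resulting corpus has the shape that skip-gram style models expect (a list of token lists), which is how deepwalk trains node embeddings from random walks.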
This method parses and stores datasets in formats such as rdf/n3/nt. For other training-related details, see deepwalk: https://github.com/phanein/deepwalk (Note: the name has been changed from HAS to HSP.)
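
To illustrate the parse-and-store step, here is a minimal Python sketch that loads simple N-Triples lines into a SQLite table. It is an assumption-laden example, not the repository's Java implementation: the table name `triples`, its columns, and the helper names `parse_line` and `load_triples` are hypothetical, and the regular expression only handles plain IRIs and quoted literals (blank nodes and other N3 syntax are not covered). Real datasets should go through a proper RDF parser.

```python
import re
import sqlite3

# Simplified N-Triples pattern: <subject> <predicate> (<iri> | "literal") .
# An optional datatype (^^<...>) or language tag (@en) after a literal is ignored.
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>|@[\w-]+)?)'
    r'\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object, is_literal) or None if no match."""
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    subj, pred, obj_iri, obj_lit = match.groups()
    if obj_iri is not None:
        return subj, pred, obj_iri, 0
    return subj, pred, obj_lit, 1


def load_triples(lines, conn):
    """Insert every parseable line into a (hypothetical) triples table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples "
        "(subject TEXT, predicate TEXT, object TEXT, is_literal INTEGER)"
    )
    rows = [t for t in map(parse_line, lines) if t is not None]
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, `load_triples(open(path, encoding="utf-8"), sqlite3.connect("data.db"))` would load an .nt file, where `path` and `data.db` are placeholder names. Keeping an `is_literal` flag makes it easy to separate entity-to-entity edges from attribute values later on.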