TSeqE

This repo provides a reference implementation of TSeqE as described in the paper:

J. Yang, W. Zhou, W. Qian, J. Han and S. Hu, "Topic Sequence Embedding for User Identity Linkage from Heterogeneous Behavior Data," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 2590-2594.

Basic Usage

Code reference

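CONFIG.py: experiment parameter configuration.
embedding_learner.py: topic representation learning module. It provides the Embedding_Learner class, whose member function fit() is the training entry point.
zhihu_data_preprocess.py, ml_data_preprocess.py: data preprocessing for the Zhihu and MovieLens datasets.
zhihu_main.py, ml_main.py: experiment entry points, covering dataset splitting, training, and testing.
process_pool.py: multiprocessing module.
validation.py: distance and accuracy computation used during testing.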
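
To make the role of the evaluation step more concrete, here is a minimal sketch of distance and accuracy computation for identity linkage. It is not the code in validation.py: the function names `cosine_distance` and `top_k_accuracy`, the choice of cosine distance, and the dictionary layout of the embeddings are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def top_k_accuracy(source, target, k=1):
    """Fraction of source users whose true match is among the k nearest targets.

    `source` and `target` map a shared user id to that user's embedding on
    each platform (hypothetical layout, not the repo's data format).
    """
    if not source:
        return 0.0
    hits = 0
    for user_id, emb in source.items():
        # Rank all candidate accounts by distance to this user's embedding.
        ranked = sorted(target, key=lambda tid: cosine_distance(emb, target[tid]))
        if user_id in ranked[:k]:
            hits += 1
    return hits / len(source)


# Toy embeddings: two users observed on two platforms.
source = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
target = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
accuracy = top_k_accuracy(source, target, k=1)
```

In this toy case each user's nearest candidate is their own account on the other platform, so `accuracy` is 1.0. The full sort per user keeps the sketch short; for large candidate sets a partial selection such as `heapq.nsmallest` would avoid sorting every candidate.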

Run the code

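Before running, organize the data and code into the following structure:

Note: this code was cleaned up after the fact and has not been run in full, so it is not guaranteed to work end to end.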

cd ./TSeq

# preprocess the Zhihu dataset
python zhihu_data_preprocess.py

# run the model on the Zhihu dataset
python zhihu_main.py

# preprocess the MovieLens dataset
python ml_data_preprocess.py

# run the model on the MovieLens dataset
python ml_main.py

Datasets

The datasets can be found at the following link:

TSeqE_data

Cite

If you find TSeqE useful for your research, please consider citing us: