
This is a reference implementation of the paper "Topic Sequence Embedding for User Identity Linkage from Heterogeneous Behavior Data".


TSeqE

This repo provides a reference implementation of TSeqE as described in the paper:

J. Yang, W. Zhou, W. Qian, J. Han and S. Hu, "Topic Sequence Embedding for User Identity Linkage from Heterogeneous Behavior Data," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 2590-2594.

Basic Usage

Code reference

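  • CONFIG.py: experiment parameter configuration.
  • embedding_learner.py: topic representation learning module. It provides the Embedding_Learner class, whose member function fit() is the entry point for training.
  • zhihu_data_preprocess.py, ml_data_preprocess.py: data preprocessing for the Zhihu and MovieLens datasets, respectively.
  • zhihu_main.py, ml_main.py: experiment entry points, covering dataset splitting, training, and testing.
  • process_pool.py: multiprocessing module.
  • validation.py: distance and accuracy computations used during testing.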
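The README does not say which distance or accuracy metric validation.py uses. The following is a minimal, hypothetical sketch of a common setup for user identity linkage, under three assumptions:

* Each user on platform A and platform B already has an embedding vector.
* Row i of the source list truly matches row i of the target list.
* Cosine distance ranks the candidates, and top-k accuracy is the fraction of source users whose true match appears among their k nearest candidates.

The function names below are illustrative and do not come from the repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(src: Sequence[Vector], dst: Sequence[Vector]) -> List[List[float]]:
    """dist[i][j] = distance between user i on platform A and user j on platform B."""
    return [[cosine_distance(u, v) for v in dst] for u in src]


def top_k_accuracy(dist: List[List[float]], k: int) -> float:
    """Fraction of source users whose true match (assumed to be column i) ranks in the top k."""
    hits = 0
    for i, row in enumerate(dist):
        # Rank candidates by ascending distance; sorted() is stable, so ties keep column order.
        ranked = sorted(range(len(row)), key=lambda j: row[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(dist)


# Toy usage with hypothetical 2-D embeddings (not data from the paper).
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dst = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
dist = distance_matrix(src, dst)
```

On this toy input, only the first user ranks its true match first, so top-1 accuracy is 1/3. Top-3 accuracy is 1.0 because every candidate is considered.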

Run the code

Before running, the data and code need to be organized into the following structure:

Note: this code was tidied up after the fact and has not been fully run end to end, so it is not guaranteed to work as-is.

cd ./TSeq

# preprocess the Zhihu dataset
python zhihu_data_preprocess.py

# run the model on the Zhihu dataset
python zhihu_main.py

# preprocess the MovieLens dataset
python ml_data_preprocess.py

# run the model on the MovieLens dataset
python ml_main.py
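For each dataset, the preprocessing script is listed before the main script. This presumably means the main script consumes the preprocessed output, although the README does not describe the intermediate files. The following is a minimal, hypothetical wrapper that runs one dataset's steps in order and stops at the first failing step. `STEPS`, `build_commands`, and `run_pipeline` are illustrative names, not part of the repository.

```python
import subprocess
import sys
from typing import List

# Step order taken from the commands above (hypothetical mapping, not from the repo).
STEPS = {
    "zhihu": ["zhihu_data_preprocess.py", "zhihu_main.py"],
    "movielens": ["ml_data_preprocess.py", "ml_main.py"],
}


def build_commands(dataset: str) -> List[List[str]]:
    """Return one interpreter command per step, in execution order."""
    return [[sys.executable, script] for script in STEPS[dataset]]


def run_pipeline(dataset: str, cwd: str = ".", dry_run: bool = False) -> List[List[str]]:
    """Run each step of one dataset; with dry_run=True, only print the commands."""
    commands = build_commands(dataset)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=True raises CalledProcessError, so the main script never runs
        # after a failed preprocessing step.
        subprocess.run(cmd, check=True, cwd=cwd)
    return commands
```

For example, `run_pipeline("movielens", cwd="./TSeq")` would run both MovieLens steps from the directory used above. Using `sys.executable` ensures the scripts run under the same interpreter as the wrapper.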

Datasets

The datasets can be found at the following link:

TSeqE_data

Cite

If you find TSeqE useful for your research, please consider citing our paper:

@INPROCEEDINGS{TSeqE,
    author={Yang, Jinzhu and Zhou, Wei and Qian, Wanhui and Han, Jizhong and Hu, Songlin},
    booktitle={ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
    title={Topic Sequence Embedding for User Identity Linkage from Heterogeneous Behavior Data},
    year={2021},
    volume={},
    number={},
    pages={2590-2594},
    doi={10.1109/ICASSP39728.2021.9415111}
}