
Academic Word Embedding

Primary language: Python. License: MIT.

AWOE

Academic WOrd Embeddings (AWOE): word embeddings trained with gensim on AMiner's 2-billion-scale publication data, together with their applications.

Dependencies

  • Python 3
  • gensim
  • spherecluster

Overview

Pre-trained Models

  • English Paper Keywords (EPK): Download
  • Chinese Paper Keywords (CPK): Download
  • Bilingual Transformation Matrix: Download

For details on these models, see docs/word2vec.md. (If you only want to use the models, you can skip it.)

We have prepared a download script, download.sh, that you can use as needed. For example, if you only need the Chinese model, just run ./download.sh zh.

chmod +x download.sh
./download.sh zh
./download.sh en
wget https://lfs.aminer.cn/misc/awoe/W_en2zh.pkl -P tmp/

Utils

We provide some utilities for working with the above models, including tokenization, keyword extraction, sentence-to-vector conversion, etc. Here are some usage examples.

Before using these modules, download the required models first.
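Until the module docs are complete, here is a minimal sketch of one common way to turn a sentence into a vector: averaging the vectors of the tokens that appear in the vocabulary. This is an illustration only and may differ from what the utilities in this repository actually do. The names toy_vectors, sentence_to_vector, and cosine are hypothetical, and the 3-dimensional toy vectors stand in for the downloaded models. Plain Python lists keep the sketch dependency-free.

```python
import math

# Toy 3-dimensional embeddings (assumption: real vectors would come from
# the downloaded EPK/CPK models, whose loading API is not shown here).
toy_vectors = {
    "word": [1.0, 0.0, 0.0],
    "embedding": [0.0, 1.0, 0.0],
    "academic": [0.0, 0.0, 1.0],
}


def sentence_to_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Whitespace tokenization is a simplification; "models" is out of vocabulary
# and is skipped when averaging.
tokens = "Academic word embedding models".lower().split()
sentence_vec = sentence_to_vector(tokens, toy_vectors)
similarity = cosine(sentence_vec, toy_vectors["word"])
```

Out-of-vocabulary tokens are ignored rather than mapped to a zero vector, so they do not pull the average toward the origin.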

Mono-lingual

Docs to be completed. For now, you can run test.py.

Bi-lingual

Docs to complete.
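In the meantime, the sketch below illustrates the general idea behind a bilingual transformation matrix: map a source-language vector through a matrix, then pick the nearest target-language word by cosine similarity. Everything here is an assumption made for illustration: the toy 2-dimensional vectors, the names W_toy, apply_matrix, and nearest, and the convention that the vector is multiplied on the left of the matrix. The actual layout and orientation of W_en2zh.pkl are not documented here, so check them before adapting this.

```python
import math


def apply_matrix(vec, W):
    """Map a row vector through W, where W has len(vec) rows."""
    return [sum(vec[i] * W[i][j] for i in range(len(vec)))
            for j in range(len(W[0]))]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vec, vectors):
    """Return the word whose vector is most similar to vec."""
    return max(vectors, key=lambda w: cosine(vec, vectors[w]))


# Toy data (hypothetical): the two spaces differ by a swap of the axes.
en_vectors = {"network": [1.0, 0.0], "learning": [0.0, 1.0]}
zh_vectors = {"网络": [0.0, 1.0], "学习": [1.0, 0.0]}
W_toy = [[0.0, 1.0],
         [1.0, 0.0]]

# Map an English vector into the Chinese space, then look up its neighbor.
mapped = apply_matrix(en_vectors["network"], W_toy)
translation = nearest(mapped, zh_vectors)
```

With real models, the same two steps would typically be done with vectorized matrix operations rather than Python loops.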

Citation

If our work helps you in some way, please consider citing the following publication(s):

  • Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’2008).