An unofficial implementation of hyperdoc2vec (ACL 2018).
This repo also contains an example of papers and citations (see the /data folder). It is only a "toy" example, so you cannot expect meaningful results from it.
Because the authors released neither their source code nor their datasets, the correctness of this implementation cannot be checked against the original. If you have any doubts or questions, please open an issue.
The implementation uses gensim, just as the authors did in the original paper.
I recommend poetry as the Python package manager, although you can of course use an alternative.
For details on the required packages, see pyproject.toml.
After cloning, run the commands below.
poetry install
poetry run python -c "import nltk; nltk.download('punkt')"
poetry shell
PYTHONHASHSEED=42 python hd2v.py --retrofit=True
PYTHONHASHSEED must be set if you want reproducible results.
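The effect of PYTHONHASHSEED can be seen without gensim. The sketch below is only an illustration and is not part of this repo: the helper name `string_hash_in_fresh_interpreter` is hypothetical. It launches a fresh Python interpreter twice with the same hash seed and compares the hash of a string, which is the kind of value that otherwise changes between interpreter launches.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(text, hash_seed):
    """Return hash(text) as computed by a newly launched Python interpreter.

    Hypothetical helper for illustration; it is not defined in hd2v.py.
    """
    env = dict(os.environ)
    # The variable must be set before the interpreter starts,
    # which is why it is placed in front of the command on the shell line.
    env["PYTHONHASHSEED"] = str(hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate launches with the same seed yield the same string hash.
first = string_hash_in_fresh_interpreter("hyperdoc2vec", 42)
second = string_hash_in_fresh_interpreter("hyperdoc2vec", 42)
print(first == second)
```

Setting the variable inside an already running script (for example through `os.environ`) has no effect on that process, because string hashing is configured at interpreter startup.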
Option:
--retrofit: If true, PV-DM retrofitting is enabled. Otherwise it is disabled (random initialization). The default value is True.
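For readers adapting the script, here is a minimal sketch of how a boolean option written as `--retrofit=True` could be parsed with argparse. This is an assumption for illustration only: the `str2bool` helper and `build_parser` function are hypothetical and are not taken from hd2v.py, which may parse its options differently.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'True' or 'false' to a bool (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Build an illustrative parser; this is not the parser used by hd2v.py."""
    parser = argparse.ArgumentParser(description="Illustrative option parsing.")
    parser.add_argument(
        "--retrofit",
        type=str2bool,
        default=True,  # mirrors the documented default
        help="Enable PV-DM retrofitting; if false, use random initialization.",
    )
    return parser


args_default = build_parser().parse_args([])
args_off = build_parser().parse_args(["--retrofit=False"])
print(args_default.retrofit, args_off.retrofit)  # prints: True False
```

An explicit converter is used here because `type=bool` would treat any non-empty string, including "False", as true.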