This repository implements CNPMI, a metric that evaluates the coherence and alignment of cross-lingual topics.
scikit-learn==1.0.2
pyyaml==6.0
We use WikiComp to generate parallel documents as reference corpora.
In ./ref_corpus, we include the reference corpora for English & Chinese and English & Japanese.
Run the following command to compute the CNPMI of cross-lingual topics:
python CNPMI.py \
--topics1 {path to topics of language 1} \
--topics2 {path to topics of language 2} \
--ref_corpus_config ./configs/ref_corpus/{lang1_lang2}.yaml
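For intuition about what the command above reports, the sketch below computes a CNPMI-style score on a toy parallel corpus. It is not the code in CNPMI.py: the function names `cnpmi` and `topic_cnpmi`, the toy documents, and the handling of word pairs that never co-occur (scored as -1 here) are all assumptions made for illustration. The idea is that a word from language 1 and a word from language 2 "co-occur" when they appear in the two sides of the same aligned document pair, and the normalized PMI is averaged over all cross-lingual word pairs of a topic.

```python
import math
from itertools import product


def cnpmi(word1, word2, parallel_docs):
    """NPMI of word1 (language 1) and word2 (language 2) over aligned document pairs.

    parallel_docs is a list of (set_of_lang1_words, set_of_lang2_words) tuples.
    """
    n = len(parallel_docs)
    c1 = sum(1 for d1, _ in parallel_docs if word1 in d1)
    c2 = sum(1 for _, d2 in parallel_docs if word2 in d2)
    c12 = sum(1 for d1, d2 in parallel_docs if word1 in d1 and word2 in d2)
    if c12 == 0:
        return -1.0  # assumption: pairs that never co-occur get the NPMI lower bound
    if c12 == n:
        return 1.0  # both words occur in every pair; avoids dividing by log(1) = 0
    p1, p2, p12 = c1 / n, c2 / n, c12 / n
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def topic_cnpmi(topic1, topic2, parallel_docs):
    """Average CNPMI over all cross-lingual word pairs of two aligned topics."""
    scores = [cnpmi(w1, w2, parallel_docs) for w1, w2 in product(topic1, topic2)]
    return sum(scores) / len(scores)


# Toy English-Chinese reference corpus (hypothetical data, already tokenized).
toy_docs = [
    ({"music", "song"}, {"音乐", "歌曲"}),
    ({"music", "band"}, {"音乐", "乐队"}),
    ({"sport", "team"}, {"体育", "球队"}),
    ({"sport", "game"}, {"体育", "比赛"}),
]

aligned = topic_cnpmi(["music", "song"], ["音乐", "歌曲"], toy_docs)
misaligned = topic_cnpmi(["music", "song"], ["体育", "球队"], toy_docs)
print(f"aligned topics: {aligned:.2f}, misaligned topics: {misaligned:.2f}")
```

Topics whose top words appear together across aligned documents score close to 1, while topics whose words never share a document pair score close to -1 under the convention assumed here.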
Our code is based on "Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation".