h324yang/multiprobe

Probing multilingual neural networks.

Python · MIT

Multiprobe

A tool for generating a massive parallel corpus from Wikidata.

Steps

Extract descriptions

python extract_top_desc.py

Generate parallel triples (center_sent, pos_sent, neg_sent). By default, each run randomly picks three languages to generate the triples. To fix the center language, set the constant FIXED_CENTER_LANG inside the script to a specific language code, e.g., "en".

python gen_train.py
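
To make the triple format concrete, here is a minimal sketch of how such triples could be sampled. It is not the code in gen_train.py. It assumes that pos_sent describes the same Wikidata entity as center_sent in a different language, and that neg_sent describes a different entity. The toy DESCRIPTIONS data and the make_triple helper are hypothetical and exist only for illustration.

```python
import random

# Hypothetical toy data: entity ID -> {language code: description}.
# The real extracted descriptions may be stored in a different layout.
DESCRIPTIONS = {
    "Q1": {"en": "totality of space and time",
           "de": "Gesamtheit von Raum und Zeit",
           "fr": "ensemble de tout ce qui existe"},
    "Q2": {"en": "third planet from the Sun",
           "de": "dritter Planet von der Sonne aus",
           "fr": "troisième planète du Système solaire"},
}


def make_triple(descriptions, rng, fixed_center_lang=None):
    """Return one (center_sent, pos_sent, neg_sent) triple.

    Assumption: the positive shares the center's entity, the negative
    comes from another entity, and all three use distinct languages.
    """
    langs = sorted({lang for d in descriptions.values() for lang in d})
    if fixed_center_lang is None:
        # Default behaviour: three random, distinct languages.
        center_lang, pos_lang, neg_lang = rng.sample(langs, 3)
    else:
        # Pinned center language; the other two are still random.
        center_lang = fixed_center_lang
        others = [lang for lang in langs if lang != center_lang]
        pos_lang, neg_lang = rng.sample(others, 2)

    # Entities that have a description in both center and positive languages.
    anchors = [e for e, d in descriptions.items()
               if center_lang in d and pos_lang in d]
    if not anchors:
        raise ValueError("no entity covers the chosen center/positive languages")
    anchor = rng.choice(anchors)

    # Any other entity with a description in the negative language.
    negatives = [e for e, d in descriptions.items()
                 if e != anchor and neg_lang in d]
    if not negatives:
        raise ValueError("no other entity covers the chosen negative language")
    negative = rng.choice(negatives)

    return (descriptions[anchor][center_lang],
            descriptions[anchor][pos_lang],
            descriptions[negative][neg_lang])


if __name__ == "__main__":
    rng = random.Random(0)
    print(make_triple(DESCRIPTIONS, rng, fixed_center_lang="en"))
```

Passing fixed_center_lang="en" mirrors setting FIXED_CENTER_LANG to "en": the center sentence is always English, while the positive and negative languages are still drawn at random.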