Question about training data
kato8966 commented
Section 4.1 (Datasets) of the paper says: "To train embeddings, we use domain-matched data for each downstream task. For coreference we train on Wikipedia data, ..."
However, README.md says,
Data
English Coreference:
We pretrain embeddings on the English gigaword corpus.
Which is right?
seraphinatarrant commented
Oh, sorry, the README was out of date from our initial experiments. Good catch! We use Wikipedia, as the paper says. I have updated and corrected the README.