seraphinatarrant/embedding_bias

Question about training data

Closed this issue · 1 comment

In Section 4.1 (Datasets), the paper says, "To train embeddings, we use domain-matched data for each downstream task. For coreference we train on Wikipedia data, ..."
However, README.md says:

Data

English Coreference:

We pretrain embeddings on the English gigaword corpus.

Which is right?

Oh, sorry, the README was out of date from our initial experiments; good catch! We use Wikipedia, as the paper says. I have updated and corrected the README.