seraphinatarrant/embedding_bias

Question about training data

Closed this issue · 1 comment

In Section 4.1 (Datasets), the paper says, "To train embeddings, we use domain-matched data for each downstream task. For coreference we train on Wikipedia data, ..."
However, README.md says:

Data

English Coreference:

We pretrain embeddings on the English gigaword corpus.

Which is right?

Oh, sorry, the README was out of date from our initial experiments; good catch! We use Wikipedia, as the paper says. I have updated and corrected the README.