kentonl/e2e-coref

Coref for large documents

ashim95 opened this issue · 1 comment

Hi @kentonl ,

I have been trying to use your model, and it works quite well for short documents (n_sentences ~ 50-70). For large documents (e.g., n_sentences ~ 250-300), I am unable to train the system even after reducing several other hyperparameters (max-span-width, num_antecedents, etc.); I get OOM errors on a 16 GB P100 GPU.

Do you have any insights on how I can deal with this issue?

Unfortunately, the model is quite memory-intensive, and reducing this to enable long-document coref is still an open research question.

For now, I would recommend leaving the training procedure as is. At test time, you can break the documents into overlapping chunks and stitch the predicted clusters together outside of TensorFlow.
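To make the chunk-and-stitch idea concrete, here is a minimal sketch in plain Python. It is not taken from this repo: it assumes a hypothetical `predict_clusters(sentences)` wrapper around the trained model that takes a list of tokenized sentences and returns clusters as `(start, end)` token spans indexed within that chunk. The chunk size, overlap, and merge rule are illustrative assumptions, not part of the released code.

```python
# Sketch of overlapping-chunk prediction and cluster stitching, done
# entirely outside TensorFlow. `predict_clusters` is a hypothetical
# single-chunk predictor supplied by the caller.

from typing import Callable, List, Tuple

Span = Tuple[int, int]       # (start, end) token indices
Cluster = List[Span]


def chunk_sentences(sentences: List[List[str]], chunk_size: int = 50,
                    overlap: int = 10) -> List[Tuple[int, List[List[str]]]]:
    """Split a document into overlapping sentence windows.

    Returns (token_offset, chunk) pairs so that chunk-local spans can be
    shifted back into document-level token coordinates.
    """
    assert chunk_size > overlap, "overlap must be smaller than the chunk size"
    chunks = []
    start = 0
    while start < len(sentences):
        chunk = sentences[start:start + chunk_size]
        token_offset = sum(len(s) for s in sentences[:start])
        chunks.append((token_offset, chunk))
        if start + chunk_size >= len(sentences):
            break
        start += chunk_size - overlap
    return chunks


def stitch_clusters(chunk_clusters: List[List[Cluster]],
                    offsets: List[int]) -> List[Cluster]:
    """Merge per-chunk clusters that share a mention in document coordinates."""
    # Shift every chunk-local span into document coordinates.
    shifted: List[Cluster] = []
    for clusters, offset in zip(chunk_clusters, offsets):
        for cluster in clusters:
            shifted.append([(s + offset, e + offset) for s, e in cluster])

    # Greedy union: merge any clusters that share at least one mention.
    merged: List[set] = []
    for cluster in shifted:
        mentions = set(cluster)
        hits = [m for m in merged if m & mentions]
        for h in hits:
            mentions |= h
            merged.remove(h)
        merged.append(mentions)
    return [sorted(m) for m in merged]


def predict_document(sentences: List[List[str]],
                     predict_clusters: Callable[[List[List[str]]], List[Cluster]],
                     chunk_size: int = 50, overlap: int = 10) -> List[Cluster]:
    """Run the (hypothetical) single-chunk predictor over overlapping chunks."""
    chunks = chunk_sentences(sentences, chunk_size, overlap)
    offsets = [offset for offset, _ in chunks]
    chunk_clusters = [predict_clusters(chunk) for _, chunk in chunks]
    return stitch_clusters(chunk_clusters, offsets)
```

The merge rule here simply unions any two chunk-level clusters that share a mention once spans are in document coordinates, which only works because adjacent chunks overlap; how much overlap to use, and whether to resolve conflicting links more carefully, is a design choice left to the user.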