Coref for large documents
ashim95 opened this issue · 1 comment
Hi @kentonl ,
I have been trying to use your model, and it works pretty well for short documents (n_sentences ~50-70). For large documents (e.g., n_sentences ~250-300), I am unable to train the system even after reducing several other hyperparameters (max-span-width, num_antecedents, etc.); I get an OOM error on a 16GB P100 GPU.
Do you have any insights on how I can deal with this issue?
Unfortunately, the model is quite memory intensive, and reducing its memory footprint to enable long-document coref is still an open research question.
For now, I would recommend leaving the training procedure as is. At test time, you can break the documents into overlapping chunks and stitch the predicted clusters together outside of TensorFlow.
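The chunk-and-stitch idea above can be sketched in plain Python. This is a minimal illustration, not code from the repo: `make_chunks` and `stitch_clusters` are hypothetical helper names, and it assumes each chunk's predicted clusters have already been shifted into document-level (start, end) token spans. Clusters from different chunks that share a mention span are merged with a small union-find.

```python
def make_chunks(n_sentences, size=50, overlap=10):
    """Cover [0, n_sentences) with overlapping (start, end) sentence ranges."""
    step = size - overlap
    chunks, start = [], 0
    while True:
        end = min(start + size, n_sentences)
        chunks.append((start, end))
        if end == n_sentences:
            break
        start += step
    return chunks

def stitch_clusters(per_chunk_clusters):
    """Merge clusters across chunks.

    per_chunk_clusters: one list of clusters per chunk, where a cluster is a
    list of (start, end) mention spans in *document-level* token offsets.
    Clusters that share any mention span are unioned (union-find).
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for clusters in per_chunk_clusters:
        for cluster in clusters:
            mentions = [tuple(m) for m in cluster]
            for m in mentions:
                parent.setdefault(m, m)
            # link every mention in the cluster to the first one
            for m in mentions[1:]:
                ra, rb = find(mentions[0]), find(m)
                if ra != rb:
                    parent[ra] = rb

    merged = {}
    for m in parent:
        merged.setdefault(find(m), set()).add(m)
    return [sorted(c) for c in merged.values()]
```

For example, with 120 sentences, a chunk size of 50, and an overlap of 10, the chunks are (0, 50), (40, 90), (80, 120); a mention predicted in the overlap region (here span (45, 45)) ties its two chunk-level clusters into one document-level cluster. The overlap should be wide enough that most coreference links fit inside a single chunk; links longer than the chunk span are simply lost, which is the price of this approximation.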