sheffieldnlp/naacl2018-fever

Evaluation speed

How long is evaluation supposed to take? I'm running the evidence retrieval step on 8 vCPUs, 16 GB RAM, and an SSD, and for the dev set it's projecting almost 10 hours.

Is this expected?

j6mes commented

The retrieval step isn't the fastest: it uses Facebook's DrQA implementation, which isn't really 'production ready'. I had to rewrite part of it to make it data-parallel, so more CPUs should help. See: https://github.com/sheffieldnlp/fever-naacl-2018/blob/master/src/scripts/retrieval/ir.py
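For anyone unsure what "data-parallel" means here, a minimal sketch follows. It is not the code in `ir.py` or DrQA; the toy corpus, the function names `closest_docs` and `batch_closest_docs`, and the `workers` parameter are all assumptions made for illustration. The point is only that each claim is scored independently, so claims can be spread across a pool of workers.

```python
import math
from collections import Counter
from multiprocessing.pool import ThreadPool

# Toy corpus standing in for the Wikipedia index (hypothetical data).
DOCS = {
    "Paris": "paris is the capital of france",
    "Berlin": "berlin is the capital of germany",
    "Python": "python is a programming language",
}

# Per-document term counts and a smoothed inverse document frequency.
TERM_COUNTS = {title: Counter(text.split()) for title, text in DOCS.items()}
DOC_FREQ = Counter(term for counts in TERM_COUNTS.values() for term in counts)
IDF = {term: math.log(1 + len(DOCS) / df) for term, df in DOC_FREQ.items()}


def closest_docs(claim, k=1):
    """Rank documents for one claim by summed tf * idf over the claim's terms."""
    terms = claim.lower().split()
    scores = {
        title: sum(counts[t] * IDF.get(t, 0.0) for t in terms)
        for title, counts in TERM_COUNTS.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def batch_closest_docs(claims, k=1, workers=4):
    """Score every claim independently, spreading claims across a worker pool."""
    with ThreadPool(workers) as pool:
        # pool.map preserves input order, so results line up with claims.
        return pool.map(lambda c: closest_docs(c, k), claims)
```

Calling `batch_closest_docs(["the capital of France", "a programming language"], k=1, workers=2)` returns one ranked list per claim, in the same order as the input. Threads are used here only to keep the sketch portable: pure-Python scoring like this toy version is serialized by CPython's GIL, so a real speedup would need a process pool or scoring code that releases the GIL.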

It took about 7 hours on a 2014 MacBook Pro, so I'm curious why your run is slower with more CPUs. You could try hacking that script to use a higher number of threads. My suspicion is that although you have 8 vCPUs, you may not have exclusive access to 8 ALUs, which is what computing the TF-IDF similarity scores in parallel would require.
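As a rough first check on the vCPU question, the sketch below compares the number of logical CPUs the OS reports with the number this process is actually allowed to run on. The helper name `usable_cpus` is hypothetical. Note the limitation: this only shows counts, and cannot tell whether those vCPUs are hyperthreads sharing a physical core or are contended by other tenants on the host.

```python
import os


def usable_cpus():
    """Return (logical_cpus, cpus_available_to_this_process)."""
    logical = os.cpu_count() or 1
    # os.sched_getaffinity is not available on every platform (e.g. macOS);
    # fall back to the logical count where it is missing.
    if hasattr(os, "sched_getaffinity"):
        available = len(os.sched_getaffinity(0))
    else:
        available = logical
    return logical, available


if __name__ == "__main__":
    logical, available = usable_cpus()
    print(f"logical CPUs: {logical}, available to this process: {available}")
```

If the second number is lower than the worker count being used, the extra workers are just competing for the same cores.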