stanfordnlp/GloVe

How to speed up for a large dataset

linWujl opened this issue · 3 comments

Hello, my corpus is 700 GB. Is there any way to speed this up?

The cooccur step has already taken about 7500 minutes and is still at the merge step.

Would it be possible to use Spark to construct the cooccurrence statistics and then train with TensorFlow?
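
For reference, the distributed cooccurrence counting being proposed could be sketched roughly as follows in PySpark. This is a minimal sketch, not GloVe's implementation: it assumes whitespace-tokenized text, a fixed symmetric window, and GloVe's default 1/distance weighting, and it skips vocabulary filtering (min-count); the paths and names are placeholders.

```python
# Sketch: symmetric, distance-weighted cooccurrence counts with PySpark.
# Assumes whitespace-tokenized lines; paths and window size are illustrative.
from pyspark.sql import SparkSession

WINDOW = 10  # context window size on each side (assumption)

def cooccurrence_pairs(line):
    """Emit ((center_word, context_word), 1/distance) pairs for one line."""
    tokens = line.split()
    for i, center in enumerate(tokens):
        for j in range(max(0, i - WINDOW), i):
            dist = i - j
            # count both directions for a symmetric window
            yield ((center, tokens[j]), 1.0 / dist)
            yield ((tokens[j], center), 1.0 / dist)

spark = SparkSession.builder.appName("cooccurrence-sketch").getOrCreate()

counts = (
    spark.sparkContext.textFile("hdfs:///path/to/corpus")  # placeholder path
    .flatMap(cooccurrence_pairs)
    .reduceByKey(lambda a, b: a + b)  # sum weights per (center, context) pair
)

counts.saveAsTextFile("hdfs:///path/to/cooccurrence_counts")  # placeholder path
```

The resulting (word, context, weight) triples would still need to be mapped to vocabulary indices and converted into whatever input format the training step expects.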

We did try converting it to torch at one point, but it wound up being significantly slower than the C version. We may try again sometime. You are welcome to try...

Do you have enough memory? It might be worth checking `top`.