explosion/sense2vec

Prodigy and 2019 version of sense2vec - process is constantly killed

kuatroka opened this issue · 3 comments

Hi,
When I follow this tutorial on how to combine Prodigy and the 2019 version of sense2vec, I constantly get the CLI message "Killed" with no further description of what to do to correct it. This only happens with the s2v_reddit_2019_lg/s2v_reddit_2019_lg version. The s2v_reddit_2015_md/s2v_old version works perfectly with the same parameters.

In the CLI I run
prodigy sense2vec.teach ner-client-dataset ./assets/s2v_reddit_2019_lg/s2v_reddit_2019_lg --seeds "Walmart, Apple"

and I get
Killed

When I use
prodigy sense2vec.teach ner-client-dataset ./assets/s2v_reddit_2015_md/s2v_old --seeds "Walmart, Apple"
everything works fine.

Thanks

Hey, it most likely gets killed due to memory issues: the 2015 edition is just about a gigabyte, while the 2019 version is 3.9 GB in size alone. So there's a lot more memory usage, and when the resources get exhausted the system terminates the process.
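One way to check this before launching Prodigy is to compare the on-disk size of the vectors directory with the memory currently available. The snippet below is only a rough sketch: the helper names and the `headroom` factor of 1.5 are assumptions made for illustration, not part of sense2vec or Prodigy, and `os.sysconf("SC_AVPHYS_PAGES")` is only available on Linux. Actual memory use while loading may well be higher than the file size.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def available_ram_bytes():
    """Physical memory currently available (Linux only, via POSIX sysconf)."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def fits_in_ram(path, headroom=1.5):
    """Heuristic: the vectors on disk, times an assumed headroom factor,
    should not exceed the available memory."""
    return dir_size_bytes(path) * headroom <= available_ram_bytes()


if __name__ == "__main__":
    # Path taken from the command in the original post.
    vectors = "./assets/s2v_reddit_2019_lg/s2v_reddit_2019_lg"
    print("size on disk (GiB):", dir_size_bytes(vectors) / 1024**3)
    print("available RAM (GiB):", available_ram_bytes() / 1024**3)
    print("likely to fit:", fits_in_ram(vectors))
```

If the check fails, the "Killed" message is most likely the kernel's out-of-memory handling rather than a bug in the command itself.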

I have the same problem! I have trained my own sense2vec model, but as soon as I run it, it kills the kernel.
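For self-trained vectors, a rough lower bound on the memory needed is rows × dimensions × bytes per value. The sketch below assumes the table is stored as 32-bit floats (4 bytes per value). The function name and the row and dimension counts are hypothetical, chosen only to illustrate the arithmetic. Keys, frequencies and any cache are not counted, so the real footprint will be larger.

```python
def vector_table_gib(n_rows, n_dims, bytes_per_value=4):
    """Estimate the size of a dense vector table in GiB.

    Assumes one fixed-width value per cell (float32 by default)."""
    return n_rows * n_dims * bytes_per_value / 1024**3


if __name__ == "__main__":
    # Hypothetical table: 1,048,576 rows of 256-dimensional float32 vectors.
    print(vector_table_gib(1024**2, 256))  # 1.0
```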

This is essentially a RAM-related issue. You need lots of RAM. We were having the same problem and we tackled it using a dedicated server from Hetzner. They have some 512 GB RAM boxes in their "Server Auction" section which are pretty cost-effective.