Out of memory on Spark
311dada opened this issue · 1 comment
311dada commented
Thanks for your great work!
Recently, I deduplicated some Chinese books (~2000 books, ~10 GB). I use the jieba
tokenizer, but Spark throws an out-of-memory error at the groupby statement. I increased the executor memory to 65 GB, but that didn't help. Could you help me figure out where the memory cost is highest? Thanks!
311dada commented
Sorry to disturb you. I found the issue was caused by overly long documents. Fixed!
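For anyone hitting the same error: a single very long document produces a huge number of tokens that all land in one record, which can blow up executor memory during the shuffle behind groupby. One workaround is to cap document length by splitting oversized texts into bounded chunks before tokenization. A minimal sketch in plain Python (the function name `split_long_doc` and the 100k-character cap are illustrative assumptions, not from this repo):

```python
def split_long_doc(text: str, max_chars: int = 100_000) -> list[str]:
    """Split an oversized document into bounded-size chunks so that
    no single record dominates executor memory during the shuffle.
    Hypothetical helper; the chunk size is a tunable assumption."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]

# Example: a 250k-character "book" becomes three records of at most 100k chars.
doc = "书" * 250_000
chunks = split_long_doc(doc)
```

In a PySpark pipeline, this could be applied with a `flatMap` over the documents before tokenizing, so each chunk is deduplicated (and shuffled) independently.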