ChenghaoMou/text-dedup

Out of memory on Spark

311dada opened this issue · 1 comment

First of all, thanks for your great work!

Recently, I have been deduplicating some Chinese books (~2,000 books, about 10 GB). I use the jieba tokenizer, but Spark throws an out-of-memory error at the groupBy statement. I increased the executor memory to 65 GB, but that did not help. Could you help me figure out where most of the memory is being spent? Thanks!
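
For context, this is roughly the kind of Spark setup involved; a minimal PySpark sketch with illustrative values and a placeholder app name, not the exact settings used here. Raising `spark.sql.shuffle.partitions` is a generic knob for shuffle-heavy stages such as groupBy, not a confirmed fix for this issue.

```python
from pyspark.sql import SparkSession

# Minimal sketch: configuration knobs that commonly relieve memory pressure on
# shuffle-heavy stages such as groupBy. Values are illustrative, not the ones
# actually used in this issue.
spark = (
    SparkSession.builder
    .appName("text-dedup-books")                     # placeholder app name
    .config("spark.executor.memory", "64g")          # per-executor heap
    .config("spark.driver.memory", "16g")            # driver heap
    .config("spark.sql.shuffle.partitions", "2048")  # more, smaller shuffle partitions
    .getOrCreate()
)
```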

Sorry to disturb you. I found that the issue was caused by very long documents. Fixed!
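
For anyone who hits the same error: a minimal PySpark sketch of one way to guard against extremely long documents before the dedup job, by dropping or capping records above a character limit. The input path, column name, and threshold below are placeholders, not taken from text-dedup.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("filter-long-docs").getOrCreate()

MAX_CHARS = 1_000_000  # arbitrary example threshold

# Hypothetical input: one JSON record per line with a "text" column.
df = spark.read.json("books.jsonl")

# Option 1: drop overly long documents entirely.
df_filtered = df.filter(F.length("text") <= MAX_CHARS)

# Option 2: keep every record but cap its length (substring is 1-indexed).
df_capped = df.withColumn("text", F.substring(F.col("text"), 1, MAX_CHARS))

df_filtered.write.mode("overwrite").json("books_filtered")
```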