ChenghaoMou/text-dedup

NameError: name 'uf' is not defined

duytran1332002 opened this issue · 5 comments

I get the error in the title (screenshot attached). How can I solve it?

This is most likely because Windows uses a different multiprocessing implementation than macOS or Linux. Unfortunately, I don't have a Windows machine to debug this, so I would suggest running the scripts under the Windows Subsystem for Linux (WSL) or in a virtual machine.
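For context, here is a minimal sketch of the kind of failure this describes; it is not the text-dedup code. On Windows, Python's multiprocessing starts workers with the "spawn" method, which re-imports the main module, so a global created only inside the `if __name__ == "__main__":` block does not exist in the workers. With "fork" (the traditional Linux default), workers inherit it. Assuming `uf` is such a global (an assumption about the script, not something confirmed in this thread), one portable pattern is to hand the state to each worker through a pool initializer. The names `_state`, `init_worker`, `lookup`, and `run` are hypothetical.

```python
import multiprocessing as mp

# Per-process state; filled in by init_worker inside every worker process.
_state = None


def init_worker(values):
    """Runs once in each worker, so the global exists under any start method."""
    global _state
    _state = values


def lookup(index):
    """Worker task that reads the per-process global."""
    return _state[index]


def run(start_method=None):
    """Map lookup over three indices; pass "spawn" to mimic Windows."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(2, initializer=init_worker, initargs=([10, 20, 30],)) as pool:
        return pool.map(lookup, range(3))


if __name__ == "__main__":
    # The guard is required under "spawn": workers re-import this module.
    print(run())
```

Calling `run("spawn")` on Linux or macOS is one way to reproduce Windows-like behaviour without a Windows machine. Whether this pattern applies here depends on how the script actually defines `uf`.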

Yes, thank you. Now I want to run minhash_spark.py on my local Spark installation. How do I do that?

The Spark script doesn't care where it runs, locally or remotely. You should be able to submit a job by running spark-submit minhash_spark.py.
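As a rough illustration of a local submission, the sketch below builds (and optionally runs) a spark-submit command with `--master local[*]`, which asks Spark to run on the current machine using all available cores. It assumes spark-submit is on the PATH. The helper names are hypothetical, and any arguments the script itself needs are left to the caller because they are not given in this thread.

```python
import shutil
import subprocess


def build_spark_submit_command(script, master="local[*]", script_args=()):
    """Return the argv list for submitting `script` to the given Spark master."""
    return ["spark-submit", "--master", master, script, *script_args]


def submit(script, master="local[*]", script_args=()):
    """Run spark-submit and return its exit code; raise if it is not installed."""
    if shutil.which("spark-submit") is None:
        raise FileNotFoundError("spark-submit not found on PATH")
    command = build_spark_submit_command(script, master, script_args)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call submit(...) to actually launch the job.
    print(" ".join(build_spark_submit_command("minhash_spark.py")))
```

Because the command is passed as a list rather than through a shell, `local[*]` needs no quoting here; when typing the same command in a terminal, quoting it as "local[*]" avoids shell globbing.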

Does this mean I can use your quick start to run it locally? Can you explain clearly how to run it on my local Spark installation?

You should be able to find tutorials on the Internet on how to submit a job to Spark, locally or not. I can't offer much help without knowing your hardware, Spark setup, dataset, what you have tried, and what exact issues you had. This is the part you will have to figure out by yourself.