Removing-Duplicate-Docs-Using-Hashing-in-Python

  1. The program takes the path of the input folder from which you want to remove duplicates. Once the program runs successfully, the same input folder is left without duplicate files.
  2. It has two Python files:
    1. If you want the sequential implementation, use only pgm_main.py (a minimal sketch of the hashing approach is shown after this list).
    2. If you want parallel processing, use only parallel_processing_pgm_main.py (a multiprocessing sketch is also shown after this list).
  3. User changes:
    1. Update the input_files_path variable with the path of the input folder.
    2. If you are using multiprocessing, you can also update the number of processes in the file; the default value is 4.
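
Below is a minimal sketch of the sequential hashing idea, for orientation only; the actual code in pgm_main.py may differ. The variable input_files_path mirrors the setting described above, while the helper names (file_hash, remove_duplicates) and the choice of SHA-256 with chunked reads are assumptions for illustration.

```python
# Sequential sketch (assumption: the real pgm_main.py may be organized differently).
# Hashes each file's contents and deletes later files whose hash has already been seen.
import hashlib
import os

input_files_path = "/path/to/input/folder"  # update this variable, as described above


def file_hash(path, chunk_size=8192):
    """Return a SHA-256 digest of the file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def remove_duplicates(folder):
    seen = {}  # digest -> first file path that produced it
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        digest = file_hash(path)
        if digest in seen:
            os.remove(path)  # duplicate of an earlier file; delete it
        else:
            seen[digest] = path


if __name__ == "__main__":
    remove_duplicates(input_files_path)
```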
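For the parallel variant, a plausible structure is to compute the hashes in a multiprocessing pool and delete duplicates in the parent process. Again, this is a sketch under assumptions; parallel_processing_pgm_main.py may split the work differently. The num_processes value of 4 reflects the default mentioned above.

```python
# Multiprocessing sketch (assumption: the real parallel_processing_pgm_main.py may differ).
# Hashing runs in a pool of worker processes; deletion stays in the parent process.
import hashlib
import os
from multiprocessing import Pool

input_files_path = "/path/to/input/folder"  # update this variable
num_processes = 4  # default number of processes; adjust as needed


def file_hash(path, chunk_size=8192):
    """Return (path, SHA-256 digest) for the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return path, h.hexdigest()


def remove_duplicates(folder, processes):
    paths = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    seen = set()
    with Pool(processes) as pool:
        # pool.map preserves input order, so the first file with each hash is kept
        for path, digest in pool.map(file_hash, paths):
            if digest in seen:
                os.remove(path)  # duplicate; delete it
            else:
                seen.add(digest)


if __name__ == "__main__":
    remove_duplicates(input_files_path, num_processes)
```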