I copy and paste text from PDFs a lot, and this utility helps do some basic cleaning of the copied text to make it more palatable! Just copy the text section from the PDF, run python clean.py
, and the new clean text will be in your clipboard and printed out to the terminal.
Clone the repo and then run pip install -r requirements.txt
Recommended workflow: add an alias in .bashrc
that changes the directory to this repo, (optionally activates a conda environment), and then runs python clean.py
. Currently I can open a terminal, run the alias, and get the cleaned text in about 5-6 seconds.
- Have it run in the background so that you don't have to wait for the dictionary to load every time
- Call from command line (improve workflow)
- Throw onto a website
- Just use one dictionary (currently it uses two different wordsets)
I use the English words repo for one dictionary and pyspellchecker for another.