Extract the text in PDF files into plain text files
Extract all the words in a PDF file into a plain text file with the same file name.
Clone a copy of this repository by running
git clone https://github.com/xiyusullos/pdf2txt.git
in a terminal. Alternatively, press the Clone and download button, then press the Download ZIP button. Once the download finishes, unzip the archive.
- Make sure your Python version is greater than 3.5 and that the current working directory of your terminal is the root of this project.
- Create a virtual environment for this project by typing
virtualenv venv -p python3
in a terminal.
- Activate the virtual environment by typing
.\venv\Scripts\activate
for Windows users, or
source ./venv/bin/activate
for *nix (Linux, macOS, Unix, etc.) users.
- Install all the required dependencies by typing
pip install -r requirements.txt
in a terminal.
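The prerequisites above can also be checked from Python itself. The following is a minimal sketch and not part of this repository: the function names `python_is_new_enough` and `in_virtual_environment` are hypothetical, and the version check compares only the major and minor numbers, so 3.5.x itself does not pass. Relax the comparison if the project does in fact run on 3.5.

```python
import sys


def python_is_new_enough(version=None, minimum=(3, 5)):
    """Return True if the (major, minor) version is strictly greater than `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) > tuple(minimum)


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases keep the base interpreter in
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


if __name__ == "__main__":
    print("Python version OK:", python_is_new_enough())
    print("Virtual environment active:", in_virtual_environment())
```

Run it with the same interpreter you intend to use for the project, after activating the virtual environment, to confirm that both checks pass.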
Then type python main.py
in the same terminal.
Put all the PDF files you want to convert to plain text into the pdfs folder.
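To illustrate the naming convention, here is a minimal sketch of a folder conversion loop. It is not the code in main.py: `convert_folder` and `placeholder_extractor` are hypothetical names, the placeholder returns no real PDF text, and writing each .txt file next to its PDF is an assumption, since this README does not say where the output goes.

```python
from pathlib import Path


def placeholder_extractor(pdf_path):
    """Stand-in for a real PDF text extractor (hypothetical, returns no real text)."""
    return "text extracted from {}\n".format(pdf_path.name)


def convert_folder(folder="pdfs", extract_text=placeholder_extractor):
    """Write one .txt file per PDF in `folder`, keeping the same base file name."""
    written = []
    # Note: the "*.pdf" pattern is case-sensitive on most *nix file systems.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")  # report.pdf -> report.txt
        txt_path.write_text(extract_text(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written


if __name__ == "__main__":
    for path in convert_folder():
        print("wrote", path)
```

The extractor is passed in as a parameter so that the loop stays independent of any particular PDF library. To get real output, replace the placeholder with a function that takes a PDF path and returns its text.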
See the LICENSE file.
Thank you for your attention.