Primary language: Python. License: Apache License 2.0 (Apache-2.0).

pdf2txt

Extract all the text in a PDF file into a plain text file with the same file name.

Download this code

Clone this repository with git by running git clone https://github.com/xiyusullos/pdf2txt.git in a terminal. Alternatively, press the Clone and download button, then the Download ZIP button, and uncompress the zip file once it has downloaded.

Install

  • Make sure your Python version is greater than 3.5 and that your terminal's current working directory is the root of this project.
  • Create a virtual environment for this project by running virtualenv venv -p python3 in a terminal.
  • Activate the virtual environment by running .\venv\Scripts\activate on Windows, or source ./venv/bin/activate on *nix systems (Linux, macOS, Unix, etc.).
  • Install all the required dependencies by running pip install -r requirements.txt in a terminal.
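
Before creating the virtual environment, it can help to confirm which interpreter python3 resolves to. The snippet below is a minimal sketch and not part of this project: the function name is_supported is hypothetical, and it assumes that Python 3.5 itself counts as acceptable, which the requirement above does not state explicitly.

```python
import sys


def is_supported(version_info=sys.version_info):
    """Return True for Python 3.5 or newer (assumed cutoff, see note above)."""
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= (3, 5)


if __name__ == "__main__":
    # Print the running interpreter's version and whether it passes the check.
    print("Python", sys.version.split()[0], "supported:", is_supported())
```

Run it with the same python3 executable that you pass to virtualenv, so the check reflects the interpreter the virtual environment will use.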

Run

Then run python main.py in the same terminal.

Configuration

Put all the PDF files you want to convert to plain text into the pdfs folder.
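
To preview which files would be picked up, the following sketch lists the PDFs in the pdfs folder together with the text file name each one maps to. It is an illustration only and does not reproduce main.py: the helper planned_outputs is hypothetical, and placing each .txt file next to its PDF is an assumption, since this README only says the output has the same file name.

```python
from pathlib import Path


def planned_outputs(folder="pdfs"):
    """Pair every PDF in `folder` with a same-named .txt path (assumed layout)."""
    folder = Path(folder)
    # Match the extension case-insensitively so "report.PDF" is included too.
    pdfs = sorted(
        p for p in folder.glob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    # Assumption: the text file sits beside the PDF, with only the suffix changed.
    return [(p, p.with_suffix(".txt")) for p in pdfs]


if __name__ == "__main__":
    for pdf, txt in planned_outputs():
        print(pdf, "->", txt)
```

The sketch only computes paths and never reads or writes a PDF, so it is safe to run before the actual conversion.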

License

See the LICENSE file.

Thanks

Thank you for your attention.