pdf-to-excel

Repository for the PDF-to-Excel project.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

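Installation

The commands below are written for a notebook cell (Jupyter or Colab), where the leading `!` runs a shell command; drop the `!` when running them in a regular terminal. The pinned paddlepaddle-gpu build (post110) targets CUDA 11.0.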

!pip install pdf2image
!apt-get install -y poppler-utils
!pip install paddlepaddle-gpu==2.3.0.post110 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
!pip install "paddleocr>=2.0.1"
!pip install protobuf==3.20.0
!git clone https://github.com/PaddlePaddle/PaddleOCR.git
!wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
!pip install -U layoutparser-0.0.0-py3-none-any.whl
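
Before running the demo, it can help to confirm that the installed pieces are actually visible to Python. The following is a minimal sketch, not part of this repo: `find_missing` is a hypothetical helper, and the import names (`pdf2image`, `paddle`, `paddleocr`, `layoutparser`) and the poppler tools (`pdftoppm`, `pdfinfo`) are the usual ones for the packages listed above.

```python
import importlib.util
import shutil

# Import names for the packages installed above (paddlepaddle-gpu imports as "paddle").
REQUIRED_MODULES = ["pdf2image", "paddle", "paddleocr", "layoutparser"]
# Command-line tools from poppler-utils that pdf2image calls under the hood.
REQUIRED_BINARIES = ["pdftoppm", "pdfinfo"]


def find_missing(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return (missing_modules, missing_binaries) without importing anything."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_binaries = [b for b in binaries if shutil.which(b) is None]
    return missing_modules, missing_binaries


if __name__ == "__main__":
    mods, bins = find_missing()
    print("missing Python modules:", mods or "none")
    print("missing poppler tools:", bins or "none")
```

An empty result for both lists only shows that the packages and tools can be found; it does not verify versions or GPU support.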

Usage

!python demo.py --input_path '.../path to input directory' --output_path '.../path to output directory'
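
The same call can be made from Python, which sidesteps shell quoting when the directory paths contain spaces. This is a sketch under assumptions: `build_demo_command` and `run_demo` are hypothetical helpers, only the `--input_path` and `--output_path` flags shown above are used, and creating the output directory beforehand is a precaution, since it is not stated whether demo.py does so itself.

```python
import subprocess
import sys
from pathlib import Path


def build_demo_command(input_dir, output_dir, script="demo.py"):
    """Build the argument list for the demo.py call shown above."""
    return [
        sys.executable, str(script),
        "--input_path", str(Path(input_dir)),
        "--output_path", str(Path(output_dir)),
    ]


def run_demo(input_dir, output_dir, script="demo.py"):
    """Check the input directory, create the output directory, then run demo.py."""
    if not Path(input_dir).is_dir():
        raise FileNotFoundError(f"input directory not found: {input_dir}")
    # Assumption: pre-creating the output directory is harmless for demo.py.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if demo.py exits with an error.
    subprocess.run(build_demo_command(input_dir, output_dir, script), check=True)
```

Passing the arguments as a list means each path reaches demo.py as a single argument, with no extra quoting needed.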