This is a Flask web application that allows users to upload PDF files and convert them to DOCX format. The application supports scanned PDF files and can extract text from them, including text in Vietnamese.
- Upload PDF files and convert them to DOCX format.
- Support for scanned PDF files using PyMuPDF for text extraction.
- Full support for Vietnamese text extraction.
- Easy to use web interface.
-
Clone the repository:
```bash git clone https://github.com:tathiyennhi/pdf-to-docx-converter-python.git cd pdf-to-docx-converter ```
s
-
Create a virtual environment and activate it:
python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install the required packages:
pip install -r requirements.txt
-
Start the Flask application:
python app.py
-
Open your web browser and go to
http://127.0.0.1:5000
. -
Upload a PDF file and click on the "Convert" button.
-
The converted DOCX file will be downloaded automatically.
- Flask==2.1.1
- werkzeug==2.1.1
- python-docx==0.8.11
- PyMuPDF==1.19.6
pdf-to-docx-converter/ │ ├── app.py ├── requirements.txt ├── uploads/ │ └── templates/ └── index.html
- Make sure the
uploads
directory is writable. - The application deletes the uploaded PDF file after conversion to save space.
This project is licensed under the MIT License.