A Java REST API built with Tess4J that extracts text from images and PDF files.
It accepts the following input formats:
- TIFF, JPEG, GIF, PNG, and BMP image formats
- Multi-page TIFF images
- PDF document format
The project is built with:
- Java 17
- Spring Boot 3
- Tess4J
- Tesseract OCR
- Maven
- Docker
To run the project on a Unix operating system, you need:
- docker
- docker-compose
- Maven
- Java 17
- permissions for docker to run without sudo
POST /v1/jeto/extract
curl --request POST \
  --url http://localhost:3000/v1/jeto/extract \
  --header 'Content-Type: multipart/form-data' \
  --form file=@file.pdf
HTTP/1.1 200
Content-Type: text/plain;charset=UTF-8
Content-Length: 3145
Date: Sat, 24 Jun 2023 21:00:56 GMT
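The same request can also be sent from Java. The following is a minimal sketch, not part of the project itself: it uses only the JDK's `java.net.http.HttpClient` and assumes the service is listening on `http://localhost:3000` and reads the upload from a form field named `file`, as in the curl example. The class name `ExtractClient`, the helper `buildMultipartBody`, and the `application/octet-stream` part type are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical client for POST /v1/jeto/extract; names here are illustrative.
public class ExtractClient {

    // Builds a multipart/form-data body containing a single file part.
    static byte[] buildMultipartBody(String boundary, String fieldName,
                                     String fileName, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                // Assumption: a generic binary type is sent for every file.
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.write(partHeader.getBytes(StandardCharsets.UTF_8));
        out.write(content);
        // Closing delimiter that terminates the multipart body.
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Uploads the file and returns the response body as plain text.
    static String extractText(String url, Path file) throws IOException, InterruptedException {
        String boundary = "----jeto-" + UUID.randomUUID();
        byte[] body = buildMultipartBody(boundary, "file",
                file.getFileName().toString(), Files.readAllBytes(file));

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java ExtractClient.java <file>");
            return;
        }
        // Assumed local address, taken from the curl example.
        System.out.println(extractText("http://localhost:3000/v1/jeto/extract", Path.of(args[0])));
    }
}
```

With Java 17 the sketch can be run directly as `java ExtractClient.java file.pdf`. The whole file is read into memory before sending, which keeps the example short but may not suit very large documents.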
Jorge Melgarejo