```bash
npm i llama-ocr
```
```js
import { ocr } from "llama-ocr";

const markdown = await ocr({
  filePath: "./trader-joes-receipt.jpg", // path to your image (soon PDF!)
  apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
});
```
- Build the Docker image:

  ```bash
  docker build -t llama-ocr .
  ```

- Start the container with the API key passed as an environment variable:

  ```bash
  docker run -e TOGETHER_API_KEY=your_actual_key llama-ocr
  ```

  Replace `your_actual_key` with your actual Together AI API key.

- Test with a mounted volume (if needed): if your application requires files (e.g., images) from your local system, you can mount a volume to provide access:

  ```bash
  docker run -e TOGETHER_API_KEY=your_actual_key -v $(pwd)/test:/usr/src/app/test llama-ocr
  ```

  This mounts the `test` directory in your project to the container's `/usr/src/app/test` directory.
The `ocr` function in `src/index.ts` is the entry point. To verify it works, you need to invoke the `ocr` function. If the application doesn't expose a web server or a CLI by default, you can test it directly by running the `test/index.js` script.
- Ensure the container runs the test script:

  ```bash
  docker run -e TOGETHER_API_KEY=your_actual_key llama-ocr npm run test
  ```

- If the script is set up correctly, you should see the Markdown output for the test image:

  ```markdown
  # Example Receipt

  - Item 1: $5.00
  - Item 2: $10.00

  Total: $15.00
  ```
If you need to debug or manually test the app inside the container:

- Start an interactive shell in the container:

  ```bash
  docker run -it -e TOGETHER_API_KEY=your_actual_key llama-ocr bash
  ```

- Inside the container, you can run the test script or invoke the `ocr` function:

  ```bash
  npm run test
  ```
If you want to test locally before deploying the Docker container:

- Run the `ocr` function directly from `test/index.js`:

  ```bash
  node test/index.js
  ```

  Ensure your `.env` file is set up with the API key, or export the variable:

  ```bash
  export TOGETHER_API_KEY=your_actual_key
  ```

- Expected output: the Markdown representation of the test image or file.
- Ensure the `TOGETHER_API_KEY` is valid.
- Use a test image like `trader-joes-receipt.jpg` provided in the `test` directory, or update the script to use a different image.
- If errors occur, inspect the logs or debug by interacting with the container (`docker exec -it <container_id> bash`).
We have a hosted demo at LlamaOCR.com where you can try it out!
This library uses the free Llama 3.2 endpoint from Together AI to parse images and return markdown. Paid endpoints for Llama 3.2 11B and Llama 3.2 90B are also available for faster performance and higher rate limits.
You can control this with the `model` option, which is set to `Llama-3.2-90B-Vision` by default but can also accept `free` or `Llama-3.2-11B-Vision`.
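Combining the `model` option with the usage snippet above (a sketch: `buildOcrOptions` is an illustrative helper, not part of the library; only the `filePath`, `apiKey`, and `model` option names and the model identifiers come from this README):

```javascript
// buildOcrOptions is a hypothetical helper for assembling ocr() options;
// the filePath/apiKey/model option names are from this README.
function buildOcrOptions(filePath, model = "Llama-3.2-90B-Vision") {
  return {
    filePath,                             // local path to the image
    apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
    model, // "free", "Llama-3.2-11B-Vision", or "Llama-3.2-90B-Vision"
  };
}

async function run() {
  // Skip the network call when no API key is configured.
  if (!process.env.TOGETHER_API_KEY) return;
  const { ocr } = await import("llama-ocr");
  // Use the free endpoint for a quick smoke test.
  const markdown = await ocr(buildOcrOptions("./trader-joes-receipt.jpg", "free"));
  console.log(markdown);
}

run();
```

Passing `"free"` trades speed and rate limits for cost, which is usually fine while iterating on a test image.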
- Add support for local images OCR
- Add support for remote images OCR
- Add support for single page PDFs
- Add support for multi-page PDFs OCR (take screenshots of PDF & feed to vision model)
- Add support for JSON output in addition to markdown
This project was inspired by Zerox. Go check them out!