Automatic receipt scanning

Question

rgov opened this issue 8 months ago · 20 comments

To make entering purchases easier, a model (like donut-base-finetuned-cord-v2, demo) could be wired up to provide automatic receipt scanning.

Depending on how big the model is, you might be able to run inference entirely client side with ONNX Runtime Web or similar.

Answer 1 · 2024-01-18T16:35:26.000Z

+1, would like some sort of OCR-based receipt scanning/splitting feature similar to Splitwise. Great app!

Could use the AI approach (which might be expensive) or alternatively just direct OCR:

Python - https://pyimagesearch.com/2021/10/27/automatically-ocring-receipts-and-scans/

Answer 2 · 2024-01-19T21:38:03.000Z

Yes, would love to have something like this! Or even an OpenAI API key field that lets users just use OpenAI's vision API.

Answer 3 · 2024-01-19T21:46:35.000Z

That would be an interesting feature, but I really lack skills in all the AI stuff. Marking the issue with help wanted label 😉

Answer 4 · 2024-01-19T21:51:04.000Z

Yes, would love to have something like this! Or even an OpenAI API key field that lets users just use OpenAI's vision API.

This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

There must be a zillion Node.js OpenAI API libraries to make it easy.
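
As a rough sketch of what such a call could look like, the snippet below builds a Chat Completions request body containing one text prompt and one image. The helper name `buildReceiptRequest`, the prompt wording, and the model name `gpt-4-vision-preview` are assumptions made for illustration; none of them come from this thread.

```javascript
// Hypothetical helper: builds the JSON body for OpenAI's Chat Completions
// endpoint (POST https://api.openai.com/v1/chat/completions).
// The model name and the prompt are assumptions; adjust them to your account.
function buildReceiptRequest(imageUrl, model = "gpt-4-vision-preview") {
  return {
    model,
    max_tokens: 300,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Extract the merchant name, total amount and currency from " +
              "this receipt. Answer with JSON only.",
          },
          // imageUrl may be an https URL or a base64 data URL
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

const body = buildReceiptRequest("https://example.com/receipt.jpg");
console.log(JSON.stringify(body, null, 2));
```

With the official `openai` Node.js package, a body like this would typically be passed to `client.chat.completions.create(body)`. Check the current API reference first, since model names and limits change.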

Answer 5 · 2024-01-19T21:58:30.000Z

Yeah, all you would need is an image input (one or more images) in each expense form. Or a bulk receipt upload that gets structured data back from OpenAI with just the "company", "total amount", and "currency", so the user can later go through all the transactions once they're processed. Would be a great on-the-go feature. I typically have to dedicate some time after my trips to sit and go through all my transactions.
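
To make the "structured data" idea concrete, here is a minimal sketch of validating what a model might send back before pre-filling an expense. The field names (`company`, `totalAmount`, `currency`) and the function name `parseReceiptFields` are hypothetical and only mirror the three values mentioned above.

```javascript
// Hypothetical validator for the JSON a vision model might return.
// Field names are illustrative assumptions, not an existing API contract.
function parseReceiptFields(rawJson) {
  let data;
  try {
    data = JSON.parse(rawJson);
  } catch {
    return null; // the model did not return valid JSON
  }
  if (data === null || typeof data !== "object") return null;

  const company = typeof data.company === "string" ? data.company.trim() : "";
  const totalAmount =
    typeof data.totalAmount === "number" ? data.totalAmount : NaN;
  const currency =
    typeof data.currency === "string" ? data.currency.trim().toUpperCase() : "";

  // Reject incomplete results so the user can fill them in by hand instead.
  if (!company || !Number.isFinite(totalAmount) || !/^[A-Z]{3}$/.test(currency)) {
    return null;
  }
  return { company, totalAmount, currency };
}

console.log(
  parseReceiptFields('{"company":"Cafe Luna","totalAmount":12.5,"currency":"eur"}')
);
console.log(parseReceiptFields("not json at all"));
```

Returning `null` for anything incomplete keeps the bulk-upload flow simple: receipts that could not be parsed stay in the list for manual review.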

I can help with designs, but I'm not a strong developer.

p.s. Once again super thankful for this app @scastiel

Answer 6 · 2024-01-20T00:53:18.000Z

I think AI would be overkill. Here's a Node.js API library that takes any image and parses it into line items with name and cost.

Node.js - https://developers.mindee.com/docs/nodejs-receipt-ocr


const mindee = require("mindee");
// for TS or modules:
// import * as mindee from "mindee";

// Init a new client
const mindeeClient = new mindee.Client({ apiKey: "my-api-key" });

// Load a file from disk
const inputSource = mindeeClient.docFromPath("/path/to/the/file.ext");

// Parse the file
const apiResponse = mindeeClient.parse(
  mindee.product.ReceiptV5,
  inputSource
);

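// Handle the response Promise
apiResponse
  .then((resp) => {
    // Print a string summary of the parsed receipt
    console.log(resp.document.toString());
  })
  .catch((err) => {
    // Report API or file errors instead of leaving the rejection unhandled
    console.error(err);
  });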

Answer 7 · 2024-01-20T04:35:47.000Z

There appears to be a JavaScript library that might help here?
https://github.com/naptha/tesseract.js/

Answer 8 · 2024-01-20T05:03:29.000Z

@adiso06: I think AI would be overkill - here's a nodejs API library, which would take any image and parse it out with lineitem name and cost. Nodejs - https://developers.mindee.com/docs/nodejs-receipt-ocr

Interesting, it looks like what we’re looking for. Note that it costs $0.10/page (after 250 pages/month). Not necessarily a problem, but it might become a paid feature in the future on Spliit.app (and an opt-in feature with bring your own API key if self-hosted).

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

Might be an option (a cheaper one) as well.

@manuerwin: There appears to be a JavaScript library that might help here? https://github.com/naptha/tesseract.js/

Tesseract does the OCR, but extracting information from the read content remains, and might be the most complex part 😉
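
To illustrate why the extraction step is the hard part, here is a deliberately naive sketch that pulls a total out of raw OCR text. The function name `extractTotal` and the sample receipt are made up, and the heuristic (take the last line containing the word "total") is an assumption that will fail on many real receipts.

```javascript
// Heuristic sketch: find the receipt total in raw OCR text.
// It scans for lines containing the word "total" ("Subtotal" written as one
// word is skipped) and reads the last amount on the last such line.
// Limitation: thousands separators such as "1,234.56" are not handled.
function extractTotal(ocrText) {
  const amountPattern = /\d+[.,]\d{2}\b/g;
  let total = null;
  for (const line of ocrText.split(/\r?\n/)) {
    if (!/\btotal\b/i.test(line)) continue;
    const matches = line.match(amountPattern);
    if (!matches) continue;
    // Later lines win: the grand total usually comes after other totals.
    total = Number(matches[matches.length - 1].replace(",", "."));
  }
  return total;
}

const sampleReceipt = [
  "CAFE LUNA",
  "Espresso      2.50",
  "Croissant     3.20",
  "Subtotal      5.70",
  "Tax           0.57",
  "TOTAL         6.27",
].join("\n");

console.log(extractTotal(sampleReceipt));
```

Even this small example needs rules for decimal commas, subtotals, and line order, which is where a dedicated receipt API or a vision model saves effort.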

Answer 9 · 2024-01-20T05:27:15.000Z

@rgov: This is probably the easiest path forward. Here's some API information and here's the cost calculator -- an image would cost something like 1¢ to process.

Might be an option (a cheaper one) as well.

I still think an OpenAI API key input is the easiest and cheapest way for us to make it available. Down the road we can monetize this, if we choose to go this path, for people who are not tech savvy and just want to pay to get this working.

Processing a 1000x1000 image with OpenAI vision will cost $0.00765. And the data can be structured based on how you want it returned to you. This also opens up new doors to extract other information in the future.
Some other things we can ask the AI to do:

What is the category of the transaction?
Get Google Location ID and coordinates of the transaction? (if we ever wanna place it on a map)
What currency is this transaction in?
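
For anyone wanting to check the $0.00765 figure, the arithmetic can be sketched as below. It assumes the "high detail" image token rules OpenAI documented for GPT-4 with Vision in early 2024 (fit within 2048x2048, shrink the shortest side to 768 px, then 170 tokens per 512 px tile plus 85 base tokens) and an assumed price of $0.01 per 1,000 input tokens. Both may have changed since, and the function names are hypothetical.

```javascript
// Token estimate for one image in "high detail" mode (assumed rules):
//   1. scale the image down to fit within 2048x2048,
//   2. scale it down so the shortest side is at most 768 px,
//   3. charge 170 tokens per 512 px tile plus a flat 85 tokens.
function estimateImageTokens(width, height) {
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;
  const shrink = Math.min(1, 768 / Math.min(w, h));
  // Round to whole pixels so floating-point noise cannot add a tile.
  w = Math.round(w * shrink);
  h = Math.round(h * shrink);
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

// Assumed price: $0.01 per 1,000 input tokens.
function estimateImageCostUsd(width, height, usdPer1kTokens = 0.01) {
  return (estimateImageTokens(width, height) / 1000) * usdPer1kTokens;
}

console.log(estimateImageTokens(1000, 1000)); // 765 tokens (4 tiles)
console.log(estimateImageCostUsd(1000, 1000).toFixed(5)); // "0.00765"
```

A 1000x1000 image is shrunk to 768x768, which covers four 512 px tiles: 4 × 170 + 85 = 765 tokens. The text prompt and the model's reply are billed on top of that.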

Answer 10 · 2024-01-29T22:46:07.000Z

In #69 I implemented a first version using OpenAI. It seems to work pretty well and costs ~$0.01-0.02/receipt.

If some of you have OpenAI API access (with GPT-4 with Vision), I’d really appreciate some additional tests and feedback.

As I said in the PR, note that I’d really like to focus on making the feature work for now. Later we’ll think more about improving user experience 😉.

Also it’s my first time with OpenAI API and I’m really not an expert with AI, so open to feedback about the implementation here, like the prompt 😅

Screen.Recording.2024-01-29.at.17.45.11.mov

Answer 11 · 2024-01-29T23:31:01.000Z

Lettss gooo!! This is amazing!! 💯 I have vision API access. Is there anywhere I can test this that's live?

Answer 12 · 2024-01-29T23:37:55.000Z

Lettss gooo!! This is amazing!! 💯 I have vision API access. Is there anywhere I can test this thats live?

For now the only way is to run the application locally I’m afraid.

Answer 13 · 2024-01-30T02:54:33.000Z

Is it the receipt-scan branch? I managed to open up the project locally via docker. But can't find the receipt button anywhere.

Answer 14 · 2024-01-30T03:02:02.000Z

Is it the receipt-scan branch? I managed to open up the project locally via docker. But can't find the receipt button anywhere.

You need to define two environment variables (in container.env if running with Docker):

NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT=true
OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXX
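
As an illustration of how these two variables relate, a startup check along the following lines could catch the case where the flag is on but the key is missing. The function name `isReceiptScanEnabled` is hypothetical and this is not how Spliit itself reads its configuration.

```javascript
// Hypothetical startup check for the two variables named above.
// Returns true when receipt scanning should be offered, and throws when the
// flag is on but no API key has been provided.
function isReceiptScanEnabled(env) {
  const flagOn = env.NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT === "true";
  const hasKey =
    typeof env.OPENAI_API_KEY === "string" && env.OPENAI_API_KEY.trim() !== "";
  if (flagOn && !hasKey) {
    throw new Error(
      "NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT is true but OPENAI_API_KEY is not set"
    );
  }
  return flagOn;
}

// Example with a literal object; a real app would pass process.env.
console.log(
  isReceiptScanEnabled({
    NEXT_PUBLIC_ENABLE_RECEIPT_EXTRACT: "true",
    OPENAI_API_KEY: "sk-example",
  })
);
```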

Answer 15 · 2024-01-30T03:17:27.000Z

Ahh sweet got it! Updated that, I see the button now.

However, I'm getting an error when uploading a file:

"Something wrong happened when uploading the document. Please retry later or select a different file."

Answer 16 · 2024-01-30T03:30:50.000Z

Forgot to mention you need to enable expense receipts as well: https://github.com/spliit-app/spliit?tab=readme-ov-file#expense-documents (which reminds me that receipt scanning depends on this feature, and the README should mention it).

Edit: actually receipt scanning doesn’t have to depend on expense documents. Although it would make more sense in a production application, it is possible at least for dev to scan receipts without storing them on S3. I’ll work on it.

Answer 17 · 2024-01-30T04:09:40.000Z

Haha, that's my bad, I should've read the README better.

I think I'm getting some permission issues now with AWS. I don't want to be a burden with this either. I can wait until this is on prod/staging to test out the feature.

Just a note from what I've seen so far: we'd probably need an input box for storing the OpenAI API key somewhere. I'd also assume it would have to be stored locally (in a cookie?) since there are no user accounts.

Answer 18 · 2024-01-30T22:11:23.000Z

Alright, the feature is merged 🎉

I added a dialog to make it more clear how it works. Feel free to test at https://spliit.app and give your feedback 😉

Screen.Recording.2024-01-30.at.17.10.01.mov

A few remarks:

For now, I pay for OpenAI calls on Spliit.app. There is a hard limit in monthly costs; I don’t expect it to be reached unless thousands of people use the feature. I may put in place per-group premium features in the future.
If you’re self-hosting, you need to enable S3 document upload if you enable receipt scanning. It should be possible to enable only receipt scanning (reading the image, generating a data-URL, etc.) but I didn’t think it was necessary for now.
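
For reference, the data-URL route mentioned in the last remark could be sketched like this in Node.js. The helper name `toDataUrl` is hypothetical, and this is only an illustration of the idea, not the code used in Spliit.

```javascript
// Sketch: turn raw image bytes into a data URL that can be sent inline
// to a vision API, without uploading the file to S3 first.
function toDataUrl(bytes, mimeType = "image/jpeg") {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// In Node.js the bytes could come from fs.readFileSync("receipt.jpg").
// A tiny text payload is used here so the example runs without a file.
const tinyFile = Buffer.from("hello");
console.log(toDataUrl(tinyFile, "text/plain"));
```

Base64 makes the payload roughly a third larger than the original file, so large photos are better resized before being encoded this way.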

Answer 19 · 2024-01-30T23:50:51.000Z

@scastiel Amazing as always!! Works like a charm, and images that don't have any information simply get attached, which is pretty great!! Per-group premium features are definitely better than requiring every single person to subscribe.

Thanks again for the speedy turnaround on this 😊 🥳

Answer 20 · 2024-01-31T21:52:41.000Z

A huge thanks to everyone who participated here! It is because of this kind of collaboration that I love building Spliit as an open source project ❤️

I wrote a short blog post about the feature: Announcing Receipt Scanning Using AI. And so I added a blog to Spliit.app too 😉. Feel free to share it with your community!