OCR integration

Question


sdaqo opened this issue 9 months ago · 7 comments

Hey, first of all thanks a lot for developing this app, it is awesome!
This is a feature request for OCR in this app. It would be a pretty nice feature, e.g. for reading physical manga or just for characters on physical stuff that you can't easily copy-paste.
I would love to integrate this on my own, but unfortunately I do not have the necessary Android dev knowledge to do so. I did however find a library for doing this: https://developers.google.com/ml-kit/vision/text-recognition/v2. It is from Google's ML Kit, which is used in things like Google Lens; it runs on-device and seems to have a relatively easy API to integrate.
Tell me what you think about this!

Answer 1 · 2024-04-01T18:56:08.000Z

Thanks for your feature request. I'm glad you like the app.
While I don't see this feature as strictly necessary since there are separate apps that can recognize text in an image and let you copy it, I think it would be very convenient and a good addition to have it in this app if implementing it won't require too much work (which it probably won't if I use something like ML Kit like you suggested). Just to clarify, what you want is to be able to take a picture (or open an image file), then select part of the recognized text and immediately search for it? Do you have any suggestions on what you think it should look like?

Answer 2 · 2024-04-01T19:28:14.000Z

Yeah, so essentially you would have a camera button in the search bar or some floating icon in the bottom corner, where you can start the OCR.
I do not know how annoying it would be to implement this but maybe (could also be a future thing, don't want to burden you 😅) you could afterwards have the results as a component that slides in from the bottom (probably seen that before on other apps), with this you could search several characters on the same image and you could probably reuse the existing results screen/component. Don't bother if it's too annoying; just implement it as you suggested, i.e. select text on the picture and start a normal search.

Answer 3 · 2024-04-05T00:39:34.000Z

> have the results as a component that slides in from the bottom (probably seen that before on other apps), with this you could search several characters on the same image

Could you elaborate on this a bit? I haven't seen it in other apps before, so I don't know exactly what that looks like. Also, by "component that slides in from the bottom", are you referring to a bottom sheet?

Answer 4 · 2024-04-05T05:01:18.000Z

Yeah exactly

Answer 5 · 2024-04-11T18:27:37.000Z

Finally had time to look into this. I have decided to reject this feature request. These are my reasons:

I don't think ML Kit is currently usable enough for Japanese text. It very regularly misrecognizes kanji (and sometimes even kana) even on clear and in-focus pictures of non-handwritten text. It also particularly struggles with vertically written text.
The ML Kit libraries are proprietary and their inclusion in this app is incompatible with its GPL-3.0 license. Their inclusion would also exclude the app from F-Droid, which only accepts apps that are completely open-source.
I considered using a different, open-source OCR library called Tesseract. However, in my testing it is completely unusable for Japanese text: it often doesn't get even a single character right from the picture it's given, even in ideal conditions.
For both ML Kit and Tesseract, the training data required would more than double the download size of the app, which I would like to avoid.

I suggest using a separate app for OCR like I've mentioned previously. I'm planning on making it possible to share text to JS-Dict from other apps, which should make that workflow a bit less tedious.

Answer 6 · 2024-04-11T20:42:37.000Z

I see, this makes sense, thank you for considering it! (the text sharing would be awesome!)

Answer 7 · 2024-07-17T10:52:57.000Z

@sdaqo did you check this out? I think this fork implements your requested feature: http://github.com/3nws/JS-Dict

Maybe @petlyh would be able to pull the changes into the main repo, as Tesseract seems to work well enough for computer text, as seen in the screenshots in the above-linked repo. Maybe it could be an optional feature that downloads the necessary assets only if the user wishes to use it, keeping the app size to a minimum.

Thank you!!
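
For what it's worth, the "download the assets only if the user wants OCR" idea could look roughly like the sketch below. This is not code from JS-Dict or from the linked fork; `OptionalOcrAssets` and `Fetcher` are hypothetical names made up for illustration, and `jpn.traineddata` is only used as an example file name for Tesseract's Japanese language data. The fetcher is a stand-in, so nothing is actually downloaded here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: stores OCR language data only after the user opts in. */
final class OptionalOcrAssets {

    /** Hypothetical interface: supplies the raw bytes of a language data file. */
    interface Fetcher {
        byte[] fetch(String fileName) throws IOException;
    }

    private final Path dataDir;
    private final Fetcher fetcher;

    OptionalOcrAssets(Path dataDir, Fetcher fetcher) {
        this.dataDir = dataDir;
        this.fetcher = fetcher;
    }

    /** True if the language data is already on disk. */
    boolean isInstalled(String fileName) {
        return Files.isRegularFile(dataDir.resolve(fileName));
    }

    /**
     * Returns the local path of the language data, fetching it on first use.
     * Intended to be called only after the user has enabled OCR.
     */
    Path ensureInstalled(String fileName) throws IOException {
        Path target = dataDir.resolve(fileName);
        if (Files.isRegularFile(target)) {
            return target; // already present, no second download
        }
        Files.createDirectories(dataDir);
        // Write to a temporary file first so an interrupted fetch never looks installed.
        Path tmp = Files.createTempFile(dataDir, fileName, ".part");
        try {
            Files.write(tmp, fetcher.fetch(fileName));
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ocr-assets");
        // Stand-in fetcher; a real app would download the file instead.
        OptionalOcrAssets assets = new OptionalOcrAssets(dir, name -> new byte[] {1, 2, 3});
        System.out.println(assets.isInstalled("jpn.traineddata")); // false
        assets.ensureInstalled("jpn.traineddata");
        System.out.println(assets.isInstalled("jpn.traineddata")); // true
    }
}
```

With something like this, the base download stays the same size, and the OCR button would only trigger `ensureInstalled` the first time the user turns the feature on.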