ReadBook

ReadBook is an iOS application that recognises text in an image, plays that text back as speech, and lets you copy it for further use. It uses Vision for text recognition and AVFoundation for speech.

When you select an image and tap Convert, the recognised text is displayed, and you can play it back as speech.

Important Code:

To recognise text in an image, first import Vision and convert the image to a CGImage. Then create a VNImageRequestHandler for that image and a VNRecognizeTextRequest. In the request's completion handler, read the observations from the request's results and extract the text from each one. Finally, ask the handler to perform the request. You can also set properties on the request that control how text is extracted, such as recognitionLanguages and recognitionLevel.

  import Vision

  func requestText() {
      guard let cgImage = self.recievedImage?.cgImage else { return }
      let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

      // The completion handler runs once Vision has finished processing the image.
      let request = VNRecognizeTextRequest { [weak self] request, error in
          guard let observations = request.results as? [VNRecognizedTextObservation] else {
              print("Invalid observation: \(error?.localizedDescription ?? "no results")")
              return
          }

          // Keep the most confident candidate for each detected line of text.
          let text = observations
              .compactMap { $0.topCandidates(1).first?.string }
              .joined(separator: "\n")

          // UI updates must happen on the main thread.
          DispatchQueue.main.async {
              self?.imageTextView.text = text
          }
      }

      // Optional settings that control how the text is extracted.
      request.customWords = ["custOm"]
      request.minimumTextHeight = 0.03125
      request.recognitionLevel = .accurate
      request.recognitionLanguages = ["en-US"]
      request.usesLanguageCorrection = true

      // Recognition is expensive, so run it off the main thread.
      DispatchQueue.global(qos: .userInitiated).async {
          do {
              try handler.perform([request])
          } catch {
              print("Unable to perform the request: \(error)")
          }
      }
  }

For speech we use AVFoundation and its AVSpeechSynthesizer. Here too you can set the speech properties as you like, for example the voice and the rate.

  import AVFoundation

  // Keep the synthesizer alive (for example as a property) while it is speaking.
  let synthesizer = AVSpeechSynthesizer()

  func requestSound(text: String) {
      let utterance = AVSpeechUtterance(string: text)
      utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
      utterance.rate = 0.5

      synthesizer.speak(utterance)
  }

Text can also be recognised using Core ML or GoogleMLKit/TextRecognition.

OCR

Text recognition is a part of OCR. OCR stands for optical character recognition (or optical character reader). OCR scans a document or image file and converts the text in it into a machine-readable form.

Let me break the process down and explain it step by step.

Image Acquisition

In this step, the image or document is scanned and each pixel in the image is replaced with either a black or a white pixel.
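The black-or-white replacement described above can be sketched as simple thresholding. This is only a minimal illustration, not what Vision does internally: the `binarize` function, the `[[UInt8]]` grayscale representation, and the threshold of 128 are all assumptions made for the example.

```swift
// Hypothetical helper: turns a grayscale image into a black-and-white one.
// `pixels` holds brightness values from 0 (black) to 255 (white), row by row.
func binarize(_ pixels: [[UInt8]], threshold: UInt8 = 128) -> [[Bool]] {
    // true marks an "ink" (black) pixel, false marks background (white).
    return pixels.map { row in row.map { $0 < threshold } }
}

let gray: [[UInt8]] = [
    [250, 240,  30, 245],
    [248,  20,  25, 251],
    [252, 243,  18, 249],
]
let ink = binarize(gray)

// Print the result with "#" for ink and "." for background.
for row in ink {
    print(row.map { $0 ? "#" : "." }.joined())
}
// ..#.
// .##.
// ..#.
```

A fixed threshold is the simplest choice; real scans with uneven lighting usually need an adaptive one.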

Pre-processing

In this step, the areas outside the text are removed.

After pre-processing, the black-and-white image contains only the region that holds the text.
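As a rough sketch of this step, the snippet below crops a black-and-white image to the smallest rectangle that still contains every ink pixel. The `cropToInk` name and the `[[Bool]]` representation (true = ink) are assumptions for illustration, and the image is assumed to be rectangular.

```swift
// Hypothetical helper: crops a binarized image to the smallest rectangle
// that contains every ink pixel (true = ink, false = background).
func cropToInk(_ image: [[Bool]]) -> [[Bool]] {
    // Rows that contain at least one ink pixel.
    let inkRows = image.indices.filter { image[$0].contains(true) }
    guard let top = inkRows.first, let bottom = inkRows.last else { return [] }

    // Columns that contain at least one ink pixel.
    let width = image[0].count
    let inkColumns = (0..<width).filter { x in image.contains { $0[x] } }
    guard let left = inkColumns.first, let right = inkColumns.last else { return [] }

    return image[top...bottom].map { Array($0[left...right]) }
}

let page: [[Bool]] = [
    [false, false, false, false, false],
    [false, false, true,  false, false],
    [false, true,  true,  false, false],
    [false, false, false, false, false],
]
// Keeps rows 1...2 and columns 1...2 of `page`.
let cropped = cropToInk(page)
```

Real pre-processing typically also removes noise and corrects skew; this sketch covers only the cropping.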

Segmentation

Characters in a scanned image can touch each other; for example, the two digits of a "22" may be joined together. In this step, OCR segments such joined characters into separate ones.
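One simple way to picture segmentation is a column projection: count the ink pixels in every column and cut where the count drops to a gap or a thin join. The sketch below assumes a `[[Bool]]` image (true = ink); `segmentColumns` and its `maxJoinInk` parameter are hypothetical names, and real OCR engines use more robust methods.

```swift
// Hypothetical helper: finds the column ranges of individual characters in a
// binarized line of text (true = ink). A column whose ink count is at or
// below `maxJoinInk` is treated as a gap or a thin join between characters.
func segmentColumns(_ image: [[Bool]], maxJoinInk: Int = 0) -> [ClosedRange<Int>] {
    guard let width = image.first?.count else { return [] }
    var segments: [ClosedRange<Int>] = []
    var start: Int? = nil

    for x in 0..<width {
        let inkCount = image.filter { $0[x] }.count
        if inkCount > maxJoinInk {
            if start == nil { start = x }   // a character begins here
        } else if let s = start {
            segments.append(s...(x - 1))    // the character ended one column earlier
            start = nil
        }
    }
    // Close a character that runs up to the right edge.
    if let s = start { segments.append(s...(width - 1)) }
    return segments
}

// Two blobs joined by a one-pixel bridge in the middle column.
let joined: [[Bool]] = [
    [true, true, false, true, true],
    [true, true, true,  true, true],
    [true, true, false, true, true],
]
let asOne = segmentColumns(joined)                 // [0...4]
let asTwo = segmentColumns(joined, maxJoinInk: 1)  // [0...1, 3...4]
```

With the default setting the joined shape stays one segment; allowing one ink pixel per cut column splits it in two.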

Feature Extraction:

In this step, each character is recognised and converted into machine-readable text.
An OCR engine knows many fonts; it compares each character against them and converts it.
There are many approaches; here are two of them.

Approach #1

The image is scanned one character at a time, and each character is compared against the known character patterns.
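A minimal sketch of this character-by-character comparison is template matching: score the scanned glyph against a stored bitmap for each known character and pick the best match. The `bitmap`, `matchingPixels`, and `recognize` helpers and the tiny 3x3 templates are made up for illustration, and all bitmaps are assumed to have the same size.

```swift
// Hypothetical helper: builds a bitmap from rows of "#" (ink) and "." (background).
func bitmap(_ rows: [String]) -> [[Bool]] {
    return rows.map { row in row.map { $0 == "#" } }
}

// Counts the positions where both bitmaps have the same value.
func matchingPixels(_ a: [[Bool]], _ b: [[Bool]]) -> Int {
    var count = 0
    for (rowA, rowB) in zip(a, b) {
        for (pixelA, pixelB) in zip(rowA, rowB) where pixelA == pixelB {
            count += 1
        }
    }
    return count
}

// Returns the template character that agrees with `glyph` on the most pixels.
func recognize(_ glyph: [[Bool]], templates: [Character: [[Bool]]]) -> Character? {
    let best = templates.max { lhs, rhs in
        matchingPixels(glyph, lhs.value) < matchingPixels(glyph, rhs.value)
    }
    return best?.key
}

let templates: [Character: [[Bool]]] = [
    "I": bitmap([".#.", ".#.", ".#."]),
    "L": bitmap(["#..", "#..", "###"]),
]
// A slightly damaged "L": the bottom-right pixel is missing.
let scanned = bitmap(["#..", "#..", "##."])
let result = recognize(scanned, templates: templates)  // "L" (8 of 9 pixels match)
```

Production engines compare extracted features rather than raw pixels, which makes them more tolerant of font and size differences.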

Approach #2

In this approach, the text is taken line by line (the way human eyes read) and converted.

There are many more approaches like these; which one to use depends on the technology we need.

Post-Processing

Computers make mistakes too (OCR can introduce spelling mistakes during recognition), so this step tries to correct them.

On iOS, we use Vision for the OCR process.
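One common way to sketch this correction step is to replace each recognised word with the closest word from a dictionary, measured by edit distance. The `correct` function, the word list, and the `maxDistance` limit below are assumptions for illustration; Vision's own language correction (usesLanguageCorrection) is not implemented this way as far as the source states.

```swift
// Classic Levenshtein edit distance between two strings.
func editDistance(_ a: String, _ b: String) -> Int {
    let source = Array(a), target = Array(b)
    var previous = Array(0...target.count)
    for i in 0..<source.count {
        var current = [i + 1]
        for j in 0..<target.count {
            let substitution = previous[j] + (source[i] == target[j] ? 0 : 1)
            current.append(min(substitution, previous[j + 1] + 1, current[j] + 1))
        }
        previous = current
    }
    return previous[target.count]
}

// Hypothetical helper: replaces a word with the closest dictionary entry when
// it is at most `maxDistance` edits away; otherwise the word is left unchanged.
func correct(_ word: String, dictionary: [String], maxDistance: Int = 1) -> String {
    let closest = dictionary.min { editDistance(word, $0) < editDistance(word, $1) }
    guard let candidate = closest, editDistance(word, candidate) <= maxDistance else {
        return word
    }
    return candidate
}

let knownWords = ["text", "image", "vision"]
let fixed = ["tcxt", "imagc", "swift"].map { correct($0, dictionary: knownWords) }
print(fixed)  // ["text", "image", "swift"]
```

Words that are far from every dictionary entry, such as "swift" here, are left alone so that correct but unknown words are not damaged.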

Thank You, Happy Learning!

Dr-Groot/ReadBook
