Support Localization of Words via `tsv` or `hocr` flag
seveibar opened this issue · 2 comments
seveibar commented
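Here is an example of a script I wrote to parse the TSV output generated by Tesseract:

```typescript
import { exec } from "child_process"
import tempy from "tempy"
import Papa from "papaparse"
import fs from "fs/promises"

// `inputFilePath` (the image to OCR) and the `RChar` row type are assumed to be defined elsewhere.
const outputFilePath = tempy.file({ extension: "tsv" })

// Tesseract appends ".tsv" to the output base itself, so strip the extension first.
await new Promise((resolve, reject) => {
  exec(
    `tesseract --psm 12 --oem 2 -l chi_tra ${inputFilePath} ${outputFilePath.replace(
      /\.tsv$/,
      ""
    )} tsv`,
    (err, stdout, stderr) => {
      // Return here so resolve() is not also called after a failure.
      if (err) return reject(err)
      resolve(null)
    }
  )
})

// Parse the TSV into one object per row, then convert the numeric columns.
const recognizedChars: Array<RChar> = Papa.parse(
  (await fs.readFile(outputFilePath)).toString(),
  {
    header: true,
    skipEmptyLines: true, // ignore the trailing blank line at the end of the file
  }
).data.map((a) => ({
  ...a,
  block_num: parseInt(a.block_num, 10),
  left: parseInt(a.left, 10),
  top: parseInt(a.top, 10),
  width: parseInt(a.width, 10),
  height: parseInt(a.height, 10),
}))
```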
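Since the goal of this issue is word localization, here is a minimal sketch of how the word-level bounding boxes could be pulled out of that TSV. It assumes Tesseract's usual TSV columns (`level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf`, `text`), where `level` 5 marks a word row. The `extractWordBoxes` helper, the `WordBox` type, and the sample rows are hypothetical and not part of the script above. It splits on tabs directly instead of using Papa so that it stays self-contained.

```typescript
// Hypothetical shape for one recognized word and its bounding box.
interface WordBox {
  text: string
  left: number
  top: number
  width: number
  height: number
  conf: number
}

// Hypothetical helper: keep only word-level rows (level 5) that have text
// and whose confidence is at least `minConf`.
function extractWordBoxes(tsv: string, minConf = 0): WordBox[] {
  const [headerLine, ...lines] = tsv.split(/\r?\n/).filter((l) => l.length > 0)
  if (headerLine === undefined) return []
  const columns = headerLine.split("\t")
  const boxes: WordBox[] = []
  for (const line of lines) {
    const cells = line.split("\t")
    const row: Record<string, string> = {}
    columns.forEach((name, i) => {
      row[name] = cells[i] ?? ""
    })
    if (row.level !== "5") continue
    const text = (row.text ?? "").trim()
    // parseFloat handles both integer and fractional confidence values.
    const conf = parseFloat(row.conf)
    if (text === "" || Number.isNaN(conf) || conf < minConf) continue
    boxes.push({
      text,
      left: parseInt(row.left, 10),
      top: parseInt(row.top, 10),
      width: parseInt(row.width, 10),
      height: parseInt(row.height, 10),
      conf,
    })
  }
  return boxes
}

// Made-up sample rows, for illustration only.
const sample = [
  "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext",
  "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
  "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tHello",
  "5\t1\t1\t1\t1\t2\t104\t92\t70\t24\t12.0\tw0rld",
].join("\n")

// With a threshold of 50, only the "Hello" row is kept.
console.log(extractWordBoxes(sample, 50))
```

The confidence threshold is a judgment call. Page, block, paragraph, and line rows carry a `conf` of -1 and no text, so they are skipped by the `level` check whatever the threshold is.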
ToWelie89 commented
Thank you so much, exactly what I was looking for!