How can I set language options?
ibeeger opened this issue · 9 comments
I want to use the language chi_sim.
Where can I set options?
All current options are in the readme.
If foreign languages aren't supported by the extractors natively, I'd have to see if they somehow provide support.
Multi language support is outside my expertise. Are there specific options on the underlying extractors you are looking to manipulate?
I use tesseract, which accepts parameter settings like
tesseract demo.jpg res -l chi_sim;
textract(type, filePath, config, function( error, text ) {})
I want to know how to pass that kind of setting through config.
OK, specifically for tesseract, I can look into allowing those configuration params to pass through.
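As a rough sketch of what such a pass-through could look like, the snippet below maps a config object onto the arguments of the `tesseract demo.jpg res -l chi_sim` command quoted above. The helper name `buildTesseractArgs` and the `tesseract.lang` config key are assumptions made for illustration, not textract's confirmed API; only the `-l` flag comes from the tesseract command itself.

```javascript
// Hypothetical helper: turns a config object into tesseract CLI arguments.
// The `tesseract.lang` key is an assumed name, not a confirmed textract option.
function buildTesseractArgs(inputPath, outputBase, config) {
  const args = [inputPath, outputBase];
  const lang = config && config.tesseract && config.tesseract.lang;
  if (lang) {
    // `-l <lang>` selects the tesseract language, e.g. chi_sim.
    args.push('-l', lang);
  }
  return args;
}

const args = buildTesseractArgs('demo.jpg', 'res', { tesseract: { lang: 'chi_sim' } });
console.log(['tesseract'].concat(args).join(' '));
// prints "tesseract demo.jpg res -l chi_sim"
```

When no language is set, the helper leaves the argument list untouched, so tesseract falls back to its default language.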
Thank you.
Any chance you could provide an image and then the expected text from that image? Something for me to test with?
I added language support for tesseract.
One thing I had to do to support languages was to update a cleaning regex that is responsible for stripping "non-text". I added \u4E00-\u9FFF to the regex to keep Chinese characters. I did that based on this post on Stack Overflow.
I obviously do not know Chinese. Is it worth adding other ranges? Please let me know.
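To illustrate the idea, here is a minimal sketch of a "non-text" cleaning regex that keeps the \u4E00-\u9FFF range. It is not textract's actual regex: the function name `stripNonText` and the rest of the whitelist (word characters, whitespace, basic punctuation) are assumptions chosen for the example.

```javascript
// Hypothetical cleaner, not textract's real implementation.
// Everything outside the whitelist is treated as "non-text":
//   \u4E00-\u9FFF  CJK Unified Ideographs (the range mentioned above)
//   \w \s          ASCII word characters and whitespace
//   .,;:!?'"()-    basic ASCII punctuation
const NON_TEXT = /[^\u4E00-\u9FFF\w\s.,;:!?'"()-]/g;

function stripNonText(text) {
  return text
    .replace(NON_TEXT, ' ')   // replace each non-text character with a space
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
}

console.log(stripNonText('你好 ★ world'));
// prints "你好 world"
```

Further ranges could be added to the same character class in the same way, if they turn out to be needed.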
Released with v0.13.1
If you add all of them, it may be a lot of work. I would first try
Think your response got cut off