Chinese README (中文说明)
Online Demo
- Note this may not work sometimes because Google GCP keeps restarting my instance. In that case you can wait for me to restart the service, which may take up to 24 hrs.
- Note this online demo is using the current main branch version.
Changelogs
2022-01-24
- Added text detection model by dmMaze
2021-08-21
- New MST based text region merge algorithm, huge text region merge improvement
- Add baidu translator in demo mode
- Add google translator in demo mode
- Various bugfixes
2021-07-29
- Web demo adds translator, detection resolution and target language option
- Slight text color extraction improvement
2021-07-26
Major upgrades for all components, now we are on beta!
Note that in this version all English texts are detected as capital letters.
You need Python >= 3.8 for cached_property to work.
- Detection model upgrade
- OCR model upgrade, better at text color extraction
- Inpainting model upgrade
- Major text rendering improvement, faster rendering and higher quality text with shadow
- Slight mask generation improvement
- Various bugfixes
- Default detection resolution has been dialed back to 1536 from 2048
2021-07-09
- Fix erroneous image rendering when inpainting is not used
2021-06-18
- Support manual translation
- Support detection and rendering of angled texts
2021-06-13
- Text mask completion is now based on CRF, mask quality is drastically improved
2021-06-10
- Improve text rendering
2021-06-09
- New text region based text direction detection method
- Support running demo as web service
2021-05-20
- Text detection model is now based on DBNet with ResNet34 backbone
- OCR model is now trained with more English sentences
- Inpaint model is now based on AOT which requires far less memory
- Default inpainting resolution is now increased to 2048, thanks to the new inpainting model
- Support merging hyphenated English words
2021-05-11
- Add youdao translate and set as default translator
2021-05-06
- Text detection model is now based on DBNet with ResNet101 backbone
- OCR model is now deeper
- Default detection resolution has been increased to 2048 from 1536
Note this version is slightly better at handling English texts; other than that it is worse in every other way.
2021-03-04
- Added inpainting model
2021-02-17
- First version launched
Translate texts in manga/images
Some manga/images will never be translated, which is why this project was born.
Primarily designed for translating Japanese text, but also supports Chinese and English.
Supports inpainting and text rendering.
Successor to https://github.com/PatchyVideo/MMDOCR-HighPerformance
How to use
- Python>=3.8
- Clone this repo
- Download ocr.ckpt, detect.ckpt, comictextdetector.pt, comictextdetector.pt.onnx and inpainting.ckpt, and put them in the root directory of this repo
- [Optional if using Google translate] Apply for youdao or deepl translate API, put your APP_KEY and APP_SECRET or AUTH_KEY in translators/key.py
- Run the following command; the result can be found in result/. Add --use-inpainting to enable inpainting, add --use-cuda to use CUDA.
python translate_demo.py --image <path_to_image_file> [--use-inpainting] [--verbose] [--use-cuda] [--translator=google] [--target-lang=CHS]
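If you want to launch the single-image run from another script, the command can be assembled programmatically. The following is only a minimal sketch: `build_command` and `LANGUAGE_CODES` are hypothetical helper names, not part of this repo, and the code set is copied from the Language codes section of this README. It only uses the flags documented above and assumes it is run from the repo root.

```python
import sys

# Codes copied from the "Language codes" section of this README.
LANGUAGE_CODES = {
    "CHS", "CHT", "CSY", "NLD", "ENG", "FRA", "DEU", "HUN", "ITA",
    "JPN", "KOR", "PLK", "PTB", "ROM", "RUS", "ESP", "TRK", "VIN",
}


def build_command(image_path, target_lang="CHS", translator="google",
                  use_inpainting=False, use_cuda=False, verbose=False):
    """Return the argv list for a single-image run of translate_demo.py.

    Hypothetical helper: it only assembles the documented flags and
    rejects language codes that are not listed in the README.
    """
    if target_lang not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {target_lang!r}")
    cmd = [sys.executable, "translate_demo.py", "--image", image_path]
    if use_inpainting:
        cmd.append("--use-inpainting")
    if verbose:
        cmd.append("--verbose")
    if use_cuda:
        cmd.append("--use-cuda")
    cmd.append(f"--translator={translator}")
    cmd.append(f"--target-lang={target_lang}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; the output should then appear in result/ as described above.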
Language codes
Used by the --target-lang argument
"CHS": "Chinese (Simplified)",
"CHT": "Chinese (Traditional)",
"CSY": "Czech",
"NLD": "Dutch",
"ENG": "English",
"FRA": "French",
"DEU": "German",
"HUN": "Hungarian",
"ITA": "Italian",
"JPN": "Japanese",
"KOR": "Korean",
"PLK": "Polish",
"PTB": "Portuguese (Brazil)",
"ROM": "Romanian",
"RUS": "Russian",
"ESP": "Spanish",
"TRK": "Turkish",
"VIN": "Vietnamese"
How to use (batch translation)
- Python>=3.8
- Clone this repo
- Download ocr.ckpt, detect.ckpt, comictextdetector.pt, comictextdetector.pt.onnx and inpainting.ckpt, and put them in the root directory of this repo
- [Optional if using Google translate] Apply for youdao or deepl translate API, put your APP_KEY and APP_SECRET or AUTH_KEY in translators/key.py
- Run the following command; the result can be found in <path_to_image_folder>-translated/. Add --use-inpainting to enable inpainting, add --use-cuda to use CUDA.
python translate_demo.py --mode batch --image <path_to_image_folder> [--use-inpainting] [--verbose] [--use-cuda] [--translator=google] [--target-lang=CHS]
How to use (web mode)
- Python>=3.8
- Clone this repo
- Download ocr.ckpt, detect.ckpt, comictextdetector.pt, comictextdetector.pt.onnx and inpainting.ckpt, and put them in the root directory of this repo
- [Optional if using Google translate] Apply for youdao or deepl translate API, put your APP_KEY and APP_SECRET or AUTH_KEY in translators/key.py
- Run the following command; the demo will be serving on http://127.0.0.1:5003
python translate_demo.py --mode web [--use-inpainting] [--verbose] [--use-cuda] [--translator=google] [--target-lang=CHS]
Two modes of translation service are provided by the demo: synchronous mode and asynchronous mode.
In synchronous mode your HTTP POST request will finish once the translation task is finished.
In asynchronous mode your HTTP POST request will respond with a task_id immediately; you can use this task_id to poll for the translation task state.
Synchronous mode
- POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/run
- Wait for response
- Use the resultant task_id to find the translation result in the result/ directory, e.g. using Nginx to expose result/
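The synchronous steps above can be sketched with the Python standard library only. This is a rough illustration, not the project's own client: `encode_multipart` and `translate_sync` are hypothetical names, and the code assumes the /run endpoint replies with a JSON body containing the task_id (the exact response shape is not documented here).

```python
import json
import urllib.request
import uuid


def encode_multipart(field, filename, content):
    """Encode a single file field as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def translate_sync(image_path, base_url="http://127.0.0.1:5003"):
    """POST an image to /run and block until the server answers.

    Assumption: the response body is JSON that includes the task_id.
    """
    with open(image_path, "rb") as f:
        body, content_type = encode_multipart("file", "image.png", f.read())
    req = urllib.request.Request(
        f"{base_url}/run",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the request only returns once translation is done, a generous client timeout may be needed for large images.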
Asynchronous mode
- POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/submit
- Acquire translation task_id
- Poll for translation task state by posting JSON {"taskid": <task-id>} to http://127.0.0.1:5003/task-state
- Translation is finished when the resultant state is either finished, error or error-lang
- Find the translation result in the result/ directory, e.g. using Nginx to expose result/
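The polling part of the asynchronous flow might look like the sketch below. `fetch_state` and `wait_for_task` are hypothetical names, and reading the state from a `"state"` key of a JSON response is an assumption; only the /task-state URL, the `{"taskid": ...}` payload and the three terminal states come from the steps above.

```python
import json
import time
import urllib.request

# Terminal states listed in the steps above.
TERMINAL_STATES = {"finished", "error", "error-lang"}


def fetch_state(task_id, base_url="http://127.0.0.1:5003"):
    """POST {"taskid": ...} to /task-state and return the reported state.

    Assumption: the response is JSON with the state under the "state" key.
    """
    payload = json.dumps({"taskid": task_id}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/task-state",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["state"]


def wait_for_task(task_id, fetch=fetch_state, interval=1.0, timeout=600.0):
    """Poll until the task reaches a terminal state, then return that state."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch(task_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still in state {state!r}")
        time.sleep(interval)
```

The `fetch` parameter is injectable so the loop can be exercised without a running server; a caller should still check whether the returned state is finished or one of the two error states.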
Manual translation
Manual translation replaces machine translation with human translators
- POST a form request with form data file:<content-of-image> to http://127.0.0.1:5003/manual-translate
- Wait for response
- You will obtain a JSON response like this:
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": ""
    }
  ]
}
- Fill in translated texts
{
  "task_id": "12c779c9431f954971cae720eb104499",
  "status": "pending",
  "trans_result": [
    {
      "s": "☆上司来ちゃった……",
      "t": "☆Boss is here..."
    }
  ]
}
- Post translated JSON to http://127.0.0.1:5003/post-translation-result
- Wait for response
- Find the translation result in the result/ directory, e.g. using Nginx to expose result/
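The "fill in translated texts" step can be automated once a human translator has supplied the texts. The sketch below is illustrative only: `fill_translations` and `missing_translations` are hypothetical helpers that operate on the JSON shape shown above (`trans_result` entries with `s` for source and `t` for translation) and do not talk to the server.

```python
import copy


def fill_translations(task, translations):
    """Return a copy of the task JSON with each "t" filled in.

    `translations` maps source text ("s") to translated text ("t").
    Entries without a matching translation are left unchanged.
    """
    filled = copy.deepcopy(task)
    for entry in filled["trans_result"]:
        if entry["s"] in translations:
            entry["t"] = translations[entry["s"]]
    return filled


def missing_translations(task):
    """List the source texts that still have an empty translation."""
    return [e["s"] for e in task["trans_result"] if not e["t"]]
```

Checking `missing_translations` before posting the JSON back to /post-translation-result helps avoid submitting a half-filled task.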
This is a hobby project, you are welcome to contribute
Currently this is only a simple demo and many imperfections exist; we need your support to make this project better!
Next steps
What needs to be done
- Inpainting is based on Aggregated Contextual Transformations for High-Resolution Image Inpainting
- IMPORTANT!!!HELP NEEDED!!! The current text rendering engine is barely usable, we need your help to improve text rendering!
- Text rendering area is determined by detected text lines, not speech bubbles. This works for images without speech bubbles, but makes it impossible to decide where to put translated English text. I have no idea how to solve this.
- Ryota et al. proposed using multimodal machine translation; maybe we can add ViT features for building custom NMT models.
- Make this project work for video (rewrite the code in C++ and use GPU/other hardware NN accelerators). This would be used for detecting hard subtitles in videos, generating an ass file and removing the subtitles completely.
- Mask refinement using non deep learning algorithms; I am currently testing out a CRF based algorithm.
- Angled text region merge is not currently supported
Samples
The following samples are from the original version; they do not represent the current main branch version.
Original | Translated |
---|---|