My notes on the original paper, SSD: Single Shot MultiBox Detector, can be found here.
This project is inspired by ssd-plate_detection.
For details on the above code, see my blog, which is written in Chinese.
I have also uploaded my trained caffemodel to BaiduYun, Google Drive, and Dropbox.
Some examples of scene text detection:
Currently, I mainly focus on image/video captioning.