Taiwan Railways automated ticket booking + captcha recognition
This is an automated railway booking system that uses Selenium WebDriver and a Keras model to solve the captcha. In actual online testing, it achieves about 91% accuracy.
- python3 (3.5.2)
- numpy (1.13.1)
- pandas (0.22.0)
- keras (2.1.5)
- tensorflow (1.8.0)
- selenium (3.13.0)
https://github.com/j40903272/railway_automation_booking
Selenium supports many WebDrivers; here we use the Chrome WebDriver (ChromeDriver). Download the build that matches your OS and Chrome version and place it in this directory.
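As a rough sketch of how the downloaded driver might be wired up, the snippet below resolves the driver binary in a directory and hands it to Selenium. The helper names `chromedriver_path` and `make_driver` are hypothetical and not part of this repository. The `executable_path` argument is the Selenium 3.x form matching the version listed above; newer Selenium releases configure the driver differently.

```python
import os
import platform


def chromedriver_path(directory=".", system=None):
    """Return the expected ChromeDriver binary path inside `directory`.

    The binary is named chromedriver.exe on Windows and chromedriver elsewhere.
    `system` defaults to the current platform; it is a parameter only so the
    helper can be exercised for other platforms.
    """
    system = system or platform.system()
    name = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return os.path.join(directory, name)


def make_driver(directory="."):
    """Start Chrome through the local ChromeDriver binary (Selenium 3.x API)."""
    path = chromedriver_path(directory)
    if not os.path.isfile(path):
        raise FileNotFoundError("ChromeDriver not found at " + path)

    # Imported lazily so the path helper works without Selenium installed.
    from selenium import webdriver

    return webdriver.Chrome(executable_path=path)
```

With the driver binary in place, `driver = make_driver()` opens a browser session and `driver.quit()` closes it.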
Run the following command to generate 131072 synthetic captcha images that closely resemble the real ones. The generated images are saved in data/captcha, and their labels in data/captcha/label.csv.
python3 gen_captcha.py
Here we implement a simple CNN model with Keras for captcha image classification. You can try other approaches, such as a model trained with CTC loss. A single CNN model that handles both 5-digit and 6-digit captchas achieves only about 70% accuracy in online testing. Therefore, we first classify whether the captcha has 5 or 6 digits and then recognize the text. This requires three models: cnn_split.ipynb, cnn_5.ipynb, and cnn_6.ipynb.
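The two-stage flow described above can be sketched roughly as follows. This is only an illustration of the dispatch logic, not the code in the notebooks: the function names are hypothetical, the three models are passed in as plain callables, and the sketch assumes each recognizer returns one row of ten digit probabilities per character position.

```python
from typing import Callable, Sequence

# Assumed output layout: one row per character position, ten scores per row.
DigitProbs = Sequence[Sequence[float]]


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])


def decode(probs: DigitProbs) -> str:
    """Turn per-position digit probabilities into a string of digits."""
    return "".join(str(argmax(row)) for row in probs)


def recognize(image,
              predict_length: Callable[[object], int],
              predict_5: Callable[[object], DigitProbs],
              predict_6: Callable[[object], DigitProbs]) -> str:
    """Stage 1 picks the length; stage 2 runs the matching recognizer."""
    length = predict_length(image)
    if length == 5:
        return decode(predict_5(image))
    if length == 6:
        return decode(predict_6(image))
    raise ValueError("unexpected captcha length: %r" % (length,))
```

In practice each callable would wrap the `predict` call of the corresponding trained model. Keeping the length decision separate lets each recognizer have a fixed number of output positions.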
A booking demonstration is available in book.ipynb.