
ML techniques to answer ReCAPTCHA challenges


ReCAPTCHAWiz: Using Selenium and a CNN to Defeat ReCAPTCHA Mountain Image Challenges

ReCAPTCHAWiz combines Selenium with a trained convolutional neural network (CNN) that classifies whether images contain mountains, in order to pass Google ReCAPTCHA image challenges. A live example can be seen on Google's ReCAPTCHA demo page, and a screenshot is shown below.

ReCAPTCHA Challenge Screenshot

Image Data

The images used to train the CNN were obtained directly from the ReCAPTCHA site. The split_recaptcha_source_images notebook was used to process each downloaded image (each of which contains 9 sub-images), split it into its sub-images, and discard duplicates by comparing MD5 checksums. The sub-images were then manually sorted into two folders (mountain and non-mountain) in preparation for training the CNN.
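The notebook itself is not reproduced here, but the splitting and de-duplication steps can be sketched roughly as follows. This is a minimal sketch under stated assumptions: each downloaded image is taken to be a 3×3 grid of equally sized tiles, Pillow is assumed for cropping, tiles are written as PNG files, and all function names (`tile_boxes`, `split_grid_image`, `unique_by_md5`, `remove_duplicate_files`) are hypothetical rather than taken from the project.

```python
import hashlib
from pathlib import Path


def tile_boxes(width, height, rows=3, cols=3):
    """Return (left, upper, right, lower) crop boxes for a rows x cols grid, row by row."""
    # Assumption: the grid tiles are equally sized and fill the whole image.
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


def split_grid_image(path, out_dir, rows=3, cols=3):
    """Crop one downloaded challenge image into its tiles and save each as a PNG."""
    from PIL import Image  # Pillow is only needed for the actual cropping

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with Image.open(path) as img:
        for i, box in enumerate(tile_boxes(img.width, img.height, rows, cols)):
            img.crop(box).save(out_dir / f"{Path(path).stem}_{i}.png")


def unique_by_md5(named_blobs):
    """Yield (name, data) pairs, skipping any whose MD5 digest was already seen."""
    seen = set()
    for name, data in named_blobs:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield name, data


def remove_duplicate_files(folder):
    """Delete byte-identical duplicates in `folder`, keeping the first file by name."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    keep = {name for name, _ in unique_by_md5((p, p.read_bytes()) for p in files)}
    for p in files:
        if p not in keep:
            p.unlink()
```

A typical run would call `split_grid_image` on every downloaded image with the same output folder, then call `remove_duplicate_files` on that folder once. Note that an MD5 comparison only catches byte-identical files; visually identical tiles that were encoded differently would not be detected.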

Convolutional Neural Network

The CNN used for this project builds on a Keras tutorial for classifying cat and dog pictures. Starting from a pre-trained CNN yields very high accuracy on the validation set (~98%). The trained weights are in the model_weights folder, and the CNN is run with Mountain_classifier_VGG.py.
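The transfer-learning idea can be sketched in Keras as a frozen pre-trained convolutional base with a small binary classifier on top. This is an illustrative sketch, not the contents of Mountain_classifier_VGG.py: the VGG16 base is only suggested by the script name, and the 150×150 input size, the 1/255 rescaling, the Dense(256)/Dropout(0.5) head, and the function name `build_mountain_classifier` are assumptions modeled on the cats-vs-dogs tutorial style.

```python
from tensorflow import keras
from tensorflow.keras import layers

IMG_SIZE = (150, 150)  # assumed input size, not confirmed by the project


def build_mountain_classifier(weights="imagenet"):
    """Frozen VGG16 convolutional base plus a small binary classification head."""
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=IMG_SIZE + (3,)
    )
    base.trainable = False  # keep the pre-trained convolutional features fixed

    inputs = keras.Input(shape=IMG_SIZE + (3,))
    x = layers.Rescaling(1.0 / 255)(inputs)  # scale pixel values to [0, 1]
    x = base(x, training=False)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # P(tile contains a mountain)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

With `weights="imagenet"` Keras downloads the pre-trained VGG16 weights on first use; passing `weights=None` builds the same architecture with random weights, which is only useful for checking shapes. Freezing the base means only the small head is trained on the manually sorted mountain and non-mountain folders, which is what makes high accuracy reachable from a modest dataset.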

Selenium

ReCAPTCHA is rendered with JavaScript, so Selenium is used to drive a browser and interact with it. ReCAPTCHAWiz.py opens the ReCAPTCHA demo in Selenium and automatically selects the appropriate images for mountain challenges.

Miscellaneous

This folder contains a few Jupyter notebooks that were used to obtain mountain images from other sources, such as Flickr (via its API) and ImageNet. It also contains versions of ReCAPTCHAWiz that run on fully-trained CNNs.
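The Flickr notebooks are not shown here, but collecting candidate image URLs through the Flickr REST API can be sketched with the standard library alone. This sketch assumes the `flickr.photos.search` method with JSON output and Flickr's static photo URL pattern; a Flickr API key is required, and the function names (`search_url`, `photo_url`, `fetch_photo_urls`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def search_url(api_key, text, per_page=100, page=1):
    """Build a flickr.photos.search request URL that returns plain JSON."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": text,
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,  # return raw JSON rather than a JSONP callback
    }
    return f"{REST_ENDPOINT}?{urlencode(params)}"


def photo_url(photo, size="q"):
    """Build a static image URL from one search result ("q" = 150x150 square)."""
    return (
        f"https://live.staticflickr.com/{photo['server']}/"
        f"{photo['id']}_{photo['secret']}_{size}.jpg"
    )


def fetch_photo_urls(api_key, text, per_page=100, page=1):
    """Run one search request and return image URLs for the results."""
    with urlopen(search_url(api_key, text, per_page, page)) as resp:
        data = json.load(resp)
    return [photo_url(p) for p in data["photos"]["photo"]]
```

For example, `fetch_photo_urls("YOUR_API_KEY", "mountain")` would return up to 100 square-thumbnail URLs, which could then be downloaded and sorted in the same way as the ReCAPTCHA tiles.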