FOR MORE INFO ABOUT THE OLD API (DECRYPTR LEGACY), SEE V0.1
decryptr
is an R package to break captchas. It is also an extensible tool built in a way that enables anyone to contribute with their own captcha-breaking code.
To install decryptr
, simply run the code below:
if (!require(devtools)) install.packages("devtools")
devtools::install_github("decryptr/decryptr")
decryptr
has functions for downloading and breaking captchas from multiple known sources. If you wanted to use this package with, let's say, a TRT (Regional Worker's Court), you could go by the following steps:
# Download captcha from TRT
file <- download_captcha("trt", path = "./img")
# Break captcha
decrypt(file, model = "trt")
## [1] "yrt67d"
Simple, right? The decrypt()
funcion is this package's workhorse: it is able to take a captcha (either the path to a captcha file or a captcha object read with read_captcha()
) and break it with a model (either the name of a known model, the path to a model file or a model object created with train_model()
).
If you'd like to visualize a captcha and make sure the decryption is working, you can use the plot()
funcion to draw out the captcha image:
# Read captcha
captcha <- read_captcha(file)
# Plot captcha
plot(captcha)
If you want to learn more about the models that already come packaged with decryptr
, check out load_model()
's documentation (and all of these models also have a corresponding download_captcha()
method so you're always good to go).
If you're willing to create your own custom captcha-breaking models, there are some other functions you might want to know about. classify()
allows the user to manually answer a list of captchas, while train_model()
takes a bunch of classified captchas and trains a keras
model on them.
classify()
has two modes: static and interactive. If you already know the answers to all captchas, simply turn them into a string vector and pass it onto the answers
argument; on the other hand, if you're going to manually classify the captchas, classify()
will plot every captcha and prompt you in the console for their answers. In the snippet below, I use static classification to label a set of 10 captchas:
# URL of a captcha (for illustrative purposes I'll be using
# TRT's URL, but you can use whichever URL you want)
url <- "https://pje.trt4.jus.br/consultaprocessual/seam/resource/captcha"
# Download captcha from URL
files <- download_captcha(url, n = 10, path = "./img")
# Answers to downloaded captchas
answers <- c(
"ew3h3n", "ew3h3n", "da7522", "da7522", "w8kerh",
"mh52v3", "ny248u", "56nwr5", "7tx6dy", "n3fue6")
# Classify captchas (if answers weren't supplied,
# I'd be promped for interactive classification)
new_files <- classify(files, answers, path = "./img")
Now that we have a set of classified captchas, we can use them to train a captcha-breaking model. classify()
used our answers to create a new version of each file, one with the answer at the end of the filename separated by an underscore; read_captcha()
has the ans_in_path
argument that tells it to look for the answers in the filenames and create the captcha objects accordingly.
With this list of labeled captcha objects, we can call train_model()
to generate a model. The model gets automatically saved to disk so that we can load it later with load_model()
.
# Read answered captchas
captchas <- read_captcha(new_files, ans_in_path = TRUE)
# Use captchas to train a model
model <- train_model(captchas, verbose = FALSE)
# Use our new model for decryption
decrypt(file, model = model)
## [1] "jre652"
# We could also have loaded the model from disk
model <- load_model("./model.hdf5")
Once loaded to memory, keras
models run very quickly Also, we don't run any pre-processing on the image, so decryption is blazing fast.
microbenchmark::microbenchmark(decrypt = decrypt(captcha, model))
## Unit: milliseconds
## expr min lq mean median uq max neval
## decrypt 8.333295 9.256788 10.96834 9.713691 10.47587 106.873 100