bumble-tech/private-detector

False positives when used on a dating app


Thanks for the model! I'm using it on a small dating app, which currently has 365,359 user-uploaded images. For my use case, I've noticed a lot of high scores for SFW images. For example, I've attached 5 of the top 10 most-NSFW images on @duolicious as predicted by private-detector. The model says these have about a 79% chance of being NSFW despite being fairly benign. I haven't cherry-picked them; none of the other 5 images in the top 10 contain nudity either, though one of them is suggestive.
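
For context, I'm ranking the images roughly like this. The preprocessing here (480x480 pad-resize, scaling to [-1, 1], float16) is my guess at what the published SavedModel expects, based on my reading of the repo's inference code, so treat it as a sketch rather than the canonical pipeline:

```python
import glob

import tensorflow as tf


def load_and_preprocess(path: str) -> tf.Tensor:
    """Decode an image and shape it the way I assume the SavedModel expects."""
    image = tf.io.read_file(path)
    image = tf.io.decode_image(image, channels=3, expand_animations=False)
    image = tf.image.resize_with_pad(image, 480, 480)  # assumed input size
    image = tf.cast(image, tf.float16)
    image = (image - 128) / 128                         # assumed [-1, 1] scaling
    return tf.reshape(image, [-1])


model = tf.saved_model.load('saved_model/')             # the released checkpoint

scores = []
for path in glob.glob('user_images/*.jpg'):              # hypothetical directory
    preds = model([load_and_preprocess(path)])
    # preds[0][0]: NSFW probability for this image, per my reading of the output shape
    scores.append((float(preds[0][0]), path))

# The attached images came from the top of a ranking like this one.
for score, path in sorted(scores, reverse=True)[:10]:
    print(f'{score:.2%}  {path}')
```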

Although I haven't done any formal benchmarks, the model's accuracy seems good on 50-50 splits of NSFW/SFW images. However, on datasets where the images are predominantly SFW, the proportion of false positives is quite high. I'd be interested to know how Bumble deals with this. Do you ensemble private-detector with other models? Do you use it as a first pass and leave the final decision to a human moderator?
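
To make that concrete, here's a back-of-the-envelope calculation; the 95%/95% operating characteristics are made up for illustration, not measured:

```python
def flag_precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of flagged images that are actually NSFW (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# On a balanced test set, flags are almost always right.
print(flag_precision(0.95, 0.95, prevalence=0.50))  # ~0.95

# On mostly-SFW uploads, most flags are false positives.
print(flag_precision(0.95, 0.95, prevalence=0.01))  # ~0.16
```

So even a fairly accurate classifier produces mostly false positives when almost all uploads are benign, which matches what I'm seeing.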

Interestingly, I've found that private-detector tends to give high NSFW probabilities to animals, especially cats. That makes me wonder how the training data was collected. Presumably a significant portion of the training images comes from in-app reports of users who sent cat pictures as double entendres. I wonder whether the model's accuracy could be improved with cleaner training data.

Edit: It's possible that I haven't encoded the data properly before passing it to the model. That said, as mentioned above, the model gives good results on 50-50 splits of lewd and inoffensive images, so if my implementation is wrong, it can't be that wrong. The predictions are also highly invariant to manipulations of the input images, especially rotation. If you'd like to reproduce my results, my implementation can be found here. Although it's part of a bigger application, everything in the antiporn/ directory can be run separately. The predict_nsfw function is the entry point; it takes the raw image data (as read from disk) as a list of BytesIO objects.
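
For reference, the call site looks something like this (the import path and file names are placeholders, and I'm treating the return value as one NSFW probability per input image for the sake of the sketch):

```python
from io import BytesIO

from antiporn import predict_nsfw  # entry point in the antiporn/ directory

# Placeholder file names; predict_nsfw takes the raw bytes wrapped in BytesIO.
paths = ['cat.jpg', 'profile-photo.jpg']
images = [BytesIO(open(path, 'rb').read()) for path in paths]

for path, probability in zip(paths, predict_nsfw(images)):
    print(f'{path}: NSFW probability {probability:.2%}')
```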


[Attachments: the 5 example images described above]