bumble-tech/private-detector

False positives when used on a dating app


Thanks for the model! I'm using it on a small dating app, which currently has 365,359 user-uploaded images. For my use case, I've noticed a lot of high scores for SFW images. For example, I've attached 5 of the top 10 most-NSFW images on @duolicious as predicted by private-detector. The model says these have about a 79% chance of being NSFW despite being fairly benign. I haven't cherry-picked them; none of the other 5 images in the top 10 contain nudity either, though one of them is suggestive.
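
For context, I'm ranking the images roughly like this. The preprocessing here (480x480 pad-resize, scaling to [-1, 1], float16) is my guess at what the published SavedModel expects, based on my reading of the repo's inference code, so treat it as a sketch rather than the canonical pipeline:

```python
import glob

import tensorflow as tf


def load_and_preprocess(path: str) -> tf.Tensor:
    """Decode an image and shape it the way I assume the SavedModel expects."""
    image = tf.io.read_file(path)
    image = tf.io.decode_image(image, channels=3, expand_animations=False)
    image = tf.image.resize_with_pad(image, 480, 480)  # assumed input size
    image = tf.cast(image, tf.float16)
    image = (image - 128) / 128                         # assumed [-1, 1] scaling
    return tf.reshape(image, [-1])


model = tf.saved_model.load('saved_model/')             # the released checkpoint

scores = []
for path in glob.glob('user_images/*.jpg'):              # hypothetical directory
    preds = model([load_and_preprocess(path)])
    # preds[0][0]: NSFW probability for this image, per my reading of the output shape
    scores.append((float(preds[0][0]), path))

# The attached images came from the top of a ranking like this one.
for score, path in sorted(scores, reverse=True)[:10]:
    print(f'{score:.2%}  {path}')
```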

Although I haven't done any formal benchmarks, the model's accuracy seems good on 50-50 splits of NSFW/SFW images. However, on datasets where the images are predominantly SFW, the proportion of false positives is quite high. I'd be interested to know how Bumble deals with this. Do you ensemble private-detector with other models? Do you use it as a first pass and leave the final decision to a human moderator?
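
To make that concrete, here's a back-of-the-envelope calculation; the 95%/95% operating characteristics are made up for illustration, not measured:

```python
def flag_precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of flagged images that are actually NSFW (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# On a balanced test set, flags are almost always right.
print(flag_precision(0.95, 0.95, prevalence=0.50))  # ~0.95

# On mostly-SFW uploads, most flags are false positives.
print(flag_precision(0.95, 0.95, prevalence=0.01))  # ~0.16
```

So even a fairly accurate classifier produces mostly false positives when almost all uploads are benign, which matches what I'm seeing.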

Interestingly, I've found that private-detector tends to give high NSFW probabilities to animals, especially cats. That makes me wonder how the training data was collected. Presumably a significant portion of the training images comes from in-app reports of users who sent cat pictures as double entendres. I wonder whether the model's accuracy could be improved with cleaner training data.

Edit: It's possible that I haven't encoded the data properly before passing it to the model. That said, as mentioned above, the model gives good results on 50-50 splits of lewd and inoffensive images, so if my implementation is wrong, it can't be that wrong. The predictions are also highly invariant to manipulations of the input images, especially rotation. If you'd like to reproduce my results, my implementation can be found here. Although it's part of a bigger application, everything in the antiporn/ directory can be run separately. The predict_nsfw function is the entry point; it takes the raw image data (as read from disk) as a list of BytesIO objects.
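
For reference, the call site looks something like this (the import path and file names are placeholders, and I'm treating the return value as one NSFW probability per input image for the sake of the sketch):

```python
from io import BytesIO

from antiporn import predict_nsfw  # entry point in the antiporn/ directory

# Placeholder file names; predict_nsfw takes the raw bytes wrapped in BytesIO.
paths = ['cat.jpg', 'profile-photo.jpg']
images = [BytesIO(open(path, 'rb').read()) for path in paths]

for path, probability in zip(paths, predict_nsfw(images)):
    print(f'{path}: NSFW probability {probability:.2%}')
```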


[Attachments: the 5 example images described above]