This repository contains lists of URLs for downloading NSFW images. The set can be used to build a dataset large enough to train a robust NSFW classification model.
This work was inspired by nsfw_data_scrapper, and the scripts from that scraper are the suggested way to download the images.
In the raw_data folder you will find a number of txt files, each containing a list of URLs. Some stats for this set:
- 159 different categories
- 1 589 331 URLs in total
- after downloading and cleaning, it's possible to end up with ~500 GB, or in other words ~1 300 000 NSFW images
file name | number of URLs |
---|---|
urls_age_college.txt | 2949 |
urls_age_mature.txt | 5942 |
urls_age_milf.txt | 8503 |
urls_age_teen.txt | 5389 |
urls_amateur.txt | 13033 |
urls_amateur_self-shots.txt | 10306 |
urls_appearance.txt | 2734 |
urls_appearance_appearance-modification.txt | 3795 |
urls_appearance_appearance-modification_piercings.txt | 1339 |
urls_appearance_appearance-modification_tattoos.txt | 1983 |
urls_appearance_clothing.txt | 24924 |
urls_appearance_clothing_bodyparts-through-clothes.txt | 6691 |
urls_appearance_clothing_bottomless.txt | 2390 |
urls_appearance_clothing_clothed-naked-pair.txt | 1274 |
urls_appearance_clothing_dresses.txt | 4360 |
urls_appearance_clothing_shoes.txt | 1238 |
urls_appearance_clothing_stockings.txt | 2556 |
urls_appearance_clothing_swimwear.txt | 741 |
urls_appearance_clothing_tight-clothing.txt | 11522 |
urls_appearance_clothing_topless.txt | 1009 |
urls_appearance_clothing_underwear.txt | 3190 |
urls_appearance_clothing_underwear_panties.txt | 9512 |
urls_appearance_clothing_underwear_thongs.txt | 2636 |
urls_appearance_clothing_uniforms-outfits.txt | 15390 |
urls_appearance_clothing_uniforms-outfits_cosplay.txt | 6465 |
urls_appearance_clothing_upskirt-downblouse.txt | 2599 |
urls_appearance_expressions.txt | 1396 |
urls_appearance_pose.txt | 8377 |
urls_appearance_wet-&-messy.txt | 9169 |
urls_artificial-images.txt | 247993 |
urls_artificial-images_fictional-characters-shows.txt | 73349 |
urls_artificial-images_hentai.txt | 81178 |
urls_artificial-images_photoshop.txt | 10146 |
urls_body-parts_head_hair.txt | 1797 |
urls_body-parts_head_hair_blonde.txt | 6227 |
urls_body-parts_head_hair_brunette.txt | 2022 |
urls_body-parts_head_hair_dyed.txt | 1011 |
urls_body-parts_head_hair_hairstyle.txt | 6946 |
urls_body-parts_head_hair_redhead.txt | 4725 |
urls_body-parts_head_lips-mouth.txt | 4449 |
urls_body-parts_lower-body.txt | 2136 |
urls_body-parts_lower-body_ass.txt | 9420 |
urls_body-parts_lower-body_ass_large.txt | 3654 |
urls_body-parts_lower-body_asshole.txt | 1826 |
urls_body-parts_lower-body_feet.txt | 3539 |
urls_body-parts_lower-body_gap.txt | 1332 |
urls_body-parts_lower-body_genitalia_penis.txt | 6611 |
urls_body-parts_lower-body_genitalia_penis_large.txt | 1607 |
urls_body-parts_lower-body_genitalia_penis_small.txt | 2233 |
urls_body-parts_lower-body_genitalia_vulva.txt | 12746 |
urls_body-parts_lower-body_genitalia_vulva_hair.txt | 12085 |
urls_body-parts_lower-body_genitalia_vulva_labia.txt | 5037 |
urls_body-parts_lower-body_hips.txt | 3490 |
urls_body-parts_lower-body_legs.txt | 3104 |
urls_body-parts_upper-body.txt | 4465 |
urls_body-parts_upper-body_breasts.txt | 11962 |
urls_body-parts_upper-body_breasts_from-an-angle.txt | 7196 |
urls_body-parts_upper-body_breasts_implants.txt | 3913 |
urls_body-parts_upper-body_breasts_large.txt | 11582 |
urls_body-parts_upper-body_breasts_nipples.txt | 4383 |
urls_body-parts_upper-body_breasts_small.txt | 3094 |
urls_body-traits_complexion_freckles.txt | 2309 |
urls_body-traits_complexion_light-skin.txt | 1436 |
urls_body-traits_complexion_tan.txt | 827 |
urls_body-traits_traits.txt | 157 |
urls_body-traits_traits_flexible.txt | 862 |
urls_body-traits_traits_pregnant.txt | 2674 |
urls_body-traits_types_bbw.txt | 8160 |
urls_body-traits_types_chubby.txt | 8207 |
urls_body-traits_types_curvy.txt | 1799 |
urls_body-traits_types_petite.txt | 2305 |
urls_body-traits_types_skinny-thin.txt | 4560 |
urls_classic-vintage.txt | 16532 |
urls_communities.txt | 12500 |
urls_communities_identification.txt | 1507 |
urls_communities_personals.txt | 1106 |
urls_communities_role-play.txt | 226 |
urls_cum-play_cum.txt | 4514 |
urls_cum-play_cum_creampie.txt | 1493 |
urls_cum-play_cum_cum-shot.txt | 4719 |
urls_cum-play_cum_cum-shot_bukkake.txt | 1042 |
urls_cum-play_cum_cum-shot_facial.txt | 2458 |
urls_cum-play_cum_swallowing.txt | 51 |
urls_cum-play_female.txt | 921 |
urls_ethnicity.txt | 19675 |
urls_ethnicity_asian.txt | 26674 |
urls_ethnicity_black.txt | 4220 |
urls_ethnicity_euro.txt | 3949 |
urls_ethnicity_indian.txt | 11195 |
urls_ethnicity_japanese.txt | 8109 |
urls_exhibition.txt | 10 |
urls_exhibition_gonewild.txt | 96718 |
urls_exhibition_public.txt | 15066 |
urls_fetish.txt | 22656 |
urls_fetish_bdsm.txt | 3301 |
urls_fetish_bdsm_bondage.txt | 8962 |
urls_fetish_bdsm_domination-&-submission.txt | 13608 |
urls_fetish_bdsm_domination-&-submission_femdom.txt | 9205 |
urls_fetish_drugs.txt | 1171 |
urls_fetish_role-enactment.txt | 942 |
urls_fetish_role-enactment_age-play.txt | 2053 |
urls_fetish_role-enactment_furry.txt | 2455 |
urls_fetish_role-enactment_pet-play.txt | 1270 |
urls_fetish_role-enactment_rape-abuse.txt | 1091 |
urls_fetish_watersports.txt | 5128 |
urls_general-categories.txt | 212869 |
urls_general-categories_artistic-or-borderline-porn.txt | 8944 |
urls_general-categories_desktop-wallpaper.txt | 20173 |
urls_general-categories_gifs.txt | 1228 |
urls_general-categories_humorous.txt | 1909 |
urls_general-categories_p.o.v..txt | 1025 |
urls_general-categories_passionate.txt | 781 |
urls_general-categories_porn-for-women.txt | 31 |
urls_general-categories_videos.txt | 400 |
urls_groups.txt | 97 |
urls_groups_alt.txt | 10321 |
urls_groups_athlete.txt | 7719 |
urls_groups_camgirl.txt | 4321 |
urls_groups_celebrity.txt | 46437 |
urls_groups_country.txt | 787 |
urls_groups_nerd.txt | 3742 |
urls_groups_pornstar.txt | 3860 |
urls_groups_pornstar_pornstar-lookalike.txt | 0 |
urls_groups_religious.txt | 1054 |
urls_groups_specific-personality.txt | 4012 |
urls_illegal-taboo.txt | 0 |
urls_illegal-taboo_bestiality.txt | 0 |
urls_illegal-taboo_incest.txt | 3816 |
urls_illegal-taboo_voyeurism.txt | 439 |
urls_lgbt_bisexual.txt | 1244 |
urls_lgbt_crossdressing.txt | 2443 |
urls_lgbt_gay.txt | 19812 |
urls_lgbt_lesbian.txt | 5179 |
urls_lgbt_transgender.txt | 719 |
urls_lgbt_transsexual.txt | 13106 |
urls_literary.txt | 1953 |
urls_locations_man-made.txt | 3869 |
urls_locations_nature.txt | 3831 |
urls_locations_nature_beach.txt | 4698 |
urls_non-porn-nsfw.txt | 21389 |
urls_sex.txt | 1313 |
urls_sex_anal.txt | 4683 |
urls_sex_anal_gaping.txt | 754 |
urls_sex_anal_rimming.txt | 688 |
urls_sex_breasts.txt | 176 |
urls_sex_fisting.txt | 1033 |
urls_sex_group.txt | 1134 |
urls_sex_group_large-group.txt | 2989 |
urls_sex_group_swinging.txt | 4466 |
urls_sex_group_threesome.txt | 1747 |
urls_sex_insertion.txt | 4344 |
urls_sex_interracial.txt | 906 |
urls_sex_masturbation.txt | 2032 |
urls_sex_oral.txt | 4155 |
urls_sex_orgasm.txt | 327 |
urls_sex_toys.txt | 6710 |
urls_specific-actor-actress.txt | 52409 |
urls_specific-company.txt | 18763 |
urls_wtf.txt | 4001 |
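
The per-file counts in the table above can be re-derived from the raw_data folder. The following is a minimal sketch, assuming each file holds one URL per line and follows the `urls_*.txt` naming shown in the table; the function names `count_urls` and `summarize` are illustrative and not part of this repository.

```python
from pathlib import Path


def count_urls(raw_dir):
    """Return {file name: number of non-empty lines} for every urls_*.txt file.

    Assumption: one URL per line; blank lines are not counted.
    """
    counts = {}
    for path in sorted(Path(raw_dir).glob("urls_*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            counts[path.name] = sum(1 for line in handle if line.strip())
    return counts


def summarize(counts):
    """Return (number of files, total number of URLs)."""
    return len(counts), sum(counts.values())
```

Calling `summarize(count_urls("raw_data"))` returns the number of category files and the total number of URLs, which can be compared against the stats listed earlier.
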
- After downloading, it is highly recommended to clean your dataset, for example:
  - delete duplicates
  - remove images that were banned or deleted (they have a special placeholder image)
  - find corrupted files and remove them as well
  - etc.
- Pay attention to noise: some sources provide heavily mixed data of NSFW and neutral images
- This repository only helps with retrieving NSFW images; there are no URLs for neutral content
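
The cleaning steps listed above could be sketched roughly as follows. This is one possible approach using only the Python standard library, not the method used to produce the stats in this README. It treats byte-identical files as duplicates (SHA-256 of the contents), flags files whose leading bytes do not match a JPEG, PNG, or GIF signature as corrupted, and removes placeholder images by comparing against hashes you supply. The names `clean_directory`, `placeholder_hashes`, and `dry_run` are hypothetical.

```python
import hashlib
from pathlib import Path

# Leading bytes ("magic numbers") of common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 variant)
    b"GIF89a",             # GIF (1989 variant)
)


def looks_like_image(data):
    """Cheap header check; it does not prove the whole file decodes."""
    return data.startswith(IMAGE_SIGNATURES)


def clean_directory(image_dir, placeholder_hashes=frozenset(), dry_run=True):
    """Return a list of (path, reason) for files that should be removed.

    placeholder_hashes: SHA-256 hex digests of known "image removed"
    placeholders (assumption: you compute these from samples yourself).
    With dry_run=False the flagged files are also deleted.
    """
    seen = set()
    flagged = []
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if not looks_like_image(data):
            reason = "corrupted"
        elif digest in placeholder_hashes:
            reason = "placeholder"
        elif digest in seen:
            reason = "duplicate"
        else:
            seen.add(digest)  # first copy of this content is kept
            continue
        flagged.append((path, reason))
        if not dry_run:
            path.unlink()
    return flagged
```

Running with the default `dry_run=True` first lets you review what would be deleted. Note that exact hashing misses resized or re-encoded duplicates, and the header check misses files that are truncated after a valid header; a perceptual hash or a full decode with an imaging library would be stricter.
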