nsfw_data_source_urls

Collection of NSFW image URLs for the purpose of training an NSFW image classifier

  • Owner: EBazarov/nsfw_data_source_urls
  • License: MIT License

NSFW data source URLs

Description

The repository contains lists of URLs for downloading NSFW images. The set can be used to build a dataset large enough to train a robust NSFW classification model.

This work was inspired by nsfw_data_scrapper; for downloading the images, it is suggested to use the scripts from that scraper.

Stats

The raw_data folder contains txt files, each holding a list of URLs. Here are some stats for this set:

  • 159 different categories
  • in total 1 589 331 URLs
  • after downloading and cleaning it's possible to have ~ 500GB or in other words ~ 1 300 000 of NSFW images

| file name | number of URLs |
|---|---|
| urls_age_college.txt | 2949 |
| urls_age_mature.txt | 5942 |
| urls_age_milf.txt | 8503 |
| urls_age_teen.txt | 5389 |
| urls_amateur.txt | 13033 |
| urls_amateur_self-shots.txt | 10306 |
| urls_appearance.txt | 2734 |
| urls_appearance_appearance-modification.txt | 3795 |
| urls_appearance_appearance-modification_piercings.txt | 1339 |
| urls_appearance_appearance-modification_tattoos.txt | 1983 |
| urls_appearance_clothing.txt | 24924 |
| urls_appearance_clothing_bodyparts-through-clothes.txt | 6691 |
| urls_appearance_clothing_bottomless.txt | 2390 |
| urls_appearance_clothing_clothed-naked-pair.txt | 1274 |
| urls_appearance_clothing_dresses.txt | 4360 |
| urls_appearance_clothing_shoes.txt | 1238 |
| urls_appearance_clothing_stockings.txt | 2556 |
| urls_appearance_clothing_swimwear.txt | 741 |
| urls_appearance_clothing_tight-clothing.txt | 11522 |
| urls_appearance_clothing_topless.txt | 1009 |
| urls_appearance_clothing_underwear.txt | 3190 |
| urls_appearance_clothing_underwear_panties.txt | 9512 |
| urls_appearance_clothing_underwear_thongs.txt | 2636 |
| urls_appearance_clothing_uniforms-outfits.txt | 15390 |
| urls_appearance_clothing_uniforms-outfits_cosplay.txt | 6465 |
| urls_appearance_clothing_upskirt-downblouse.txt | 2599 |
| urls_appearance_expressions.txt | 1396 |
| urls_appearance_pose.txt | 8377 |
| urls_appearance_wet-&-messy.txt | 9169 |
| urls_artificial-images.txt | 247993 |
| urls_artificial-images_fictional-characters-shows.txt | 73349 |
| urls_artificial-images_hentai.txt | 81178 |
| urls_artificial-images_photoshop.txt | 10146 |
| urls_body-parts_head_hair.txt | 1797 |
| urls_body-parts_head_hair_blonde.txt | 6227 |
| urls_body-parts_head_hair_brunette.txt | 2022 |
| urls_body-parts_head_hair_dyed.txt | 1011 |
| urls_body-parts_head_hair_hairstyle.txt | 6946 |
| urls_body-parts_head_hair_redhead.txt | 4725 |
| urls_body-parts_head_lips-mouth.txt | 4449 |
| urls_body-parts_lower-body.txt | 2136 |
| urls_body-parts_lower-body_ass.txt | 9420 |
| urls_body-parts_lower-body_ass_large.txt | 3654 |
| urls_body-parts_lower-body_asshole.txt | 1826 |
| urls_body-parts_lower-body_feet.txt | 3539 |
| urls_body-parts_lower-body_gap.txt | 1332 |
| urls_body-parts_lower-body_genitalia_penis.txt | 6611 |
| urls_body-parts_lower-body_genitalia_penis_large.txt | 1607 |
| urls_body-parts_lower-body_genitalia_penis_small.txt | 2233 |
| urls_body-parts_lower-body_genitalia_vulva.txt | 12746 |
| urls_body-parts_lower-body_genitalia_vulva_hair.txt | 12085 |
| urls_body-parts_lower-body_genitalia_vulva_labia.txt | 5037 |
| urls_body-parts_lower-body_hips.txt | 3490 |
| urls_body-parts_lower-body_legs.txt | 3104 |
| urls_body-parts_upper-body.txt | 4465 |
| urls_body-parts_upper-body_breasts.txt | 11962 |
| urls_body-parts_upper-body_breasts_from-an-angle.txt | 7196 |
| urls_body-parts_upper-body_breasts_implants.txt | 3913 |
| urls_body-parts_upper-body_breasts_large.txt | 11582 |
| urls_body-parts_upper-body_breasts_nipples.txt | 4383 |
| urls_body-parts_upper-body_breasts_small.txt | 3094 |
| urls_body-traits_complexion_freckles.txt | 2309 |
| urls_body-traits_complexion_light-skin.txt | 1436 |
| urls_body-traits_complexion_tan.txt | 827 |
| urls_body-traits_traits.txt | 157 |
| urls_body-traits_traits_flexible.txt | 862 |
| urls_body-traits_traits_pregnant.txt | 2674 |
| urls_body-traits_types_bbw.txt | 8160 |
| urls_body-traits_types_chubby.txt | 8207 |
| urls_body-traits_types_curvy.txt | 1799 |
| urls_body-traits_types_petite.txt | 2305 |
| urls_body-traits_types_skinny-thin.txt | 4560 |
| urls_classic-vintage.txt | 16532 |
| urls_communities.txt | 12500 |
| urls_communities_identification.txt | 1507 |
| urls_communities_personals.txt | 1106 |
| urls_communities_role-play.txt | 226 |
| urls_cum-play_cum.txt | 4514 |
| urls_cum-play_cum_creampie.txt | 1493 |
| urls_cum-play_cum_cum-shot.txt | 4719 |
| urls_cum-play_cum_cum-shot_bukkake.txt | 1042 |
| urls_cum-play_cum_cum-shot_facial.txt | 2458 |
| urls_cum-play_cum_swallowing.txt | 51 |
| urls_cum-play_female.txt | 921 |
| urls_ethnicity.txt | 19675 |
| urls_ethnicity_asian.txt | 26674 |
| urls_ethnicity_black.txt | 4220 |
| urls_ethnicity_euro.txt | 3949 |
| urls_ethnicity_indian.txt | 11195 |
| urls_ethnicity_japanese.txt | 8109 |
| urls_exhibition.txt | 10 |
| urls_exhibition_gonewild.txt | 96718 |
| urls_exhibition_public.txt | 15066 |
| urls_fetish.txt | 22656 |
| urls_fetish_bdsm.txt | 3301 |
| urls_fetish_bdsm_bondage.txt | 8962 |
| urls_fetish_bdsm_domination-&-submission.txt | 13608 |
| urls_fetish_bdsm_domination-&-submission_femdom.txt | 9205 |
| urls_fetish_drugs.txt | 1171 |
| urls_fetish_role-enactment.txt | 942 |
| urls_fetish_role-enactment_age-play.txt | 2053 |
| urls_fetish_role-enactment_furry.txt | 2455 |
| urls_fetish_role-enactment_pet-play.txt | 1270 |
| urls_fetish_role-enactment_rape-abuse.txt | 1091 |
| urls_fetish_watersports.txt | 5128 |
| urls_general-categories.txt | 212869 |
| urls_general-categories_artistic-or-borderline-porn.txt | 8944 |
| urls_general-categories_desktop-wallpaper.txt | 20173 |
| urls_general-categories_gifs.txt | 1228 |
| urls_general-categories_humorous.txt | 1909 |
| urls_general-categories_p.o.v..txt | 1025 |
| urls_general-categories_passionate.txt | 781 |
| urls_general-categories_porn-for-women.txt | 31 |
| urls_general-categories_videos.txt | 400 |
| urls_groups.txt | 97 |
| urls_groups_alt.txt | 10321 |
| urls_groups_athlete.txt | 7719 |
| urls_groups_camgirl.txt | 4321 |
| urls_groups_celebrity.txt | 46437 |
| urls_groups_country.txt | 787 |
| urls_groups_nerd.txt | 3742 |
| urls_groups_pornstar.txt | 3860 |
| urls_groups_pornstar_pornstar-lookalike.txt | |
| urls_groups_religious.txt | 1054 |
| urls_groups_specific-personality.txt | 4012 |
| urls_illegal-taboo.txt | |
| urls_illegal-taboo_bestiality.txt | |
| urls_illegal-taboo_incest.txt | 3816 |
| urls_illegal-taboo_voyeurism.txt | 439 |
| urls_lgbt_bisexual.txt | 1244 |
| urls_lgbt_crossdressing.txt | 2443 |
| urls_lgbt_gay.txt | 19812 |
| urls_lgbt_lesbian.txt | 5179 |
| urls_lgbt_transgender.txt | 719 |
| urls_lgbt_transsexual.txt | 13106 |
| urls_literary.txt | 1953 |
| urls_locations_man-made.txt | 3869 |
| urls_locations_nature.txt | 3831 |
| urls_locations_nature_beach.txt | 4698 |
| urls_non-porn-nsfw.txt | 21389 |
| urls_sex.txt | 1313 |
| urls_sex_anal.txt | 4683 |
| urls_sex_anal_gaping.txt | 754 |
| urls_sex_anal_rimming.txt | 688 |
| urls_sex_breasts.txt | 176 |
| urls_sex_fisting.txt | 1033 |
| urls_sex_group.txt | 1134 |
| urls_sex_group_large-group.txt | 2989 |
| urls_sex_group_swinging.txt | 4466 |
| urls_sex_group_threesome.txt | 1747 |
| urls_sex_insertion.txt | 4344 |
| urls_sex_interracial.txt | 906 |
| urls_sex_masturbation.txt | 2032 |
| urls_sex_oral.txt | 4155 |
| urls_sex_orgasm.txt | 327 |
| urls_sex_toys.txt | 6710 |
| urls_specific-actor-actress.txt | 52409 |
| urls_specific-company.txt | 18763 |
| urls_wtf.txt | 4001 |

NOTE

  1. After downloading, it is highly suggested to clean your dataset, for example:
    • delete duplicates
    • remove images that were banned/deleted (they have a special image placeholder)
    • find corrupted files and remove them as well
    • etc.
  2. Pay attention to noise: some resources provide highly mixed data of NSFW and neutral images
  3. This repository helps in retrieving NSFW images, and there are no special URLs for neutral content
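
The cleaning steps in note 1 could be sketched roughly as follows. This is a minimal illustration and not part of the repository: the function names (`file_digest`, `looks_like_image`, `plan_cleanup`) and the `placeholder_digests` parameter are hypothetical, duplicates are detected only when files are byte-for-byte identical, and the "corrupted" check only inspects the file signature (a full decode with an image library would catch truncated files that this check misses). The sketch only plans what to drop; it deletes nothing.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def looks_like_image(path: Path) -> bool:
    """Cheap sanity check: file starts with a JPEG, PNG, or GIF signature."""
    with path.open("rb") as f:
        head = f.read(8)
    signatures = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")
    return head.startswith(signatures)


def plan_cleanup(image_dir, placeholder_digests=frozenset()):
    """Split files under image_dir into (keep, drop) lists.

    A file is dropped if it does not look like an image, if its digest
    matches a known placeholder (assumption: the caller has hashed the
    "banned/deleted" placeholder images beforehand), or if an identical
    file was already kept.
    """
    seen = set()
    keep, drop = [], []
    for path in sorted(Path(image_dir).rglob("*")):
        if not path.is_file():
            continue
        if not looks_like_image(path):
            drop.append(path)
            continue
        digest = file_digest(path)
        if digest in placeholder_digests or digest in seen:
            drop.append(path)
            continue
        seen.add(digest)
        keep.append(path)
    return keep, drop
```

Because placeholder images returned for banned or deleted content are typically identical files, hashing one known placeholder and passing its digest in `placeholder_digests` removes every copy, while the duplicate check keeps only the first occurrence of any other repeated file.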

Key metrics

Overview
Name and owner: EBazarov/nsfw_data_source_urls
Programming languages (count: 0)
License: MIT License
Owner activity
Created: 2019-02-13 09:21:38
Pushed: 2020-12-14 09:40:00
Last commit: 2019-02-22 09:36:01
Releases: 0
User engagement
Stars: 3.4k
Watchers: 130
Forks: 740
Commits: 6
Issues: 13
Open issues: 5
Pull requests: 1
Open pull requests: 1
Closed pull requests: 2