nsfw_data_source_urls

Collection of NSFW image URLs for training an NSFW image classifier

  • Owner: EBazarov/nsfw_data_source_urls
  • Platform:
  • License: MIT License
  • Category:
  • Topics:

NSFW data source URLs

Description

This repository contains lists of URLs that will help you download NSFW images. The set can be used to build a dataset large enough to train a robust NSFW classification model.

This work was inspired by nsfw_data_scrapper, and the scraper's scripts are the suggested way to download the images.

Stats

The raw_data folder contains several .txt files, each holding a list of URLs. Some stats for this set:

  • 159 different categories
  • in total 1 589 331 URLs
  • after downloading and cleaning, you can expect ~500 GB, or roughly 1 300 000 NSFW images

| File name | Number of URLs |
|---|---|
| urls_age_college.txt | 2949 |
| urls_age_mature.txt | 5942 |
| urls_age_milf.txt | 8503 |
| urls_age_teen.txt | 5389 |
| urls_amateur.txt | 13033 |
| urls_amateur_self-shots.txt | 10306 |
| urls_appearance.txt | 2734 |
| urls_appearance_appearance-modification.txt | 3795 |
| urls_appearance_appearance-modification_piercings.txt | 1339 |
| urls_appearance_appearance-modification_tattoos.txt | 1983 |
| urls_appearance_clothing.txt | 24924 |
| urls_appearance_clothing_bodyparts-through-clothes.txt | 6691 |
| urls_appearance_clothing_bottomless.txt | 2390 |
| urls_appearance_clothing_clothed-naked-pair.txt | 1274 |
| urls_appearance_clothing_dresses.txt | 4360 |
| urls_appearance_clothing_shoes.txt | 1238 |
| urls_appearance_clothing_stockings.txt | 2556 |
| urls_appearance_clothing_swimwear.txt | 741 |
| urls_appearance_clothing_tight-clothing.txt | 11522 |
| urls_appearance_clothing_topless.txt | 1009 |
| urls_appearance_clothing_underwear.txt | 3190 |
| urls_appearance_clothing_underwear_panties.txt | 9512 |
| urls_appearance_clothing_underwear_thongs.txt | 2636 |
| urls_appearance_clothing_uniforms-outfits.txt | 15390 |
| urls_appearance_clothing_uniforms-outfits_cosplay.txt | 6465 |
| urls_appearance_clothing_upskirt-downblouse.txt | 2599 |
| urls_appearance_expressions.txt | 1396 |
| urls_appearance_pose.txt | 8377 |
| urls_appearance_wet-&-messy.txt | 9169 |
| urls_artificial-images.txt | 247993 |
| urls_artificial-images_fictional-characters-shows.txt | 73349 |
| urls_artificial-images_hentai.txt | 81178 |
| urls_artificial-images_photoshop.txt | 10146 |
| urls_body-parts_head_hair.txt | 1797 |
| urls_body-parts_head_hair_blonde.txt | 6227 |
| urls_body-parts_head_hair_brunette.txt | 2022 |
| urls_body-parts_head_hair_dyed.txt | 1011 |
| urls_body-parts_head_hair_hairstyle.txt | 6946 |
| urls_body-parts_head_hair_redhead.txt | 4725 |
| urls_body-parts_head_lips-mouth.txt | 4449 |
| urls_body-parts_lower-body.txt | 2136 |
| urls_body-parts_lower-body_ass.txt | 9420 |
| urls_body-parts_lower-body_ass_large.txt | 3654 |
| urls_body-parts_lower-body_asshole.txt | 1826 |
| urls_body-parts_lower-body_feet.txt | 3539 |
| urls_body-parts_lower-body_gap.txt | 1332 |
| urls_body-parts_lower-body_genitalia_penis.txt | 6611 |
| urls_body-parts_lower-body_genitalia_penis_large.txt | 1607 |
| urls_body-parts_lower-body_genitalia_penis_small.txt | 2233 |
| urls_body-parts_lower-body_genitalia_vulva.txt | 12746 |
| urls_body-parts_lower-body_genitalia_vulva_hair.txt | 12085 |
| urls_body-parts_lower-body_genitalia_vulva_labia.txt | 5037 |
| urls_body-parts_lower-body_hips.txt | 3490 |
| urls_body-parts_lower-body_legs.txt | 3104 |
| urls_body-parts_upper-body.txt | 4465 |
| urls_body-parts_upper-body_breasts.txt | 11962 |
| urls_body-parts_upper-body_breasts_from-an-angle.txt | 7196 |
| urls_body-parts_upper-body_breasts_implants.txt | 3913 |
| urls_body-parts_upper-body_breasts_large.txt | 11582 |
| urls_body-parts_upper-body_breasts_nipples.txt | 4383 |
| urls_body-parts_upper-body_breasts_small.txt | 3094 |
| urls_body-traits_complexion_freckles.txt | 2309 |
| urls_body-traits_complexion_light-skin.txt | 1436 |
| urls_body-traits_complexion_tan.txt | 827 |
| urls_body-traits_traits.txt | 157 |
| urls_body-traits_traits_flexible.txt | 862 |
| urls_body-traits_traits_pregnant.txt | 2674 |
| urls_body-traits_types_bbw.txt | 8160 |
| urls_body-traits_types_chubby.txt | 8207 |
| urls_body-traits_types_curvy.txt | 1799 |
| urls_body-traits_types_petite.txt | 2305 |
| urls_body-traits_types_skinny-thin.txt | 4560 |
| urls_classic-vintage.txt | 16532 |
| urls_communities.txt | 12500 |
| urls_communities_identification.txt | 1507 |
| urls_communities_personals.txt | 1106 |
| urls_communities_role-play.txt | 226 |
| urls_cum-play_cum.txt | 4514 |
| urls_cum-play_cum_creampie.txt | 1493 |
| urls_cum-play_cum_cum-shot.txt | 4719 |
| urls_cum-play_cum_cum-shot_bukkake.txt | 1042 |
| urls_cum-play_cum_cum-shot_facial.txt | 2458 |
| urls_cum-play_cum_swallowing.txt | 51 |
| urls_cum-play_female.txt | 921 |
| urls_ethnicity.txt | 19675 |
| urls_ethnicity_asian.txt | 26674 |
| urls_ethnicity_black.txt | 4220 |
| urls_ethnicity_euro.txt | 3949 |
| urls_ethnicity_indian.txt | 11195 |
| urls_ethnicity_japanese.txt | 8109 |
| urls_exhibition.txt | 10 |
| urls_exhibition_gonewild.txt | 96718 |
| urls_exhibition_public.txt | 15066 |
| urls_fetish.txt | 22656 |
| urls_fetish_bdsm.txt | 3301 |
| urls_fetish_bdsm_bondage.txt | 8962 |
| urls_fetish_bdsm_domination-&-submission.txt | 13608 |
| urls_fetish_bdsm_domination-&-submission_femdom.txt | 9205 |
| urls_fetish_drugs.txt | 1171 |
| urls_fetish_role-enactment.txt | 942 |
| urls_fetish_role-enactment_age-play.txt | 2053 |
| urls_fetish_role-enactment_furry.txt | 2455 |
| urls_fetish_role-enactment_pet-play.txt | 1270 |
| urls_fetish_role-enactment_rape-abuse.txt | 1091 |
| urls_fetish_watersports.txt | 5128 |
| urls_general-categories.txt | 212869 |
| urls_general-categories_artistic-or-borderline-porn.txt | 8944 |
| urls_general-categories_desktop-wallpaper.txt | 20173 |
| urls_general-categories_gifs.txt | 1228 |
| urls_general-categories_humorous.txt | 1909 |
| urls_general-categories_p.o.v..txt | 1025 |
| urls_general-categories_passionate.txt | 781 |
| urls_general-categories_porn-for-women.txt | 31 |
| urls_general-categories_videos.txt | 400 |
| urls_groups.txt | 97 |
| urls_groups_alt.txt | 10321 |
| urls_groups_athlete.txt | 7719 |
| urls_groups_camgirl.txt | 4321 |
| urls_groups_celebrity.txt | 46437 |
| urls_groups_country.txt | 787 |
| urls_groups_nerd.txt | 3742 |
| urls_groups_pornstar.txt | 3860 |
| urls_groups_pornstar_pornstar-lookalike.txt | |
| urls_groups_religious.txt | 1054 |
| urls_groups_specific-personality.txt | 4012 |
| urls_illegal-taboo.txt | |
| urls_illegal-taboo_bestiality.txt | |
| urls_illegal-taboo_incest.txt | 3816 |
| urls_illegal-taboo_voyeurism.txt | 439 |
| urls_lgbt_bisexual.txt | 1244 |
| urls_lgbt_crossdressing.txt | 2443 |
| urls_lgbt_gay.txt | 19812 |
| urls_lgbt_lesbian.txt | 5179 |
| urls_lgbt_transgender.txt | 719 |
| urls_lgbt_transsexual.txt | 13106 |
| urls_literary.txt | 1953 |
| urls_locations_man-made.txt | 3869 |
| urls_locations_nature.txt | 3831 |
| urls_locations_nature_beach.txt | 4698 |
| urls_non-porn-nsfw.txt | 21389 |
| urls_sex.txt | 1313 |
| urls_sex_anal.txt | 4683 |
| urls_sex_anal_gaping.txt | 754 |
| urls_sex_anal_rimming.txt | 688 |
| urls_sex_breasts.txt | 176 |
| urls_sex_fisting.txt | 1033 |
| urls_sex_group.txt | 1134 |
| urls_sex_group_large-group.txt | 2989 |
| urls_sex_group_swinging.txt | 4466 |
| urls_sex_group_threesome.txt | 1747 |
| urls_sex_insertion.txt | 4344 |
| urls_sex_interracial.txt | 906 |
| urls_sex_masturbation.txt | 2032 |
| urls_sex_oral.txt | 4155 |
| urls_sex_orgasm.txt | 327 |
| urls_sex_toys.txt | 6710 |
| urls_specific-actor-actress.txt | 52409 |
| urls_specific-company.txt | 18763 |
| urls_wtf.txt | 4001 |

Note

  1. After downloading, it is highly recommended to clean your dataset, for example:
    • delete duplicates
    • remove images that were banned/deleted (they are replaced by a special placeholder image)
    • find and remove corrupted files
    • etc.
  2. Pay attention to noise: some sources provide a heavy mix of NSFW and neutral images
  3. This repository only helps retrieve NSFW images; it contains no dedicated URLs for neutral content
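A minimal sketch of the cleanup described in item 1, assuming the downloaded images are stored as individual files somewhere under one root directory (the layout is an assumption; adapt it to however your downloader saves files). It drops byte-identical duplicates by SHA-256, flags files whose leading bytes match no common image format (empty files, saved HTML error pages), and drops known placeholder images whose SHA-256 hashes you supply yourself through the hypothetical `placeholder_hashes` parameter. The function names are illustrative and are not part of this repository or of nsfw_data_scrapper.

```python
import hashlib
from pathlib import Path

# Leading bytes ("magic numbers") of common image formats.
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 spec)
    b"GIF89a",              # GIF (1989 spec)
)


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_image(path: Path) -> bool:
    """Cheap corruption check: True if the file starts with a known image signature.

    Empty files and HTML error pages saved under an image name fail this check.
    It does NOT detect images that are truncated after a valid header.
    """
    with path.open("rb") as f:
        head = f.read(12)
    # WebP files are RIFF containers: b"RIFF" + 4-byte size + b"WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return True
    return head.startswith(IMAGE_SIGNATURES)


def clean_dataset(root, placeholder_hashes=frozenset()):
    """Sort every file under `root` into kept and removed; nothing is deleted here.

    Returns (kept, removed): `kept` is a list of paths, `removed` a list of
    (path, reason) pairs with reason "corrupted", "placeholder" or "duplicate".
    `placeholder_hashes` is a hypothetical, user-supplied set of SHA-256 hex digests.
    """
    seen = set()
    kept, removed = [], []
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        if not looks_like_image(path):
            removed.append((path, "corrupted"))
            continue
        digest = sha256_of(path)
        if digest in placeholder_hashes:
            removed.append((path, "placeholder"))
        elif digest in seen:
            # Byte-identical copy of a file that was already kept.
            removed.append((path, "duplicate"))
        else:
            seen.add(digest)
            kept.append(path)
    return kept, removed
```

`clean_dataset` only reports what it would remove, so you can review the list before deleting, e.g. with `for path, reason in removed: path.unlink()`. Hashing raw bytes misses re-encoded or resized copies (those need perceptual hashing), and the signature check does not catch truncated files; fully decoding each image with an image library is more thorough. Because a host typically serves the same placeholder for every removed link, it tends to show up as one large group of identical hashes, which is a practical way to find values for `placeholder_hashes`.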

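Key metrics

Overview
Name and owner: EBazarov/nsfw_data_source_urls
Primary language:
Languages (count: 0)
Platform:
License: MIT License
Owner activity
Created at: 2019-02-13 09:21:38
Pushed at: 2020-12-14 09:40:00
Last commit: 2019-02-22 09:36:01
Releases: 0
User engagement
Stars: 3.4k
Watchers: 130
Forks: 740
Commits: 6
Issues enabled?
Issues: 13
Open issues: 5
Pull requests: 1
Open pull requests: 1
Closed pull requests: 2
Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Is private?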