Utility for working with the danbooru2018 dataset
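Danbooru Utility
danbooru-utility is a simple Python script for working with gwern's Danbooru2018 dataset. It can explore the dataset, filter by tags, rating, and score, detect faces, and resize images. I have been using it to build datasets for GAN training.
Install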
pip3 install danbooru-utility
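Make sure you have downloaded Danbooru2018. It contains about 3.3M annotated anime images, so the download may take a long time.
Usage
First, let's search for something fairly specific.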
$ danbooru-utility \
    --directory ~/datasets/danbooru-gwern/danbooru2018/ \
    --rating "s" \
    --required_tags "archer,toosaka_rin,hug" \
    --max_examples 3 \
    --img_size 256
Processed 3 files. Added 3 images. It took 14.39 sec
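This finds three images with the required tags and resizes them to 256x256. Note that this takes a long time, because the filtering is done in a simple loop over the metadata. Let's check what was produced in out-images:
Now let's run the same command, but with face detection: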
$ danbooru-utility \
    --directory ~/datasets/danbooru-gwern/danbooru2018/ \
    --rating "s" \
    --required_tags "archer,toosaka_rin,hug" \
    --max_examples 3 \
    --img_size 256 \
    --faces
Processed 3 files. Added 1 images. It took 12.48 sec
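This produced:
So it cropped the image with the face at the top center.
Let's change the face_scale parameter. This controls how much of the image around the face is included in the crop.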
$ danbooru-utility \
    --directory ~/datasets/danbooru-gwern/danbooru2018/ \
    --rating "s" \
    --required_tags "archer,toosaka_rin,hug" \
    --max_examples 3 \
    --img_size 256 \
    --faces \
    --overwrite \
    --face_scale 1.8
Processed 3 files. Added 1 images. It took 12.49 sec
That's a bit tight.
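To make the effect of face_scale concrete, here is a minimal sketch of how a crop box could be derived from a detected face. It is not the utility's actual code: the function name face_crop_box, the default of 2.0, and the choice to leave a quarter of the spare height above the face are all assumptions made for illustration. It only follows the help text, which describes face_scale as a "height and width multiplier over size of face" and says faces are kept toward the top of the image.

```python
def face_crop_box(face, image_size, face_scale=2.0):
    """Return a square (left, top, right, bottom) crop around a detected face.

    face:       (x, y, w, h) bounding box of the face, in pixels
    image_size: (width, height) of the source image
    face_scale: crop side as a multiple of the face size (assumed meaning;
                the default of 2.0 is hypothetical)
    """
    x, y, w, h = face
    img_w, img_h = image_size

    # Crop side: the face size times face_scale, capped by the image itself.
    side = int(round(max(w, h) * face_scale))
    side = min(side, img_w, img_h)

    # Centre the crop horizontally on the face.
    left = x + w // 2 - side // 2
    # Keep the face near the top: only a quarter of the spare height goes above it.
    top = y - (side - h) // 4

    # Shift the box back inside the image if it sticks out.
    left = max(0, min(left, img_w - side))
    top = max(0, min(top, img_h - side))
    return left, top, left + side, top + side
```

A smaller face_scale gives a smaller box around the same face, which is why the crop above looks tighter. With Pillow, a box like this could be passed to Image.crop and the result resized to img_size.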
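If you have already processed some images, the utility checks for them and does not copy them again unless you pass --overwrite. So if you change any of the image generation parameters, you should use this flag. You can also point --link_dir at a directory of already processed images to symlink to. For example, you can resize a large number of images once and then quickly create datasets for specific tags.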
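The skip, link, or process decision described above can be sketched roughly as follows. This is an illustration under assumptions, not the utility's implementation: place_image and the process callback are hypothetical names, and the real script may order these checks differently.

```python
import os


def place_image(name, save_dir, link_dir=None, overwrite=False, process=None):
    """Decide how to produce save_dir/name and report what was done.

    process: hypothetical callable(dest_path) that writes the resized image
    """
    dest = os.path.join(save_dir, name)

    # Already produced on an earlier run: leave it alone unless told otherwise.
    if os.path.lexists(dest):
        if not overwrite:
            return "skipped"
        os.remove(dest)

    # A processed copy already exists elsewhere: link to it instead of redoing work.
    if link_dir is not None:
        cached = os.path.join(link_dir, name)
        if os.path.isfile(cached):
            os.symlink(os.path.abspath(cached), dest)
            return "linked"

    # Otherwise do the expensive resize/crop work.
    if process is not None:
        process(dest)
    return "processed"
```

Linking is cheap compared with decoding and resizing an image, which is what makes building per-tag datasets from a pre-resized directory fast.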
So for GAN training, I would generate a training set with something like this:
$ danbooru-utility \
    --directory ~/datasets/danbooru-gwern/danbooru2018/ \
    --rating "s,q" \
    --banned_tags "photo,comic" \
    --max_examples 1000000000 \
    --img_size 256 \
    --faces
Processed 100 files. It took 10.36 sec
Processed 200 files. It took 20.06 sec
Processed 300 files. It took 39.16 sec
...
Configuration
For details on the parameters, check the help.
$ danbooru-utility -h
usage: danbooru-utility [-h] [-d DIRECTORY] [--metadata_dir METADATA_DIR]
                        [--save_dir SAVE_DIR] [--link_dir LINK_DIR]
                        [-r REQUIRED_TAGS] [-b BANNED_TAGS] [-a ATLEAST_TAGS]
                        [--ratings RATINGS] [--score_range SCORE_RANGE]
                        [-n ATLEAST_NUM] [--overwrite [OVERWRITE]]
                        [--preview [PREVIEW]] [--faces [FACES]]
                        [--face_scale FACE_SCALE]
                        [--max_examples MAX_EXAMPLES] [--img_size IMG_SIZE]

danbooru2018 utility script

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Danbooru dataset directory.
  --metadata_dir METADATA_DIR
                        Metadata path below base directory. Will load all json
                        files here.
  --save_dir SAVE_DIR   Directory processed images are saved to.
  --link_dir LINK_DIR   Directory with already processed images. Used to
                        symlink to if the files exist.
  -r REQUIRED_TAGS, --required_tags REQUIRED_TAGS
                        Tags required.
  -b BANNED_TAGS, --banned_tags BANNED_TAGS
                        Tags disallowed.
  -a ATLEAST_TAGS, --atleast_tags ATLEAST_TAGS
                        Requires some number of these tags.
  --ratings RATINGS     Only include images with these ratings. "s,q,e" are
                        the possible entries, and represent
                        "safe,questionable,explicit".
  --score_range SCORE_RANGE
                        Only include images inside this score range.
  -n ATLEAST_NUM, --atleast_num ATLEAST_NUM
                        Minimum number of atleast_tags required.
  --overwrite [OVERWRITE]
                        Overwrite images in save directory.
  --preview [PREVIEW]   Preview images.
  --faces [FACES]       Detect faces and try to include them in top of image.
  --face_scale FACE_SCALE
                        Height and width multiplier over size of face.
  --max_examples MAX_EXAMPLES
                        Maximum number of files to load.
  --img_size IMG_SIZE   Size of side for resized images.
Here is an example metadata entry from Danbooru2018:
{'approver_id':'0','created_at':'2016-10-26 09:32:42.38506 UTC','down_score':'0','favs':['12082','334419','496852','516035','487870'],'file_ext':'jpg','file_size':'753165','has_children':False,'id':'2524919','image_height':'874','image_width':'1181','is_banned':False,'is_deleted':False,'is_flagged':False,'is_note_locked':False,'is_pending':False,'is_rating_locked':False,'is_status_locked':False,'last_commented_at':'1970-01-01 00:00:00 UTC','last_noted_at':'1970-01-01 00:00:00 UTC','md5':'a9260780fbf5cfd661878f92a268124e','parent_id':'2524918','pixiv_id':'54348754','pools':[],'rating':'s','score':'3','source':'http://i3.pixiv.net/img-original/img/2015/12/31/13/31/23/54348754_p13.jpg','tags':[{'category':'0','id':'540830','name':'1boy'},{'category':'0','id':'470575','name':'1girl'},{'category':'1','id':'1332557','name':'akira_(ubw)'},{'category':'4','id':'396','name':'archer'},{'category':'0','id':'13200','name':'black_hair'},{'category':'0','id':'3389','name':'blush'},{'category':'0','id':'4563','name':'bow'},{'category':'0','id':'465619','name':'closed_eyes'},{'category':'0','id':'71730','name':'dark_skin'},{'category':'0','id':'610236','name':'dark_skinned_male'},{'category':'3','id':'5','name':'fate/stay_night'},{'category':'3','id':'662939','name':'fate_(series)'},{'category':'0','id':'374938','name':'frown'},{'category':'0','id':'374844','name':'hair_bow'},{'category':'0','id':'5126','name':'hug'},{'category':'0','id':'1815','name':'smile'},{'category':'0','id':'125238','name':'sweatdrop'},{'category':'4','id':'400140','name':'toosaka_rin'},{'category':'0','id':'652604','name':'two_side_up'},{'category':'0','id':'16581','name':'white_hair'}],'up_score':'3','updated_at':'2018-06-05 05:37:49.87865 UTC','uploader_id':'39276'}
You can use --preview to explore the metadata and find the tags associated with each image.
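As an illustration of how the filter options relate to an entry like the one above, the checks could be sketched as below. The function names keep_entry and parse_tags are hypothetical, and passing score_range as a (low, high) tuple is an assumption; the format the command line actually expects for --score_range is not shown here. Note that fields such as score are stored as strings in the metadata.

```python
def parse_tags(arg):
    """Split a comma-separated value such as "archer,hug" into a set."""
    return {t for t in arg.split(",") if t}


def keep_entry(entry, required="", banned="", atleast="", atleast_num=1,
               ratings="s,q,e", score_range=None):
    """Return True if a metadata entry passes every filter."""
    names = {tag["name"] for tag in entry["tags"]}

    if entry["rating"] not in parse_tags(ratings):
        return False
    if not parse_tags(required) <= names:   # every required tag must be present
        return False
    if parse_tags(banned) & names:          # no banned tag may be present
        return False
    wanted = parse_tags(atleast)
    if wanted and len(wanted & names) < atleast_num:
        return False
    if score_range is not None:
        low, high = score_range             # assumed (low, high) tuple
        if not low <= int(entry["score"]) <= high:
            return False
    return True
```

Applied in a loop over every metadata record, a check like this is simple but slow, which matches the run times reported in the usage examples.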
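Improvements
The dataset could be loaded into a relational database, which would allow more efficient and more powerful queries.
Face detection has room for improvement. It produces rare false positives and a fair number of false negatives.
I'm happy to consider pull requests.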
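For anyone interested in the relational database idea, here is one possible starting point using Python's built-in sqlite3 module. Nothing like this exists in the utility today; the schema, table names, and the build_index and posts_with_all_tags helpers are all hypothetical, and the sketch keeps only a few metadata fields.

```python
import sqlite3


def build_index(entries):
    """Load metadata entries into an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, rating TEXT, score INTEGER);
        CREATE TABLE post_tags (post_id INTEGER, tag TEXT);
        CREATE INDEX idx_tag ON post_tags (tag, post_id);
    """)
    for e in entries:
        # Metadata stores ids and scores as strings, so convert them here.
        db.execute("INSERT INTO posts VALUES (?, ?, ?)",
                   (int(e["id"]), e["rating"], int(e["score"])))
        db.executemany("INSERT INTO post_tags VALUES (?, ?)",
                       [(int(e["id"]), t["name"]) for t in e["tags"]])
    db.commit()
    return db


def posts_with_all_tags(db, tags, rating="s"):
    """Return ids of posts with the given rating that carry every tag in tags."""
    tags = sorted(set(tags))  # duplicates would break the COUNT comparison
    marks = ",".join("?" for _ in tags)
    rows = db.execute(
        f"""SELECT p.id FROM posts p
            JOIN post_tags t ON t.post_id = p.id
            WHERE p.rating = ? AND t.tag IN ({marks})
            GROUP BY p.id
            HAVING COUNT(DISTINCT t.tag) = ?
            ORDER BY p.id""",
        (rating, *tags, len(tags)),
    )
    return [r[0] for r in rows]
```

With an index on the tag column, a required-tags query becomes an index lookup rather than a scan over every metadata record. The query assumes at least one tag is given.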
Acknowledgements
Thanks to gwern for providing the excellent Danbooru dataset.
Thanks to nagadomi for the anime face detection model.