If you don't follow VTubers or Nijisanji, this may not mean much to you, but I wanted to use machine learning to tell the "SanBaka" trio apart, so as a first step I downloaded images of Ange Katrina and detected her face. There are various reference sites, but this is mainly based on the following one.
https://qiita.com/Tatsuro64/items/95b0ce48b6bb155bfe29
Language: Python 3.7 / Environment: Ubuntu 18.04 / Libraries used: BeautifulSoup (scraping), OpenCV (face detection), urllib (image download)
The main processing of the code is as follows.
if __name__ == '__main__':
    downloads = ImageDownloader('Ange Katrina').go()
    for i, d in enumerate(downloads):
        FaceDetector(d).cutout_faces('image/faces/faces_{}.jpg'.format(i + 1))
Download the images with the downloader, then run face detection over the resulting list; that's all there is to it.
The downloader is below.
class ImageDownloader(object):

    def __init__(self, keyword):
        session = requests.session()
        session.headers.update(
            {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) \
Gecko/20100101 Firefox/10.0'})
        params = urllib.parse.urlencode(
            {'q': keyword, 'tbm': 'isch', 'ijn': '1'})
        query = "https://www.google.co.jp/search" + '?' + params
        self._bs = BeautifulSoup(session.get(query).text, 'html.parser')

    def go(self):
        downloads = []
        for img in self._bs.find_all('img'):
            try:
                url = img['data-iurl']
                downloads.append('image/image_{}.jpg'.format(len(downloads) + 1))
                urllib.request.urlretrieve(url, downloads[-1])
            except KeyError:
                print('failed to get url : {}'.format(img))
        return downloads
■ Constructor: Issues a search query to Google (keyword: Ange Katrina) and feeds the resulting page (only the first results page this time) to the scraping library (Beautiful Soup).
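For reference, the query string the constructor builds can be reproduced on its own; `tbm=isch` requests Google Image Search, and `ijn` appears to select the result page:

```python
import urllib.parse

# tbm=isch requests Google Image Search; ijn picks the result page
params = urllib.parse.urlencode({'q': 'Ange Katrina', 'tbm': 'isch', 'ijn': '1'})
query = 'https://www.google.co.jp/search?' + params
print(query)  # spaces in the keyword are encoded as '+'
```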
■ go method: Scans all the img tags. The image link is stored in the ['data-iurl'] attribute, so each one is downloaded with the urllib module. Some tags, presumably ones other than search-result images, lack the ['data-iurl'] attribute, so the KeyError exception is caught and those tags are skipped.
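The same skip logic can also be written without the exception, using Beautiful Soup's `has_attr`; a minimal offline sketch (the HTML here is made up for illustration):

```python
from bs4 import BeautifulSoup

# two img tags: only the first carries the data-iurl attribute
html = '<img data-iurl="http://example.com/a.jpg"><img src="logo.png">'
bs = BeautifulSoup(html, 'html.parser')

# keep only tags that actually have the attribute
urls = [img['data-iurl'] for img in bs.find_all('img') if img.has_attr('data-iurl')]
print(urls)  # ['http://example.com/a.jpg']
```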
The face cutter is as follows.
class FaceDetector(object):

    # Trained model (anime-face cascade)
    FACE_CASCADE = '/home/websoler/anaconda3/lib/python3.7/site-packages/cv2/data/lbpcascade_animeface.xml'

    def __init__(self, fname):
        self._img = cv2.imread(fname)

    def cutout_faces(self, fname):
        gray = cv2.cvtColor(self._img, cv2.COLOR_BGR2GRAY)
        classifier = cv2.CascadeClassifier(FaceDetector.FACE_CASCADE)
        faces = classifier.detectMultiScale(gray, scaleFactor=1.2, minSize=(30, 30))
        if len(faces):
            for (x, y, w, h) in faces:
                region = self._img[y:y + h, x:x + w]
                region_resized = cv2.resize(region, (128, 128))
                cv2.imwrite(fname, region_resized)
                break  # TODO: only the first detected face for now
■ Constructor: Reads the passed file with OpenCV.
■ cutout_faces method: When faces are detected, the first one is cropped, resized to 128 x 128, and saved to a file.
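One gotcha in the crop above: NumPy indexes rows (y) before columns (x), so the slice is `img[y:y+h, x:x+w]` even though detections come back as (x, y, w, h). A quick check with a dummy image:

```python
import numpy as np

img = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy BGR image
x, y, w, h = 40, 60, 100, 120                  # a hypothetical detection box
region = img[y:y + h, x:x + w]                 # rows (y) first, then columns (x)
print(region.shape)  # (120, 100, 3)
```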
The detection model is not one of the standard cascades bundled with OpenCV; it uses lbpcascade_animeface.xml, which is specialized for anime faces. Loading it from a local path didn't work for me, so I placed it directly under the Python library path. In my environment, Anaconda had been installed at some point, so it went under that directory.
The anime-face model was downloaded from: https://github.com/nagadomi/lbpcascade_animeface
I specified minSize because the smaller the detected region, the higher the false-positive rate; without it, Ange's sleeves and fingers kept getting detected as "faces".
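minSize is applied inside detectMultiScale itself, but its effect is easy to picture as a post-filter on the (x, y, w, h) boxes; a sketch with made-up detections:

```python
def filter_min_size(faces, min_w=30, min_h=30):
    """Drop detection boxes smaller than the minimum size."""
    return [(x, y, w, h) for (x, y, w, h) in faces if w >= min_w and h >= min_h]

# a 20x20 false positive (e.g. a sleeve) next to a 64x64 real face
boxes = [(10, 10, 20, 20), (50, 50, 64, 64)]
print(filter_min_size(boxes))  # [(50, 50, 64, 64)]
```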
Next time, I will prepare TFRecord files for machine learning with TensorFlow.
The entire code is below.
from bs4 import BeautifulSoup
import cv2
import os
import requests
import shutil
import urllib.parse
import urllib.request

# Prepare the output directories
shutil.rmtree('image', ignore_errors=True)  # don't fail on the first run
os.mkdir('image')
os.mkdir('image/faces')


class ImageDownloader(object):

    def __init__(self, keyword):
        session = requests.session()
        session.headers.update(
            {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) \
Gecko/20100101 Firefox/10.0'})
        params = urllib.parse.urlencode(
            {'q': keyword, 'tbm': 'isch', 'ijn': '1'})
        query = "https://www.google.co.jp/search" + '?' + params
        self._bs = BeautifulSoup(session.get(query).text, 'html.parser')

    def go(self):
        downloads = []
        for img in self._bs.find_all('img'):
            try:
                url = img['data-iurl']
                downloads.append('image/image_{}.jpg'.format(len(downloads) + 1))
                urllib.request.urlretrieve(url, downloads[-1])
            except KeyError:
                print('failed to get url : {}'.format(img))
        return downloads


class FaceDetector(object):

    # Trained model (anime-face cascade)
    FACE_CASCADE = '/home/websoler/anaconda3/lib/python3.7/site-packages/cv2/data/lbpcascade_animeface.xml'

    def __init__(self, fname):
        self._img = cv2.imread(fname)

    def cutout_faces(self, fname):
        gray = cv2.cvtColor(self._img, cv2.COLOR_BGR2GRAY)
        classifier = cv2.CascadeClassifier(FaceDetector.FACE_CASCADE)
        faces = classifier.detectMultiScale(gray, scaleFactor=1.2, minSize=(30, 30))
        if len(faces):
            for (x, y, w, h) in faces:
                region = self._img[y:y + h, x:x + w]
                region_resized = cv2.resize(region, (128, 128))
                cv2.imwrite(fname, region_resized)
                break  # TODO: only the first detected face for now


if __name__ == '__main__':
    downloads = ImageDownloader('Ange Katrina').go()
    for i, d in enumerate(downloads):
        FaceDetector(d).cutout_faces('image/faces/faces_{}.jpg'.format(i + 1))