If you don't follow VTubers or Nijisanji, this may not mean much to you, but I wanted to use machine learning to tell the "SanBaka" trio apart, so as a first step I downloaded images of Ange Katrina and detected her face. There are various reference sites, but this is mainly based on the following one.
https://qiita.com/Tatsuro64/items/95b0ce48b6bb155bfe29
Language: Python 3.7 / Environment: Ubuntu 18.04 / Libraries used: BeautifulSoup (scraping), OpenCV (face detection), urllib (image download)
The main processing of the code is as follows.
if __name__ == '__main__':
    downloads = ImageDownloader('Ange Katrina').go()
    for i, d in enumerate(downloads):
        FaceDetector(d).cutout_faces('image/faces/faces_{}.jpg'.format(i + 1))
Download the images with the downloader, then run face detection over the resulting list; that's all there is to it.
The downloader is below.
class ImageDownloader(object):

    def __init__(self, keyword):
        session = requests.session()
        session.headers.update(
            {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) \
Gecko/20100101 Firefox/10.0'})
        params = urllib.parse.urlencode(
            {'q': keyword, 'tbm': 'isch', 'ijn': '1'})
        query = "https://www.google.co.jp/search" + '?' + params
        self._bs = BeautifulSoup(session.get(query).text, 'html.parser')

    def go(self):
        downloads = []
        for img in self._bs.find_all('img'):
            try:
                url = img['data-iurl']
                downloads.append('image/image_{}.jpg'.format(len(downloads) + 1))
                urllib.request.urlretrieve(url, downloads[-1])
            except KeyError:
                print('failed to get url : {}'.format(img))
        return downloads
■ Constructor: Issues a search query to Google (keyword: Ange Katrina) and feeds the resulting page (only the first results page this time) to the scraping library (Beautiful Soup).
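For reference, the query string the constructor builds can be reproduced on its own; `tbm=isch` requests Google Image Search, and `ijn` appears to select the result page:

```python
import urllib.parse

# tbm=isch requests Google Image Search; ijn picks the result page
params = urllib.parse.urlencode({'q': 'Ange Katrina', 'tbm': 'isch', 'ijn': '1'})
query = 'https://www.google.co.jp/search?' + params
print(query)  # spaces in the keyword are encoded as '+'
```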
■ go method: Scans all the img tags. The image link is stored in the ['data-iurl'] attribute, so each one is downloaded with the urllib module. Some tags, presumably ones other than search-result images, lack the ['data-iurl'] attribute, so the KeyError exception is caught and those tags are skipped.
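The same skip logic can also be written without the exception, using Beautiful Soup's `has_attr`; a minimal offline sketch (the HTML here is made up for illustration):

```python
from bs4 import BeautifulSoup

# two img tags: only the first carries the data-iurl attribute
html = '<img data-iurl="http://example.com/a.jpg"><img src="logo.png">'
bs = BeautifulSoup(html, 'html.parser')

# keep only tags that actually have the attribute
urls = [img['data-iurl'] for img in bs.find_all('img') if img.has_attr('data-iurl')]
print(urls)  # ['http://example.com/a.jpg']
```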
The face cutter is as follows.
class FaceDetector(object):

    # Trained model (anime-face cascade)
    FACE_CASCADE = '/home/websoler/anaconda3/lib/python3.7/site-packages/cv2/data/lbpcascade_animeface.xml'

    def __init__(self, fname):
        self._img = cv2.imread(fname)

    def cutout_faces(self, fname):
        gray = cv2.cvtColor(self._img, cv2.COLOR_BGR2GRAY)
        classifier = cv2.CascadeClassifier(FaceDetector.FACE_CASCADE)
        faces = classifier.detectMultiScale(gray, scaleFactor=1.2, minSize=(30, 30))
        if len(faces):
            for (x, y, w, h) in faces:
                region = self._img[y:y + h, x:x + w]
                region_resized = cv2.resize(region, (128, 128))
                cv2.imwrite(fname, region_resized)
                break  # TODO: only the first detected face for now
■ Constructor: Reads the passed file with OpenCV.
■ cutout_faces method: When faces are detected, the first one is cropped, resized to 128 x 128, and saved to a file.
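One gotcha in the crop above: NumPy indexes rows (y) before columns (x), so the slice is `img[y:y+h, x:x+w]` even though detections come back as (x, y, w, h). A quick check with a dummy image:

```python
import numpy as np

img = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy BGR image
x, y, w, h = 40, 60, 100, 120                  # a hypothetical detection box
region = img[y:y + h, x:x + w]                 # rows (y) first, then columns (x)
print(region.shape)  # (120, 100, 3)
```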
The detection model is not one of the standard cascades bundled with OpenCV; it uses lbpcascade_animeface.xml, which is specialized for anime faces. Loading it from a local path didn't work for me, so I placed it directly under the Python library path. In my environment, Anaconda had been installed at some point, so it went under that directory.
The anime-face model was downloaded from: https://github.com/nagadomi/lbpcascade_animeface
I specified minSize because the smaller the detected region, the higher the false-positive rate; without it, Ange's sleeves and fingers kept getting detected as "faces".
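minSize is applied inside detectMultiScale itself, but its effect is easy to picture as a post-filter on the (x, y, w, h) boxes; a sketch with made-up detections:

```python
def filter_min_size(faces, min_w=30, min_h=30):
    """Drop detection boxes smaller than the minimum size."""
    return [(x, y, w, h) for (x, y, w, h) in faces if w >= min_w and h >= min_h]

# a 20x20 false positive (e.g. a sleeve) next to a 64x64 real face
boxes = [(10, 10, 20, 20), (50, 50, 64, 64)]
print(filter_min_size(boxes))  # [(50, 50, 64, 64)]
```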
Next time, I will prepare TFRecord files for machine learning with TensorFlow.
The entire code is below.
from bs4 import BeautifulSoup
import cv2
import os
import requests
import shutil
import urllib.parse
import urllib.request

# Prepare the output directories
shutil.rmtree('image', ignore_errors=True)  # don't fail on the first run
os.mkdir('image')
os.mkdir('image/faces')


class ImageDownloader(object):

    def __init__(self, keyword):
        session = requests.session()
        session.headers.update(
            {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) \
Gecko/20100101 Firefox/10.0'})
        params = urllib.parse.urlencode(
            {'q': keyword, 'tbm': 'isch', 'ijn': '1'})
        query = "https://www.google.co.jp/search" + '?' + params
        self._bs = BeautifulSoup(session.get(query).text, 'html.parser')

    def go(self):
        downloads = []
        for img in self._bs.find_all('img'):
            try:
                url = img['data-iurl']
                downloads.append('image/image_{}.jpg'.format(len(downloads) + 1))
                urllib.request.urlretrieve(url, downloads[-1])
            except KeyError:
                print('failed to get url : {}'.format(img))
        return downloads


class FaceDetector(object):

    # Trained model (anime-face cascade)
    FACE_CASCADE = '/home/websoler/anaconda3/lib/python3.7/site-packages/cv2/data/lbpcascade_animeface.xml'

    def __init__(self, fname):
        self._img = cv2.imread(fname)

    def cutout_faces(self, fname):
        gray = cv2.cvtColor(self._img, cv2.COLOR_BGR2GRAY)
        classifier = cv2.CascadeClassifier(FaceDetector.FACE_CASCADE)
        faces = classifier.detectMultiScale(gray, scaleFactor=1.2, minSize=(30, 30))
        if len(faces):
            for (x, y, w, h) in faces:
                region = self._img[y:y + h, x:x + w]
                region_resized = cv2.resize(region, (128, 128))
                cv2.imwrite(fname, region_resized)
                break  # TODO: only the first detected face for now


if __name__ == '__main__':
    downloads = ImageDownloader('Ange Katrina').go()
    for i, d in enumerate(downloads):
        FaceDetector(d).cutout_faces('image/faces/faces_{}.jpg'.format(i + 1))