[PYTHON] Collecting images of Ange Katrina and detecting faces

This post probably won't mean much unless you know VTubers or Nijisanji. I wanted to try machine learning to identify the members of "sanbaka", so as a first step I downloaded images of Ange and detected the faces in them. I consulted various sites, but the code is mainly based on the following one.

https://qiita.com/Tatsuro64/items/95b0ce48b6bb155bfe29

Language: Python 3.7. Environment: Ubuntu 18.04. Libraries used: BeautifulSoup (scraping), requests (fetching the search page), OpenCV (face detection), urllib (image download).

The main processing of the code is as follows.

if __name__ == '__main__':

    downloads = ImageDownloader('Ange Katrina').go()
    for i, d in enumerate(downloads):
        FaceDetector(d).cutout_faces('image/faces/faces_{}.jpg'.format(i + 1))

It downloads the images with the downloader, then runs face detection on each file in the returned list. That's all.

The downloader is below.

class ImageDownloader(object):
    def __init__(self, keyword):
        session = requests.session()
        session.headers.update(
            # Adjacent string literals avoid embedding the indentation in the header value
            {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) '
                           'Gecko/20100101 Firefox/10.0'})
        params = urllib.parse.urlencode(
            {'q': keyword, 'tbm': 'isch', 'ijn': '1'})
        query = "https://www.google.co.jp/search" + '?' + params
        self._bs = BeautifulSoup(session.get(query).text, 'html.parser')

    def go(self):
        downloads = []
        for img in self._bs.find_all('img'):
            try:
                url = img['data-iurl']
                downloads.append('image/image_{}.jpg'.format(len(downloads) + 1))
                urllib.request.urlretrieve(url, downloads[-1])
            except KeyError:
                print('failed to get url : {}'.format(img))
        return downloads

■ Constructor: Sends a query to Google image search (keyword: Ange Katrina) and passes the search result page (only the first page this time) to the scraping library (Beautiful Soup).
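
As a small illustration of how the query string in the constructor is assembled, here is a minimal sketch using only the standard library. The helper name `build_image_search_url` and its `page` parameter are hypothetical and not part of the original code; the parameter names `q`, `tbm`, and `ijn` are taken from the constructor above.

```python
import urllib.parse


def build_image_search_url(keyword, page=1):
    """Return the image-search URL for `keyword` (hypothetical helper)."""
    # urlencode escapes the keyword, e.g. the space becomes '+'
    params = urllib.parse.urlencode(
        {'q': keyword, 'tbm': 'isch', 'ijn': str(page)})
    return 'https://www.google.co.jp/search' + '?' + params


print(build_image_search_url('Ange Katrina'))
# https://www.google.co.jp/search?q=Ange+Katrina&tbm=isch&ijn=1
```

Keeping the URL construction in its own function makes it easy to check the encoded query without sending a request.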

■ go method: Searches for all img tags. The image link is stored in the data-iurl attribute, so each one is downloaded with the urllib module. Some img tags, probably ones that are not search results, lack the data-iurl attribute, so the code catches the resulting KeyError, prints a message, and moves on.
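
The filtering step can be tried offline on a small HTML fragment. The sketch below is not the article's code: it swaps BeautifulSoup for the standard library's `HTMLParser` so that it runs without extra packages, and the class name `ImageUrlCollector` and the sample HTML are made up for illustration. It collects `data-iurl` values and counts the `img` tags that lack the attribute instead of raising an exception.

```python
from html.parser import HTMLParser


class ImageUrlCollector(HTMLParser):
    """Collect data-iurl values from img tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []      # links found in data-iurl attributes
        self.skipped = 0    # img tags without a data-iurl attribute

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        url = dict(attrs).get('data-iurl')
        if url:
            self.urls.append(url)
        else:
            self.skipped += 1


# Made-up sample standing in for a search result page
sample_html = '''
<div>
  <img data-iurl="https://example.com/a.jpg" src="thumb_a.jpg">
  <img src="logo.png">
  <img data-iurl="https://example.com/b.jpg">
</div>
'''

collector = ImageUrlCollector()
collector.feed(sample_html)
print(collector.urls)     # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
print(collector.skipped)  # 1
```

Separating link extraction from downloading also means a failed download does not interfere with the parsing loop.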

The face cutter is as follows.

class FaceDetector(object):

    #Trained model
    FACE_CASCADE = '/home/websoler/anaconda3/lib/python3.7/site-packages/cv2/data/lbpcascade_animeface.xml'

    def __init__(self, fname):
        self._img = cv2.imread(fname)

    def cutout_faces(self, fname):
        # cv2.imread returns None when the file is missing or cannot be decoded
        if self._img is None:
            return
        gray = cv2.cvtColor(self._img, cv2.COLOR_BGR2GRAY)
        classifier = cv2.CascadeClassifier(FaceDetector.FACE_CASCADE)
        faces = classifier.detectMultiScale(gray, scaleFactor=1.2, minSize=(30, 30))
        if len(faces):
            for (x, y, w, h) in faces:
                region = self._img[y:y + h, x:x + w]
                region_resized = cv2.resize(region, (128, 128))
                cv2.imwrite(fname, region_resized)
                break  # TODO: for the time being, only the first face is saved.

■ Constructor: Has OpenCV read the file at the given path.

■ cutout_faces method: When a face is detected, crops it, resizes it to 128 x 128, and saves it to a file. For now only the first detected face is saved.

The trained model is not one of the standard cascades shipped with OpenCV; it is lbpcascade_animeface.xml, which is specialized for anime faces. Loading it from the local directory did not work for me, so I placed it directly under the Python library path. In my environment Anaconda had been installed at some point without my noticing, so the file sits under the Anaconda directory.
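
The article does not say why loading the file locally failed. One possible cause (an assumption) is a relative path being resolved against a different working directory. The sketch below looks for the cascade file in a list of candidate directories and fails early with a clear message; the helper name `find_cascade` is hypothetical. Note that `cv2.CascadeClassifier` does not raise when given a bad path, so checking the file yourself (or calling the classifier's `empty()` method after loading) can make the problem visible sooner.

```python
from pathlib import Path


def find_cascade(filename, search_dirs):
    """Return the absolute path of the first `filename` found in `search_dirs`.

    Hypothetical helper: raises FileNotFoundError instead of letting the
    detector fail later with a less obvious error.
    """
    for directory in search_dirs:
        candidate = Path(directory).expanduser().resolve() / filename
        if candidate.is_file():
            return str(candidate)
    raise FileNotFoundError(
        '{} not found in: {}'.format(filename, [str(d) for d in search_dirs]))


if __name__ == '__main__':
    try:
        # Look in the current working directory only, as an example
        print(find_cascade('lbpcascade_animeface.xml', [Path.cwd()]))
    except FileNotFoundError as err:
        print(err)
```

With something like this, the cascade file could live next to the script rather than inside site-packages, where it can be lost when the library is reinstalled.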

I downloaded the anime-face model from the following repository: https://github.com/nagadomi/lbpcascade_animeface

I specified minSize because smaller detection regions produce more false positives. For some reason, Ange's sleeves and fingers get detected as "faces".
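
Since small regions tend to be the false positives, one option for the "first face only" TODO is to keep the largest detection instead of the first one. This is only a sketch of that idea, not what the article's code does: `pick_largest_face` is a hypothetical helper operating on plain `(x, y, w, h)` tuples, and the sample boxes are made up.

```python
def pick_largest_face(faces, min_size=(30, 30)):
    """Return the largest (x, y, w, h) box that is at least `min_size`, or None."""
    min_w, min_h = min_size
    # Drop boxes smaller than the minimum width or height
    candidates = [f for f in faces if f[2] >= min_w and f[3] >= min_h]
    if not candidates:
        return None
    # Largest area wins
    return max(candidates, key=lambda f: f[2] * f[3])


# Made-up detections: a tiny box, a large face, and a medium box
detections = [(10, 12, 20, 20), (40, 35, 96, 96), (200, 180, 45, 45)]
print(pick_largest_face(detections))  # (40, 35, 96, 96)
```

The size filter here repeats what minSize already does inside detectMultiScale; it is included only so the function also works on boxes from other sources.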

Next time, I will prepare the data in TFRecords format for machine learning with TensorFlow.

The entire code is below.


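from bs4 import BeautifulSoup
import cv2
import os
import requests
import shutil
import urllib.parse    # submodules must be imported explicitly
import urllib.request


# Prepare a clean output directory
shutil.rmtree('image', ignore_errors=True)  # do not fail on the first run, when 'image' does not exist yet
os.makedirs('image/faces')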


class ImageDownloader(object):
    def __init__(self, keyword):
        session = requests.session()
        session.headers.update(
            # Adjacent string literals avoid embedding the indentation in the header value
            {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) '
                           'Gecko/20100101 Firefox/10.0'})
        params = urllib.parse.urlencode(
            {'q': keyword, 'tbm': 'isch', 'ijn': '1'})
        query = "https://www.google.co.jp/search" + '?' + params
        self._bs = BeautifulSoup(session.get(query).text, 'html.parser')

    def go(self):
        downloads = []
        for img in self._bs.find_all('img'):
            try:
                url = img['data-iurl']
                downloads.append('image/image_{}.jpg'.format(len(downloads) + 1))
                urllib.request.urlretrieve(url, downloads[-1])
            except KeyError:
                print('failed to get url : {}'.format(img))
        return downloads


class FaceDetector(object):

    #Trained model
    FACE_CASCADE = '/home/websoler/anaconda3/lib/python3.7/site-packages/cv2/data/lbpcascade_animeface.xml'

    def __init__(self, fname):
        self._img = cv2.imread(fname)

    def cutout_faces(self, fname):
        # cv2.imread returns None when the file is missing or cannot be decoded
        if self._img is None:
            return
        gray = cv2.cvtColor(self._img, cv2.COLOR_BGR2GRAY)
        classifier = cv2.CascadeClassifier(FaceDetector.FACE_CASCADE)
        faces = classifier.detectMultiScale(gray, scaleFactor=1.2, minSize=(30, 30))
        if len(faces):
            for (x, y, w, h) in faces:
                region = self._img[y:y + h, x:x + w]
                region_resized = cv2.resize(region, (128, 128))
                cv2.imwrite(fname, region_resized)
                break  # TODO: for the time being, only the first face is saved.

if __name__ == '__main__':

    downloads = ImageDownloader('Ange Katrina').go()
    for i, d in enumerate(downloads):
        FaceDetector(d).cutout_faces('image/faces/faces_{}.jpg'.format(i + 1))
