[PYTHON] How I built a site where you can search for similar adult-video images by a face you specify, using Sakura VPS + Django + scikit-learn

**Using Sakura's VPS, Django, and scikit-learn, I created a website that lets you search for AV works featuring actresses whose faces are similar to a face image you care about.**

The site name is Like Navi

Until now I had been developing apps with Unity, OpenGL, and Objective-C, so here I am recording how I built a web service from scratch, both to study Python and machine learning and for the practical benefit.

I think I hit every single pitfall a beginner can hit, so this should be helpful for other beginners.

By the way, I didn't use SIFT or a CNN: Bag-of-Visual-Words-style approaches gave low accuracy, and their dimensionality and computational cost were too high. For a CNN I would have had to extract frames from the videos, I felt I didn't have enough samples, and my machine was too weak, so I gave up on that.

Original inspiration: AV image recognition technology and its surroundings. **Thank you!**

Table of contents

------------ Local work ----------------

-[1] Scrape face images of AV actresses from DMM
-[2] Extract faces with OpenCV and compute histograms (HOG features) of the face images
-[3] Build a model that classifies a given face image using PCA dimensionality reduction and k-means clustering
-[4] Create a site using Django
-[5] Have CrowdWorks make the HTML and design
-[6] Create a script to crop the face image using Cropper.js

------------ Server side work ----------------

-[7] Set up the environment on Sakura VPS: OpenCV, virtualenv, mod_wsgi, scikit-learn, Django, etc.
-[8] Set up mod_wsgi and deploy Django on Sakura VPS
-[9] Acquire a domain with Sakura VPS and set up a virtual host

[1] Scraping face images of AV actresses from DMM

I won't publish the crawler script here, since it's standard practice not to share crawling code, but I used Python and BeautifulSoup4.

See Scraping with Python and Beautiful Soup; that covers most of what you need.
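Just to give an idea of the general flow (this is a generic sketch using requests and BeautifulSoup4, not the crawler I actually used, and the URL is a placeholder), fetching a listing page and saving the images it links to looks roughly like this:

# -*- coding:utf-8 -*-
# Generic scraping sketch; the URL below is a placeholder.
import os
import requests
from bs4 import BeautifulSoup

list_url = "http://example.com/actress-list"
save_dir = "raw_images"
if not os.path.isdir(save_dir):
	os.mkdir(save_dir)

html = requests.get(list_url).text
soup = BeautifulSoup(html, "html.parser")

# Collect <img> tags and save each jpg
for i, img in enumerate(soup.find_all("img")):
	src = img.get("src")
	if not src or not src.endswith(".jpg"):
		continue
	data = requests.get(src).content
	with open(os.path.join(save_dir, str(i) + ".jpg"), "wb") as f:
		f.write(data)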

[2] Extract faces with OpenCV and compute histograms (HOG features) of the face images

Building OpenCV itself is hard, but if you're working locally on a Mac, just install Anaconda and then install OpenCV through conda.

pip install anaconda
conda install -c https://conda.binstar.org/jjhelmus opencv

It can be used just by typing the two commands. Easy.

These should be helpful:
Try to extract face images from video using Python (Anaconda) and OpenCV
Perform face recognition with OpenCV, trim and save only the face part [Python]

# -*- coding:utf-8 -*-
import cv2
import sys
import os
import shutil

image_path = "image.png"  # Path to an arbitrary image

# Load the cascade classifier for frontal faces
face_cascade = cv2.CascadeClassifier('[yourdir]/opencv/data/haarcascades/haarcascade_frontalface_default.xml')

# Read the file
image = cv2.imread(image_path)

# Convert to grayscale
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


# Run object detection (face detection)
faces = face_cascade.detectMultiScale(image_gray, scaleFactor=1.2, minNeighbors=2, minSize=(10, 10))


# Create an output directory
if len(faces) > 0:
	path = os.path.splitext(image_path)
	dir_path = path[0] + '_face'
	if os.path.isdir(dir_path):
		shutil.rmtree(dir_path)
	os.mkdir(dir_path)

i = 0
for rect in faces:
	# Crop out only the face and save it
	x = rect[0]
	y = rect[1]
	width = rect[2]
	height = rect[3]
	dst = image[y:y+height, x:x+width]
	new_image_path = dir_path + '/' + str(i) + path[1]
	cv2.imwrite(new_image_path, dst)  # Path where the cropped face is saved
	i += 1

After extracting the face images, I extract Histograms of Oriented Gradients (HOG) features from them.

Reference for HOG features: Extraction of HOG features from images

Very simple

hog = cv2.HOGDescriptor()
img = cv2.imread('test.jpg')
res = hog.compute(img)

That's all it takes. This gives you a feature-vector representation of the image.
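For reference, `hog.compute` returns an N x 1 float32 column vector, which is why the code later in this article flattens it element by element. A quick sketch using the same 64x64 window parameters as the later code:

import cv2
import numpy as np

hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)
img = cv2.resize(cv2.imread('test.jpg'), (64, 64))
res = hog.compute(img)             # shape (N, 1), dtype float32
vec = np.asarray(res).flatten()    # 1-D feature vector for scikit-learn
print(vec.shape)                   # (1764,) with these parameters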

[3] Build a model that classifies a given face image using PCA dimensionality reduction and k-means clustering

The problem with HOG features, however, is the curse of dimensionality. If the website naively computed the similarity between a query and nearly 300,000 images on every request, the computation would never finish, so you need to reduce the dimensionality with PCA and then cluster the PCA results with k-means.

First, we need feature vectors for all of the images, so I wrote the classes below to read the images in batches, folder by folder.

from PIL import Image
import os
import cv2
from sklearn.cluster import KMeans
import numpy as np
import random
import pickle
import re
import sklearn.decomposition
import dill



class Imageob(object):
	def __init__(self):
		pass

	def fileread(self, filepath):
		self.path = filepath
		self.shape = (0, 0, 0)  # default so callers can check the size even if reading fails
		try:
			temp = cv2.imread(filepath)
			self.src = cv2.resize(temp, (64, 64))
			self.shape = temp.shape

		except cv2.error as e:
			pass

	def readArray(self, array):
		self.srcGrey = array


class Images(object):
	def __init__(self):
		self.images = []

	def addImage(self, image):
		self.images.append(image)

	def readAllFiles(self, folderpath):

		for path in self.readAllFilePath(folderpath):

			image = Imageob()
			image.fileread(path)

			if image.shape[0] > 50: # Skip images that are too small (fileread already resized to 64x64)
				self.addImage(image)


	def readAllFilePath(self, folderpath):
		for root, dirs, files in os.walk(folderpath):
			for file in files:
				if not file.startswith(".") and file.endswith(".jpg"):
					yield os.path.join(root, file)

#Extract HOG features from here

images = Images()
print("Start loading the image")

images.readAllFiles("/image_path")#Specify the folder containing the face image

data = []
label = []
num = 0

for image in images.images:
	hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)
	img_ = cv2.imread(image.path)
	img_ = cv2.resize(img_,(64,64))
	try:
		hist = hog.compute(img_)
		k = []  # Flatten the (N, 1) histogram into a plain Python list
		if hist is not None:
			for i in hist:
				k.append(i[0])
		else:
			print("hist is NONE")

		data.append(k)

	except cv2.error as e:
		pass


npdata = np.array(data)  # Convert to a numpy array for use with scikit-learn

pca = sklearn.decomposition.PCA(100)
pca.fit(npdata)

X_pca= pca.transform(npdata)

a = []  # Collect the PCA-compressed vectors into a list

for x in X_pca:
	a.append(x)

kmeans_model = KMeans(n_clusters=10, random_state=10).fit(a)
labels = kmeans_model.labels_

You now have a PCA model and a k-means model. If, like me, you write sloppy classes that contain child objects, you will run into trouble serializing them with Python's pickle, so I recommend saving the models with dill instead; it isn't heavy and the models persist without problems.

Workaround for the problem that Python pickle cannot serialize
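As a rough illustration (the file names here are placeholders, not the ones used on the actual site), saving and reloading the fitted models with dill can look like this:

import dill

# Persist the fitted models so the web app can load them later.
with open("pca_model.dill", "wb") as f:
	dill.dump(pca, f)

with open("kmeans_model.dill", "wb") as f:
	dill.dump(kmeans_model, f)

# On the server side, load them back the same way:
with open("pca_model.dill", "rb") as f:
	pca = dill.load(f)

with open("kmeans_model.dill", "rb") as f:
	kmeans_model = dill.load(f)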

After that, if you write something like the following, you can compute the cosine similarity for any given image path.

def cos_sim(v1, v2):
	return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

class Hog(object):
	def __init__(self):
		pass

	def hog(self, path):
		hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)
		img = cv2.imread(path)
		img = cv2.resize(img , (64, 64))

		t = []
		try:
			hist = hog.compute(img)
			k = []
			if hist is not None:
				for i in hist:
					k.append(i[0])
			else:
				print("hist is NONE")
			t.append(k)

		except cv2.error as e:
			print('cv2_error')
		return t


Given an image, you take its HOG features, compress the dimensions with the PCA model, predict which cluster it belongs to with the k-means model, and then compute the similarity against all the images in that cluster. That's all there is to it.
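Putting the pieces together, here is a minimal sketch of that query flow. It assumes the `cos_sim` and `Hog` helpers above, the fitted `pca` and `kmeans_model`, and a hypothetical dict `cluster_vectors` that maps each cluster label to the `(image_path, pca_vector)` pairs precomputed for the images in that cluster; it is an illustration, not the code running on the site.

import numpy as np

def search_similar(query_path, pca, kmeans_model, cluster_vectors, top_n=10):
	# 1. HOG features of the query image (Hog.hog returns a list containing one vector)
	hist = Hog().hog(query_path)

	# 2. Compress the dimensions with the fitted PCA model
	query_vec = pca.transform(np.array(hist))[0]

	# 3. Predict which k-means cluster the query belongs to
	label = kmeans_model.predict([query_vec])[0]

	# 4. Cosine similarity against the images in that cluster only
	results = []
	for image_path, vec in cluster_vectors[label]:
		results.append((cos_sim(query_vec, vec), image_path))

	results.sort(reverse=True)
	return results[:top_n]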

The sites I referred to are below. (The source code is definitely cleaner here.)

I tried to classify anime face photos by Bag of Keywords

How to use KAZE
Measurement of features (ORB / AKAZE / KAZE / FAST / BRISK) with OpenCV3.0.0-dev + Python2.7.6

[4] Create a site using django

At this point, the rest was basically following a tutorial; I built the site by working through the guide below.

Introduction to Python Django
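As a rough sketch of how the model plugs into Django (the view name, the `face_image` form field, the template name, and the `search_similar` helper are illustrative assumptions, not the site's actual code), a search view might look like this:

# views.py -- illustrative sketch only
from django.shortcuts import render

def search(request):
	results = []
	if request.method == "POST" and request.FILES.get("face_image"):
		upload = request.FILES["face_image"]

		# Save the uploaded (cropped) face image to a temporary file
		tmp_path = "/tmp/query.jpg"
		with open(tmp_path, "wb") as f:
			for chunk in upload.chunks():
				f.write(chunk)

		# Run the HOG -> PCA -> k-means -> cosine similarity pipeline from section [3]
		results = search_similar(tmp_path, pca, kmeans_model, cluster_vectors)

	return render(request, "search.html", {"results": results})

The fitted objects (`pca`, `kmeans_model`, `cluster_vectors`) would be loaded once at module import, for example with dill as shown earlier.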

[5] Have CrowdWorks make html and design

To be honest, design and markup are the work I am least suited for, so I outsourced them, as below. By the way, I haven't recovered the investment at all, and there is no prospect of recovering it.

Design / coding of the screen of "Like Navi !!" filled with romance of a man

[6] Create a script to cut out a face image using CropperJS

I'm not good at JS, so I decided to implement this part with the library below.

Cropper.js

It is a very convenient library for cropping images in the browser, and it works well on smartphones as-is. The basics are written up in an easy-to-understand way here:

Try using the jquery plugin "Cropper" that crops images

You can get a feel for how it works just by playing with the basic demo.

From here on, the server-side work.

[7] Set up the environment on Sakura VPS: OpenCV, virtualenv, mod_wsgi, scikit-learn, Django, etc.

To be honest, this took the longest time.

--I did the initial setup of the Sakura VPS by following this guide.

--Add Python: Install Python 2.7 on CentOS 6.5 and use it with Apache

To begin with, the Python 2.6.6 that ships with CentOS 6.5 on Sakura VPS did not match the Python that mod_wsgi was built against, which caused errors. Worse, Django doesn't raise an error; the request just never comes back and you're left staring at a white screen, which is really annoying.

--virtualenv: from here. This is essential, because without it you can't normally tell which library a given path is actually pointing at.

--OpenCV on CentOS 6.5 (probably the hardest part of the whole process)

Use the following as references. This was probably the hardest step; honestly, I don't remember the details. Incidentally, carelessly overwriting dependency files without backing them up left the server unusable, which was a nightmare.

Try installing OpenCV3.0 on AMI
Work memo with OpenCV3.0 and opencv_contrib in ubuntu
[Package required when using openCV with CENTOS6](http://www.gimmickgeek.com/2014/10/04/centos6%E3%81%A7opencv%E3%82%92%E5%88%A9%E7%94%A8%E3%81%99%E3%82%8B%E6%99%82%E3%81%AB%E5%BF%85%E8%A6%81%E3%81%AA%E3%83%91%E3%83%83%E3%82%B1%E3%83%BC%E3%82%B8/)

By the way, you can learn the basic idea here. Notes on running Django from a clean VPS (CentOS6) using Apache

[8] Set mod_wsgi and deploy django on Sakura VPS

django deployment settings

[django production environment construction to release (django 1.8.7 + Apache + mod_wsgi)](http://marrsan.hateblo.jp/entry/2015/12/03/145235)

If you follow that, 99% of it will go fine.

However, I also had to solve a problem that Django and scikit-learn cause on CentOS. This is where I got stuck: https://github.com/scikit-learn/scikit-learn/issues/3947

In the wsgi.conf file created above, set the following:

LoadModule wsgi_module modules/mod_wsgi.so
WSGIPythonPath /home/hoge/ENV/lib/python2.7/site-packages
WSGIApplicationGroup %{GLOBAL}

and the problem is solved.

[9] Acquire a domain with Sakura VPS and set it as a virtual host

After that, get a domain and set up the virtual host. See the site below.

Try using Sakura's VPS [8]-Try setting VirtualHost

The paths below are masked with ******, so rewrite them to match your own environment.

python.conf


<VirtualHost *:80>

ServerName sokkurinavi.com
ServerAlias www.sokkurinavi.com

WSGIScriptAlias / /var/www/cgi-bin/*******************/wsgi.py

#WSGIScriptAlias /sokuri /var/www/cgi-bin/*************
#Alias /static /home/hoge/ENV/lib/python2.7/site-packages/django/contrib/admin/static

Alias /static/ /var/www/cgi-bin/******************/static/

<Directory /var/www/cgi-bin/******************/static>
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
</Directory>

<Directory /var/www/cgi-bin/***************>
Order deny,allow
Allow from all
</Directory>

</VirtualHost>

The completed version is this site

Like Navi

**One last thing I want to say: I thought that making an adult site would bring in traffic, but it only ever gets one or two visitors, so I don't recommend building adult sites.**
