[PYTHON] Manga Recommendations with Machine Learning, Part 1: First, Try Classifying Without Thinking

Introduction

Thanks to search engines such as ElasticSearch, it has become relatively easy to retrieve information about a work from search keywords.

With ElasticSearch, keyword-based recommendation is also easy to build: feed in the genres and tags of the works a user is reading.

However, my case is a manga site, where art style and personal taste matter as well. When I had money to spare, I often bought a manga for its cover alone, with the content a secondary concern.

So this is an attempt to capture that oddly taste-driven part in some way.

As a first idea, I figure manga could be taken in as image data and grouped in some way (clustering). But how exactly should I do that?

**I have no idea whether this approach is right at all, so corrections are welcome.**

For the time being: I once got curious about the history of artificial intelligence and dabbled in it a little, and going by that, **wouldn't it be possible to represent an image by what are called features and cluster the images using them?** I will start from that understanding.

Finding a starting point

When I asked Google-sensei, it turned out that a local feature called **SURF** is often used in image pattern recognition. It generates features from points that stay stable even when the image's brightness changes or the image is scaled or rotated. Because many such points can be extracted from one image, they are called **local** features. In other words, one image does not get just one feature.

Let's start with this, and for clustering the features, let's simply plug in k-means, which also seems to be commonly used.

To be honest, many of these terms still don't make sense to me, but I expect I can look things up as the need arises.

Environment

The environment is as follows.

  - OS: Mac
  - Language: Python 2.7
  - Development environment: PyCharm Community Edition 2017.1
  - Machine learning library: scikit-learn
  - Image processing library: mahotas
  - Numerical library: NumPy

I chose this environment simply because it is the setup I learned in an online course. I may change it as needed later.

Python 2.7 came installed by default on the Mac, so I used it as is.

Flow

Learning phase

  1. Read the images from the URLs listed in a text file (100-2000 items)
  2. Calculate local features with SURF (mahotas)
  3. Cluster the features with k-means (scikit-learn) and keep the result as the base features (unsupervised learning)
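
To make step 3 more concrete, here is a minimal sketch in plain Python of what "pool the local features of all images, then run k-means" means. Everything in it is an assumption for illustration: the toy 2-D descriptors stand in for real SURF output, the helper names are hypothetical, and the starting centroids are hand-picked so that the run is reproducible. scikit-learn's `KMeans` chooses its starting points differently, so this is not what the library does internally.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    count = float(len(points))
    return tuple(sum(column) / count for column in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd iteration: assign points, move centroids, stop when stable."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if no point was assigned to it.
            updated.append(mean(members) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Toy stand-ins for SURF output (assumption): every image yields a
# different number of local descriptors, 2-D here for readability.
per_image_descriptors = [
    [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1)],
    [(5.2, 4.9), (0.1, 0.2)],
    [(4.8, 5.0), (5.1, 5.2), (0.0, 0.0), (0.1, 0.1)],
]

# Pool the descriptors of all images into one training set.
pooled = [d for descriptors in per_image_descriptors for d in descriptors]

# Hand-picked starting centroids (assumption, for reproducibility).
centroids, labels = kmeans(pooled, initial_centroids=[pooled[0], pooled[2]])
print(centroids)
print(labels)
```

The point to notice is that the clustering here runs over individual local features, not over images: which image a descriptor came from is ignored at this stage.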

Classification phase

  1. Read a separate set of 100 images
  2. Calculate SURF features
  3. Using the base features from the learning phase, cluster the images (first into 10 categories, then into 25)
  4. Look at the results.
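
The step that makes image-level clustering possible is turning a variable number of local features into one fixed-length vector per image. The sketch below shows that idea in plain Python: each local feature is assigned to its nearest base cluster, and the image is described by the counts. The centroids, descriptors and function names are hypothetical toy values, not output from the actual model.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def cluster_histogram(descriptors, centroids):
    """Count how many local features of one image fall into each base cluster."""
    histogram = [0] * len(centroids)
    for descriptor in descriptors:
        histogram[nearest(descriptor, centroids)] += 1
    return histogram


# Hypothetical base features: three cluster centres learned beforehand.
base_centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

# Two toy images with a different number of local features each.
image_a = [(0.1, 0.2), (4.9, 5.1), (5.2, 4.8), (0.3, 4.7)]
image_b = [(0.0, 0.1), (0.2, 0.0)]

hist_a = cluster_histogram(image_a, base_centroids)
hist_b = cluster_histogram(image_b, base_centroids)
print(hist_a)  # [1, 2, 1]
print(hist_b)  # [2, 0, 0]
```

Both histograms have the same length (the number of base clusters) even though the images have four and two local features respectively, so they can be handed to a second k-means run as ordinary rows of a matrix.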

Code

Learning phase


# coding:utf-8
import numpy as np
from sklearn import cluster
from sklearn.externals import joblib
import mahotas as mh
from mahotas import surf
import cStringIO
import urllib

# Parameters
# Number of k-means clusters for the local features
feature_category_num = 512

# Read the image URLs from a text file (one URL per line).
with open("list.txt") as list_file:
    urls = [line.rstrip() for line in list_file]

# Image processing: compute the SURF local features of every image
base = []

for url in urls:
    # Download the image into an in-memory file object
    image_file = cStringIO.StringIO(urllib.urlopen(url).read())
    im = mh.imread(image_file, as_grey=True)
    im = im.astype(np.uint8)
    base.append(surf.surf(im))

# Stack the local features of all images into a single array
concatenated = np.concatenate(base)

del base

# Calculation of the base features
km = cluster.KMeans(feature_category_num)
km.fit(concatenated)

# Save the base features (the file name must match the one loaded in the classification phase)
joblib.dump(km, "km-cluster-surf.pk1")

Classification phase


# coding:utf-8
import numpy as np
from sklearn import cluster
from sklearn.externals import joblib
import mahotas as mh
from mahotas import surf

import cStringIO
import urllib

# Parameters
# Number of base clusters (must match the learning phase)
feature_category_num = 512
# Number of categories to split the images into
picture_category_num = 25

# Load the trained model (base features)
km = joblib.load("km-cluster-surf.pk1")

# Read the image URLs from a text file (one URL per line).
with open("list2.txt") as list_file:
    urls = [line.rstrip() for line in list_file]

# Image processing: compute the SURF local features of every image
base = []

for url in urls:
    # Download the image into an in-memory file object
    image_file = cStringIO.StringIO(urllib.urlopen(url).read())

    im = mh.imread(image_file, as_grey=True)
    im = im.astype(np.uint8)

    base.append(surf.surf(im))

features = []

# Describe each image by how many of its local features fall into each base cluster
for d in base:
    c = km.predict(d)
    features.append(np.array([np.sum(c == ci) for ci in range(feature_category_num)]))

features = np.array(features)

# Cluster the images themselves using those per-image counts
picture_km = cluster.KMeans(n_clusters=picture_category_num, verbose=1)
picture_km.fit(features)

# Print the result: the URLs that belong to each image category
urls = np.array(urls)

for i in range(picture_category_num):
    print('Image category {0}'.format(i))
    members = urls[picture_km.labels_ == i]
    for c in members:
        print(c)
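
The final printing step boils down to grouping the URLs by the cluster label each image received. As a small, library-free sketch of that grouping (the URLs and labels below are made-up placeholders, and `group_by_label` is a hypothetical helper, not part of scikit-learn):

```python
from collections import defaultdict


def group_by_label(items, labels):
    """Collect items into lists keyed by their cluster label."""
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[label].append(item)
    return dict(groups)


# Placeholder data: one label per URL, in the same order.
urls = [
    "http://example.com/a.jpg",
    "http://example.com/b.jpg",
    "http://example.com/c.jpg",
    "http://example.com/d.jpg",
]
labels = [1, 0, 1, 2]

groups = group_by_label(urls, labels)
for label in sorted(groups):
    print("Image category {0}".format(label))
    for url in groups[label]:
        print(url)
```

Only the URLs whose label matches the category are listed under it, which is the behaviour the result listing is meant to have.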

Result

To state the conclusion first: the classification itself ran to completion.

But I could not make sense of the result.

There are probably various reasons, but I would like to think about that next time.
