Introduction to her made with Python ~ Tinder automation project ~ Episode 6

Table of contents

Episode 1: Automatic right swipe
Episode 2: Automatic message sending (main event: matched with a woman)
Episode 3: Turning the code into a library (main event: exchanged LINE contacts with a woman I matched with)
Episode 3.5: Re-acquiring the access token (main event: tokens could no longer be obtained with the previous code)
Episode 4: Data collection (main event: LINE replies stopped coming)
Episode 5: Data analysis, profile text (main event: people I became friends with recommended information products to me)
Episode 6: Data analysis, images (main event: a girl I actually know called me late at night(?))

The code is available on GitHub.

Synopsis so far

Recent news

I haven't been sleeping much lately. Apparently another acquaintance of mine has gotten a girlfriend. That traitor... By the way, I got a GPU the other day, so from this episode on I'm training on the GPU.

Previous work

I'm not the first person to think about bringing a model into a Tinder swipe strategy. A quick search turned up several examples:

- people who swipe right only on profiles with a face photo [1]
- people who train a DNN on photos they like and let it swipe [2] [3]
- someone who judged whether the face in a photo had been edited [4]

... Now, what was I trying to do again?

The trouble with machine learning is that just getting things to run is fun, so before you notice, you've drifted from your original goal. Back to the starting point: I started writing this code because I wanted a girlfriend.

Most of what's out there is code for "not matching with people you don't want". What I need is code for "matching with as many people as possible". Right now I'm lucky to get one match a day, so if I match with someone I'm not interested in, I can just unmatch manually [^1]. (Of course, this doesn't apply to people who get so many matches that they can't keep up with unmatching by hand.) Why narrow my own range of encounters [^2]?

[5] is a useful reference for the project of getting a girlfriend. Instead of looking for people who are likely to match, as I'm doing now, it takes the approach of "creating a profile that gets as many likes as possible". Users of a different matching service may evaluate profiles differently, so the results can't simply be carried over, but I think it's an interesting attempt.

I'd like to try it on Tinder someday. In that case I would prepare multiple profiles with different self-introductions, then run A/B tests [^3] or reinforcement learning on the resulting scores. I've put this approach on hold, though: collecting enough data would take a long time and a large number of phone numbers, and honestly it's a hassle.

According to [5], women seem more likely to swipe right if your profile includes appropriate descriptions of "education", "whether you want children", "sociability", and "alcohol" [^4].
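Comparing two profile versions in an A/B test comes down to comparing two match rates. Below is a minimal sketch of a two-sided two-proportion z-test in plain Python. It assumes each profile version is used for a known number of right swipes and that the resulting matches are counted. The function name and all counts are hypothetical. This is one standard way to compare two rates, not the method used in [5]. The normal approximation also needs reasonably large counts; with only a handful of matches, an exact test such as Fisher's is safer.

```python
import math

def two_proportion_z_test(matches_a, swipes_a, matches_b, swipes_b):
    """Two-sided z-test for a difference in match rate between profile A and profile B.

    Returns (z, p_value), using the pooled rate under the null hypothesis of equal rates.
    """
    rate_a = matches_a / swipes_a
    rate_b = matches_b / swipes_b
    pooled = (matches_a + matches_b) / (swipes_a + swipes_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / swipes_a + 1 / swipes_b))
    if se == 0:
        # Pooled rate is 0 or 1 (e.g. no matches at all): no evidence of any difference
        return 0.0, 1.0
    z = (rate_a - rate_b) / se
    # Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: profile A got 30 matches from 200 right swipes, profile B got 10 from 200
print(two_proportion_z_test(30, 200, 10, 200))
# Zero matches on both sides, as in footnote 3
print(two_proportion_z_test(0, 100, 0, 100))  # (0.0, 1.0)
```

With zero matches on both sides, the test has nothing to work with, which is exactly the "no significant difference" outcome.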

Image recognition

I wrote a lot of extra things above, but the point is that I want to build a machine learning model with:

- In: profile photo
- Out: whether it matches

Model building

For image recognition, the natural choice is a CNN, so I use one to estimate from the profile image whether it leads to a match. First, load the images from the data folder.

analytics.py


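import os
import re

import cv2
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

filePath = "data/tinder.xlsx"
imagePath = "data/photos"

# One row per profile id; the "match" column is the label
df = pd.read_excel(filePath)
df.drop_duplicates(inplace=True, subset="id")
df.set_index("id", inplace=True)

# Photo files are named "<id>-<n>.jpg", optionally with a " (<k>)" suffix.
# Raw string, escaped dot, and no trailing space after "jpg" (with one, nothing ever matches).
namePattern = re.compile(r"([a-z0-9]*)-\d( \(\d\))?\.jpg")

X = []
y = []
for fileName in tqdm(os.listdir(imagePath)):
    m = namePattern.match(fileName)
    if m is None or m.group(1) not in df.index:
        continue  # not a profile photo, or no row for this id
    img = cv2.imread(os.path.join(imagePath, fileName))
    if img is None:
        continue  # unreadable image file
    X.append(cv2.resize(img, (120, 120)))  # cv2 loads channels in BGR order
    y.append(df.loc[m.group(1), "match"])
X = np.asarray(X)
y = np.asarray(y)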
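Every file passes through the filename regex. If that regex is wrong, the broad `except` in the loader quietly skips every file and the dataset comes out empty. The stdlib sketch below checks the pattern on its own. The file names are hypothetical, and the naming scheme is only inferred from the regex. `fullmatch` is used here so that names with stray trailing characters are rejected too. The second pattern is the variant with a trailing space after `jpg`.

```python
import re

# Fixed pattern: raw string, escaped dot, no trailing space.
# Assumed naming scheme (inferred from the loader's regex): "<id>-<n>.jpg" or "<id>-<n> (<k>).jpg"
PHOTO_NAME = re.compile(r"([a-z0-9]*)-\d( \(\d\))?\.jpg")
# The same pattern with a trailing space after "jpg"
BUGGY = re.compile(r"([a-z0-9]*)-\d( \(\d\))?.jpg ")

def extract_id(file_name):
    """Return the profile id encoded in a photo file name, or None if the name doesn't fit."""
    m = PHOTO_NAME.fullmatch(file_name)
    return m.group(1) if m else None

print(extract_id("5d1e2f3a-0.jpg"))      # 5d1e2f3a
print(extract_id("5d1e2f3a-2 (1).jpg"))  # 5d1e2f3a
print(extract_id("profile.csv"))         # None
print(BUGGY.match("5d1e2f3a-0.jpg"))     # None: the pattern requires a space after "jpg"
```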

All images are resized to 120 x 120. Dividing the pixel values by 255 scales them into the 0-1 range. Then the data is split into train and test sets.

analytics.py


from sklearn.model_selection import train_test_split

X = X/255
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8888)
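When matches are rare, a plain random split can leave the test set with only a handful of positives, which makes the AUC noisy. scikit-learn's `train_test_split` accepts `stratify=y` to keep the class ratio the same in both parts. The stdlib sketch below only illustrates what stratification does. It uses Python's `random` module, so it will not reproduce sklearn's split for `random_state=8888`. The labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.25, seed=8888):
    """Split indices into (train, test) so every class keeps about the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test count
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 180 + [1] * 20  # hypothetical: 10% of profiles matched
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # 150 50
print(sum(labels[i] for i in test_idx))  # 5
```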

Now that the data is ready, let's build the CNN. This time I prepared a model with 2 convolutional layers and 2 fully connected layers.

analytics.py


import keras
from keras.models import Sequential
from keras.layers import Conv2D, Dense, ReLU, Dropout, Flatten, MaxPool2D

def getModel():
    model = Sequential()
    # Conv2D(filters, kernel_size): 3 filters of size 3x3
    model.add(Conv2D(3, 3, input_shape=(120, 120, 3)))
    model.add(ReLU())
    model.add(MaxPool2D((2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(3, 3, padding="same"))
    model.add(ReLU())
    model.add(MaxPool2D((2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    # No activation: this layer and the next compose into a single linear map before the softmax
    model.add(Dense(1024))
    # Two-class softmax; column 1 is the predicted match probability
    model.add(Dense(2, activation="softmax"))
    return model
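To see where this model's capacity sits, here is a small pure-Python sketch that traces output shapes and parameter counts layer by layer. It assumes Keras defaults: stride 1, and `MaxPool2D` with "valid" padding. It also assumes `Conv2D(3, 3)` means 3 filters with a 3x3 kernel. ReLU and Dropout are left out because they change neither the shape nor the parameter count. The helper names are hypothetical.

```python
def conv2d(shape, filters, kernel, padding="valid"):
    """Output shape and parameter count of a stride-1 Conv2D layer."""
    h, w, c = shape
    if padding == "valid":
        h, w = h - kernel + 1, w - kernel + 1
    return (h, w, filters), kernel * kernel * c * filters + filters  # weights + biases

def maxpool2d(shape, pool=2):
    """MaxPool2D with 'valid' padding: spatial size is floor-divided, no parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(n_in, units):
    """Fully connected layer: weights plus biases."""
    return units, n_in * units + units

def trace_small_cnn(input_shape=(120, 120, 3)):
    rows = []
    shape, params = conv2d(input_shape, filters=3, kernel=3)
    rows.append(("Conv2D 1", shape, params))
    shape, params = maxpool2d(shape)
    rows.append(("MaxPool2D 1", shape, params))
    shape, params = conv2d(shape, filters=3, kernel=3, padding="same")
    rows.append(("Conv2D 2", shape, params))
    shape, params = maxpool2d(shape)
    rows.append(("MaxPool2D 2", shape, params))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", (flat,), 0))
    units, params = dense(flat, 1024)
    rows.append(("Dense 1024", (units,), params))
    units, params = dense(units, 2)
    rows.append(("Dense 2", (units,), params))
    return rows

rows = trace_small_cnn()
for name, shape, params in rows:
    print(f"{name:12} {str(shape):15} {params:>10,}")
print("total", sum(p for _, _, p in rows))  # total 2586794
```

The two convolutional layers have only 84 parameters each. About 99.9% of the roughly 2.59 million parameters sit in the `Dense(1024)` layer.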

Next, train the model and make predictions.

analytics.py


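from keras.optimizers import Adam
from keras.utils import to_categorical
from sklearn.metrics import roc_auc_score

model = getModel()
model.compile(optimizer=Adam(), loss="categorical_crossentropy", metrics=["accuracy"])
# One-hot encode the 0/1 labels to match the 2-unit softmax output
model.fit(X_train, to_categorical(y_train), epochs=30, validation_data=(X_test, to_categorical(y_test)))

# The softmax output is already a probability distribution, so take the "match" column directly
y_pred = model.predict(X_test)[:, 1]
print(roc_auc_score(y_test, y_pred))
#>>0.5116927510028815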

An AUC of 0.51 is barely better than random guessing (0.5), so the model isn't very useful. There is just too little data on matches...
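ROC AUC is the probability that a randomly chosen matched profile gets a higher score than a randomly chosen unmatched one, with ties counted as one half. Random scores give 0.5. Below is a minimal quadratic-time sketch of that definition, fine for a sanity check but not meant for large test sets. The function name is hypothetical. For binary labels it gives the same value as scikit-learn's `roc_auc_score`.

```python
import math

def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))                         # 0.75
# AUC depends only on the ranking, so a strictly increasing transform changes nothing
print(roc_auc(y_true, [math.exp(s) for s in scores]))  # 0.75
print(roc_auc(y_true, [0.5] * 4))                      # 0.5
```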

So I'd like to try transfer learning, in which a model trained in advance on one task is applied to another. For CNN image recognition models, it is empirically known that the convolutional layers tend to extract general-purpose image features. As a result, a model can be repurposed for a different task just by rebuilding the final fully connected layers [6].

(Figure from CS231n: Convolutional Neural Networks for Visual Recognition, Lecture 7)

This time, I use VGG16 as the base, add a fully connected layer on top, and output a match score.

analytics.py


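# Data preparation is the same as before; X_train, y_train, X_test and y_test are assumed to be ready.
# Note: the ImageNet weights were trained on inputs prepared by keras.applications.vgg16.preprocess_input,
# not on pixels scaled to 0-1; this experiment feeds the X/255 data from above.
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score

def getModel():
    # ImageNet-pretrained convolutional base without VGG16's own classifier
    base = VGG16(weights="imagenet", include_top=False)
    x = GlobalAveragePooling2D()(base.output)
    # Single linear output, trained as a regression on the 0/1 match label
    predictions = Dense(1, activation="linear")(x)
    model = Model(inputs=base.input, outputs=predictions)
    # Freeze everything except the last 3 layers (block5_pool, the pooling layer, the new Dense)
    for layer in model.layers[:-3]:
        layer.trainable = False
    return model

model = getModel()
model.compile(optimizer=Adam(), loss="mse", metrics=["mse"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
# Flatten the (n, 1) predictions before scoring
y_pred = model.predict(X_test).ravel()
print(roc_auc_score(y_test, y_pred))
#>>0.6025131864722308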
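In `getModel`, the freezing step sets `trainable = False` on `model.layers[:-3]`. The plain-Python sketch below applies that slice to the layer list. It assumes the standard Keras VGG16 layer names, `block1_conv1` through `block5_pool`. Keras auto-generates the names of the input layer and the two added head layers, so those three names are placeholders.

```python
def vgg16_layer_names():
    """Layer names of keras.applications.VGG16(include_top=False), in order."""
    names = ["input"]  # placeholder: Keras auto-generates the input layer's name
    for block, n_convs in [(1, 2), (2, 2), (3, 3), (4, 3), (5, 3)]:
        names += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
        names.append(f"block{block}_pool")
    return names

# Placeholder names for the two layers added on top in getModel
layers = vgg16_layer_names() + ["global_average_pooling2d", "dense"]
frozen, trainable = layers[:-3], layers[-3:]
print(len(frozen), trainable)
# 18 ['block5_pool', 'global_average_pooling2d', 'dense']

# Only the Dense layer has weights: 512 pooled channels -> 1 output, plus a bias
head_params = 512 * 1 + 1
print(head_params)  # 513
```

Pooling layers have no weights, so the only parameters that actually learn are the 513 in the new Dense layer. The ImageNet-trained convolutional base stays fixed.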

The AUC comfortably exceeded 0.6. I think I finally have a usable model. Next time, I will finally predict matches from the profile text, images, and tabular data combined. I want to finish it before Christmas.

Wait for me, my future girlfriend!!

References

[1] https://note.mu/sarasara201512/n/n20ec9765a387
[2] https://qiita.com/KR_bangkok/items/00b5ed45f5a8c1428960
[3] https://github.com/joelbarmettlerUZH/auto-tinder
[4] https://blog.aidemy.net/entry/2018/07/05/172157
[5] https://qiita.com/data_psyence/items/54bab846337fe1ca61e4
[6] https://qiita.com/ANNEX_IBS/items/55c7a8984fe88a756965

[^1]: For this reason, I'm not considering automatically filtering out spam accounts that solicit business. You don't need a model for that; you can tell right away once you actually talk to them.
[^2]: Of course, whether a face is present can still be used as a feature.
[^3]: The other day I became friends with someone from Google. He said a colleague of his, a Kyoto University graduate, ran an A/B test with and without the university name in the education field of his profile. The result: "Both with and without the university name (Kyoto University), he matched with 0 women, so no significant difference was observed." The ending was so sad that I couldn't bring myself to ask for details.
[^4]: This works because "Do you want children?" is a built-in question item on that service. On Tinder, where the profile is free text, I feel that going out of your way to write something like that would put people off.
