[PYTHON] Determine if the gold coin is genuine

Solve CodeIQ problems

The problem was posted over a year ago, but I'll try solving one of the machine learning problems published on CodeIQ.

"Machine learning basics" Let's solve and understand simple problems! Part 1 http://next.rikunabi.com/tech/docs/ct_s03600.jsp?p=002315

I'll take on the first question from that article.

Linear separation problem

Problem

Mr. N, who attended a PRML reading group held on a pirate ship over the weekend, was fascinated by the gold and silver treasure piled up on board. When he opened a nearby treasure chest, he found a number of glittering coins.
When he picked one up, it felt heavy. It had to be a gold coin.
He was told he could take as many as he liked, so he packed some into his bag on the way home from the study session.
Back home and a little calmer, Mr. N began to wonder: "They were handing these out so generously, but are these gold coins genuine?"
There were 20 gold coins in the bag, and when he asked his friend Archimedes to measure them, the volume and weight differed from coin to coin.
Searching the net, he found data on the volume, weight, and authenticity of gold coins.
Use this data to determine the authenticity of the gold coins he received.

As the original article notes, the data looks cleanly linearly separable.

As usual, I'll solve it with scikit-learn.

Data reading

import numpy as np
from sklearn.svm import LinearSVC
import matplotlib.pyplot as plt

auth = np.genfromtxt('CodeIQ_auth.txt', delimiter=' ')

# Training data (two features per coin)
train_X = np.array([[x[0], x[1]] for x in auth])
# Training labels (1 = genuine, 0 = fake)
labels = [int(x[2]) for x in auth]

# Test data: the coins to classify
test_X = np.genfromtxt('CodeIQ_mycoins.txt', delimiter=' ')
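
To try the loading step without the CodeIQ files, here is a minimal sketch that reads the same space-delimited layout from an in-memory string. The three rows are illustrative stand-ins, not the contents of CodeIQ_auth.txt, and the column order (two features, then the label) is assumed from the code above. It also uses array slicing in place of the list comprehensions.

```python
import io

import numpy as np

# Illustrative stand-in for CodeIQ_auth.txt (assumed layout: feature feature label)
sample = io.StringIO("0.988 17.734 1\n0.769 6.842 0\n0.491 4.334 0\n")

auth = np.genfromtxt(sample, delimiter=' ')

# First two columns are the features, third column is the label
train_X = auth[:, :2]
labels = auth[:, 2].astype(int)

print(train_X.shape, labels.tolist())
```

Slicing gives the same arrays as the list comprehensions, and the labels stay a NumPy array, which clf.fit accepts just as well as a list.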

Data visualization

First, let's plot the data.

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)

# Extract the genuine coins from the training data
correct = np.array([[x[0], x[1]] for x in auth if x[2] == 1]).T
# Also extract the fake coins
wrong   = np.array([[x[0], x[1]] for x in auth if x[2] == 0]).T

# Top panel: genuine vs. fake. Bottom panel: training data vs. my coins.
ax1.scatter(correct[0], correct[1], color='g', label='genuine')
ax1.scatter(wrong[0],   wrong[1],   color='r', label='fake')
ax2.scatter(train_X.T[0], train_X.T[1], color='b', label='training data')
ax2.scatter(test_X.T[0],  test_X.T[1],  color='r', label='my coins')

ax1.legend(loc='best')
ax2.legend(loc='best')
# Save before show(); saving afterwards can write out an empty figure
plt.savefig("image.png")
plt.show()

image.png

In the top panel, green points are genuine coins and red points are fakes. It looks just like the plot in the original article.

The bottom panel shows how the coins I received (red) are distributed relative to the labeled authenticity data (blue).

Solution

Since it is a linear separation problem, we use LinearSVC.

clf = LinearSVC(C=1)

#Training
clf.fit(train_X, labels)

#Classification
results = clf.predict(test_X)
for result, feature in zip(results, test_X):
    print(result, feature)
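
A fitted LinearSVC stores its separating line in coef_ and intercept_, so the boundary can be read back after training. The following is a minimal sketch and not part of the original solution: toy_X and toy_y are made-up points rather than the CodeIQ data, boundary_x1 is a hypothetical helper, and max_iter is raised only to give the solver more iterations.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical toy data (not from CodeIQ): [feature 0, feature 1] per row
toy_X = np.array([[0.5, 9.0], [0.8, 14.4], [1.0, 18.0],
                  [0.5, 4.5], [0.8, 7.2], [1.0, 9.0]])
toy_y = np.array([1, 1, 1, 0, 0, 0])

toy_clf = LinearSVC(C=1, max_iter=10000)
toy_clf.fit(toy_X, toy_y)

# The learned boundary is the line w0*x0 + w1*x1 + b = 0
w0, w1 = toy_clf.coef_[0]
b = toy_clf.intercept_[0]


def boundary_x1(x0):
    """Return the x1 value on the boundary for a given x0 (assumes w1 != 0)."""
    return -(w0 * x0 + b) / w1


print("w =", (w0, w1), "b =", b)
print("boundary at x0 = 0.7:", boundary_x1(0.7))
```

Points on one side of this line get label 1 and points on the other side get label 0. Plotting boundary_x1 over the range of the first feature would draw the line on top of the scatter plots.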

Result

1 [  0.988  17.734]
0 [ 0.769  6.842]
0 [ 0.491  4.334]
1 [  0.937  16.785]
1 [  0.844  13.435]
0 [ 0.834  9.518]
1 [  0.931  16.62 ]
1 [ 0.397  6.705]
1 [  0.917  16.544]
0 [ 0.45   3.852]
0 [ 0.421  4.612]
1 [ 0.518  9.838]
1 [  0.874  14.113]
0 [ 0.566  6.529]
0 [ 0.769  8.132]
1 [  1.043  16.066]
0 [ 0.748  9.021]
0 [ 0.61   6.828]
0 [  1.079  12.097]
1 [  0.771  13.505]

The 0 or 1 on the left of each line is the predicted label (1 = genuine, 0 = fake). I got the same answer as the example in the original article.
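
As a sanity check on these labels, one can look at weight divided by volume for each coin, since a genuine gold coin should be denser than a fake one. The sketch below is not part of the original solution. It copies the 20 rows from the printed results and assumes that the first feature is the volume and the second is the weight.

```python
# (predicted label, volume, weight), copied from the printed results.
# Assumption: the first feature is volume and the second is weight.
coins = [
    (1, 0.988, 17.734), (0, 0.769, 6.842), (0, 0.491, 4.334),
    (1, 0.937, 16.785), (1, 0.844, 13.435), (0, 0.834, 9.518),
    (1, 0.931, 16.62), (1, 0.397, 6.705), (1, 0.917, 16.544),
    (0, 0.45, 3.852), (0, 0.421, 4.612), (1, 0.518, 9.838),
    (1, 0.874, 14.113), (0, 0.566, 6.529), (0, 0.769, 8.132),
    (1, 1.043, 16.066), (0, 0.748, 9.021), (0, 0.61, 6.828),
    (0, 1.079, 12.097), (1, 0.771, 13.505),
]

# Weight-to-volume ratio for each predicted class
genuine = [w / v for label, v, w in coins if label == 1]
fake = [w / v for label, v, w in coins if label == 0]

print("lowest ratio among predicted genuine:", round(min(genuine), 3))
print("highest ratio among predicted fake:  ", round(max(fake), 3))
```

For these 20 coins the two groups of ratios do not overlap, which is consistent with the data being linearly separable.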
