Data mining is the application of analysis techniques to large amounts of data to discover previously unknown features of the data and gain new insights, often using techniques common to machine learning.
As a simple idea, if you have the result of unsupervised learning, you can apply the knowledge to the rest of the data as well, and you can improve the accuracy of the result as training data.
Last time introduced an example of grouping students according to their tendency based on their grades. The teacher in charge of these students thought that the students of the entire grade could be classified based on the results of this grouping.
The support vector machine, which is a kind of discriminant function, and the naive Bayes, which is a stochastic classifier, have completely different ideas and methods.
There are many great things about scikit-learn, but one of them is the consistent API designed in different ways. , I think it can be implemented with similar code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB
from sklearn import svm
#Since the data posting is redundant, it is separated by commas..In the format to read csv
features = np.genfromtxt('data.csv', delimiter=',')
# K-means classify labels by clustering
kmeans_model = KMeans(n_clusters=3, random_state=10).fit(features)
labels = kmeans_model.labels_
clf = GaussianNB() #Naive Bayes classifier with Gaussian kernel
#clf = svm.SVC(kernel='rbf', C=1) #RBF kernel support vector machine
#Train the classifier based on the clustering results
clf.fit(features, labels)
#Data to be tested(Grades of other students in the grade)Read
test_X = np.genfromtxt('test.csv', delimiter=',')
#Classify with a classifier
results = clf.predict(test_X)
#Sort and display results
ranks = []
for result, feature in zip(results, test_X):
ranks.append([result, feature, feature.sum()])
ranks.sort(key=lambda x:(-x[2]))
for rank in ranks:
print(rank)
You can see it by actually preparing sample data, but I think that you can visually understand how the boundaries of each class change with each method. From here on, we will talk about each theory.
Recommended Posts