[PYTHON] Machine learning ④ K-nearest neighbor Summary

Summary of K-nearest neighbor

What is K-nearest neighbor?

Plot the training data on a plane. To classify a test point t, find the K training points closest to t and give t the most common label (the mode) among them. That is the K-nearest neighbor method. (It also works in spaces other than a plane, but I will use a plane here because it is the easiest way to explain the concept.) It is a little hard to picture, so I'll borrow a diagram from Wikipedia.

Figure (extracted from Wikipedia): a green dot surrounded by red and blue points.

Label the green dot based on the K dots closest to it. The thing to pay attention to here is the variable K, the number of neighbors considered. For example, in the figure above, if K = 3, the nearest neighbors are 2 red points and 1 blue point, so the green point is labeled red. If K = 5, the nearest neighbors are 2 red points and 3 blue points, so the green point is labeled blue.
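The voting rule described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `knn_predict` and the coordinates are made up, chosen so that the 3 nearest neighbors are 2 red and 1 blue, while the 5 nearest are 2 red and 3 blue.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Label `query` with the most common label among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # The mode (most frequent label) among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical coordinates (not taken from the Wikipedia figure).
points = [(1.0, 0.0), (0.0, 1.2), (-1.1, 0.0), (0.0, -2.0), (2.1, 0.0), (5.0, 5.0)]
labels = ["red", "red", "blue", "blue", "blue", "red"]
green = (0.0, 0.0)

label_k3 = knn_predict(points, labels, green, k=3)  # neighbors: red, blue, red
label_k5 = knn_predict(points, labels, green, k=5)  # neighbors: 2 red, 3 blue
print(label_k3, label_k5)
```

The same green point gets a different label depending only on K, which is why choosing K matters.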

Precautions for K-nearest neighbor method

--The K-nearest neighbor method works well when the red and blue dots in the example above are reasonably well separated. Conversely, if the red and blue points are thoroughly mixed together, it is not a good idea to use the K-nearest neighbor method.

--If K is even, the two labels can receive the same number of votes and the point cannot be classified by majority, so with two classes be sure to make K odd.

--If you increase K while one class has far more points than the other (for example, many more red dots than blue dots), the probability of being classified as red becomes disproportionately large. Therefore, you need to pay attention to the ratio of red to blue dots.
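The tie and imbalance problems in the list above can be seen with a small vote-counting sketch. The helper `vote` and the label lists are hypothetical and only mimic what the K nearest neighbors of some query point might look like.

```python
from collections import Counter

def vote(neighbor_labels):
    """Return (winner, is_tie) for a list of neighbor labels."""
    counts = Counter(neighbor_labels).most_common()
    # A tie means the two most frequent labels have the same count.
    is_tie = len(counts) > 1 and counts[0][1] == counts[1][1]
    return counts[0][0], is_tie

# Even K with two classes: 2 red vs 2 blue has no clear majority.
winner_even, tie_even = vote(["red", "blue", "red", "blue"])

# Odd K with two classes: a tie is impossible.
winner_odd, tie_odd = vote(["red", "blue", "red"])

# Imbalance: hypothetical data with 3 blue points close to the query
# and 20 red points farther away. Labels are listed nearest-first.
labels_by_distance = ["blue"] * 3 + ["red"] * 20
small_k, _ = vote(labels_by_distance[:3])    # only the nearby blue points vote
large_k, _ = vote(labels_by_distance[:15])   # 3 blue vs 12 red
print(tie_even, tie_odd, small_k, large_k)
```

With a small K the query follows its immediate blue neighbors, but with a large K the more numerous red class outvotes them.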

Default code

python



from sklearn.neighbors import KNeighborsClassifier

# Defaults written out explicitly (recent scikit-learn versions default n_jobs to None)
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto',
                           leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1)

Description of the parameters in K-nearest neighbor

Perhaps the most important parameter is n_neighbors, the K described above. The code below computes a cross-validation score for each candidate K, from which the best K can be chosen.

python



from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# X_train and y_train are assumed to be defined already (training features and labels)

# Candidate values for K: the odd numbers from 1 to 49
neighbors = list(range(1, 50, 2))

# Mean cross-validation accuracy for each K
cv_scores = []

# Run 10-fold cross-validation for each K and record the mean accuracy
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

# The K with the highest mean accuracy
optimal_k = neighbors[cv_scores.index(max(cv_scores))]

The code above is based on an excerpt from kevinzakka's blog.
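To make the cross-validation step more concrete, here is a rough from-scratch sketch of the same idea in plain Python. It is a simplification under stated assumptions, not how scikit-learn implements it: folds are formed by taking every n-th sample, and the function names and toy data are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """train is a list of (point, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=5):
    """Mean accuracy over n_folds; fold i holds out every n_folds-th sample."""
    fold_scores = []
    for fold in range(n_folds):
        held_out = [s for i, s in enumerate(data) if i % n_folds == fold]
        training = [s for i, s in enumerate(data) if i % n_folds != fold]
        # Count how many held-out points are labeled correctly.
        correct = sum(knn_predict(training, point, k) == label
                      for point, label in held_out)
        fold_scores.append(correct / len(held_out))
    return sum(fold_scores) / n_folds

# Hypothetical toy data: class "a" at x = 0..9 and class "b" at x = 10..19.
data = [((float(x), 0.0), "a") for x in range(0, 10)] + \
       [((float(x), 0.0), "b") for x in range(10, 20)]

# Score each odd candidate K and keep the one with the best mean accuracy.
candidate_ks = [1, 3, 5, 7]
scores = {k: cross_val_accuracy(data, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Each sample is held out exactly once, so every score is measured on points the model did not see during that fold.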

Those are the main parameters. I will add more as my understanding improves. Edit requests are also welcome.
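As a rough illustration of two other parameters that appear in the constructor, `p` (with metric='minkowski') and `weights`, the sketch below reimplements their meaning in plain Python. The function names and data are hypothetical, and the sketch assumes no neighbor sits at distance zero.

```python
from collections import defaultdict

def minkowski(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan distance, p=2 is Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def weighted_vote(distances, labels, weights="uniform"):
    """Vote among neighbors; 'distance' weights each vote by 1 / distance."""
    totals = defaultdict(float)
    for d, label in zip(distances, labels):
        totals[label] += 1.0 if weights == "uniform" else 1.0 / d
    return max(totals, key=totals.get)

d_euclid = minkowski((0, 0), (3, 4), p=2)     # sqrt(3^2 + 4^2)
d_manhattan = minkowski((0, 0), (3, 4), p=1)  # 3 + 4

# Hypothetical neighbors: one very close red point, two farther blue points.
dists = [0.5, 2.0, 2.5]
labs = ["red", "blue", "blue"]
uniform_label = weighted_vote(dists, labs, weights="uniform")    # 1 red vote vs 2 blue votes
distance_label = weighted_vote(dists, labs, weights="distance")  # red 2.0 vs blue 0.5 + 0.4
print(d_euclid, d_manhattan, uniform_label, distance_label)
```

With uniform weights the two blue points win by count, while distance weighting lets the single nearby red point outweigh them.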

The pros and cons of the K-nearest neighbor method

--Good points

The model is simple and easy to understand. The more data there is, the more likely it is that accurate classification becomes possible.

--Bad points

I mentioned two weaknesses under Precautions for K-nearest neighbor method, but to summarize them again in one sentence: if the classes are heavily mixed together, or if the class ratio is heavily skewed, the classification may not work well.

Summary

That is an outline of the K-nearest neighbor method as far as I understand it. I will keep updating this post, so if you have anything to add or correct, I would appreciate a comment.
