Aidemy 2020/10/28
Hello, this is Yope! I'm a liberal arts student, but I was interested in the possibilities of AI, so I enrolled at the AI-specialized school "Aidemy" to study. I'd like to share the knowledge I gained there, so I'm summarizing it on Qiita. I'm very happy that many people read my previous summary article. Thank you! This time, the post is about unsupervised learning. Nice to meet you.
What to learn this time
・About unsupervised learning
・Types of unsupervised learning
・Mathematical prerequisites
・In supervised learning, learning is done by giving an "answer" in the form of a class label; in unsupervised learning, the computer judges and learns on its own, without being given that answer.
・This time, we will learn about two unsupervised learning methods: __clustering__ and __principal component analysis__.
・Clustering is a method that divides data into chunks (clusters).
・In the __k-means method__, one of the clustering algorithms, a person decides the number of clusters, and the computer divides the data into that many groups.
・The k-means method learns an appropriate position for a point called the "centroid" (center of gravity) of each cluster, and the data are clustered based on it, as in the sketch below.
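The original post does not include code at this point, so the following is only a minimal sketch of k-means clustering using scikit-learn's `KMeans`; the toy data and parameters are my own choices, not from the course.

```python
# Minimal k-means sketch (assumed library: scikit-learn; sample data is made up)
import numpy as np
from sklearn.cluster import KMeans

# Toy 2D data: two loose groups of points
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [10, 9]])

# A person decides the number of clusters; the algorithm then moves each
# cluster's centroid until the assignment of points stabilizes.
model = KMeans(n_clusters=2, random_state=0, n_init=10)
labels = model.fit_predict(X)

print(labels)                  # cluster label for each point, e.g. [0 0 0 1 1 1]
print(model.cluster_centers_)  # learned centroid ("center of gravity") of each cluster
```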
・Principal component analysis is a method that reduces the dimensionality of data (dimensionality reduction) and aggregates the information into a single graph.
・Principal component analysis works by learning axes (principal components) that best express the characteristics of the data.
・For example, from three different measurements, "age, height, and weight", new axes are determined so that the "personal data" can be represented in a two-dimensional graph, as in the sketch below.
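As a rough illustration of the "age, height, weight" example, here is a minimal sketch of principal component analysis using scikit-learn's `PCA`; the numbers are invented for illustration and are not from the original article.

```python
# Minimal PCA sketch (assumed library: scikit-learn; sample data is made up)
import numpy as np
from sklearn.decomposition import PCA

# Three features per person: age, height (cm), weight (kg)
X = np.array([[23, 170, 60],
              [31, 165, 58],
              [45, 180, 82],
              [29, 175, 70],
              [52, 160, 55]])

# Reduce the 3 original dimensions to 2 principal-component axes
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                      # (5, 2): each person now has 2 coordinates
print(pca.explained_variance_ratio_)   # how much of the information each axis retains
```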
・The distance between two points $(x_1, x_2)$ and $(y_1, y_2)$ in two-dimensional space (the Euclidean distance) is $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$.
・The Euclidean distance can be calculated with NumPy as follows. (__np.linalg.norm()__ returns the norm of its argument, i.e. the square root of the sum of the squared elements.)
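The code itself did not survive in this copy of the article, so the block below is a minimal sketch of what it likely looks like; the two sample points are my own.

```python
# Euclidean distance with NumPy
import numpy as np

x = np.array([1, 2])
y = np.array([4, 6])

# np.linalg.norm returns the square root of the sum of squared differences
dist = np.linalg.norm(x - y)
print(dist)  # 5.0
```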
・When evaluating how similar two vectors are, the similarity is judged from their __length__ and __direction__.
・Focusing on direction, the smaller the angle __θ__ between the two vectors, the higher the similarity.
・θ can be found from the formula for the inner product of vectors, $\vec{a}\cdot\vec{b} = |\vec{a}||\vec{b}|\cos\theta$, which gives $\cos\theta = \dfrac{\vec{a}\cdot\vec{b}}{|\vec{a}||\vec{b}|}$ (the cosine similarity).
・In code, this can be calculated with NumPy, as in the sketch below. (__np.dot()__ returns the sum of the products of the corresponding elements; for the vectors below, that is 1 * 2 + 2 * 3 + 3 * 4.)
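The original code block was lost in this copy, so here is a minimal sketch using the vectors implied by the article's example (1 * 2 + 2 * 3 + 3 * 4); the rest of the values follow from that assumption.

```python
# Cosine similarity with NumPy
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# np.dot returns the sum of element-wise products: 1*2 + 2*3 + 3*4 = 20
dot = np.dot(a, b)

# cos(theta) = (a . b) / (|a| * |b|); closer to 1 means the directions are more similar
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
print(dot)        # 20
print(cos_theta)  # about 0.993
```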
・Unsupervised learning is a method in which the computer judges and learns on its own, without being given correct-answer labels.
・Unsupervised learning includes "clustering" and "principal component analysis". The former is a method of dividing data into clusters; the latter is a method of aggregating information into a single graph by reducing the dimensionality of the data.
・In unsupervised learning, the similarity of data may be judged by the __Euclidean distance (norm)__ or the __cosine similarity__.
That's all for this time. Thank you for reading to the end.