[PYTHON] A memorandum of methods often used in machine learning with scikit-learn (for beginners)

Introduction

This memo summarizes the methods I use most often when doing machine learning with scikit-learn. I will correct and update it as needed.

Preprocessing

Standardization

StandardScaler


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # Create the scaler instance
scaler.fit(pd_sample)      # Compute the parameters (mean and standard deviation of each column)
pd_sample_sc = scaler.transform(pd_sample)  # Transform the data (returns a NumPy array by default)

# fit and transform can also be run in a single call:
# pd_sample_sc = scaler.fit_transform(pd_sample)
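As a minimal sketch of what standardization does, the example below builds a small DataFrame (the column names and values are made up for illustration), scales it, and checks that each column ends up with mean 0 and standard deviation 1. Wrapping the result in a DataFrame is an optional extra step to keep the column names.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the column names are assumptions for illustration
pd_sample = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 60.0, 70.0, 80.0],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(pd_sample)  # NumPy array by default

# Wrap the array back into a DataFrame to keep column names and index
pd_sample_sc = pd.DataFrame(scaled, columns=pd_sample.columns, index=pd_sample.index)

col_means = pd_sample_sc.mean()      # approximately 0 for every column
col_stds = pd_sample_sc.std(ddof=0)  # approximately 1 (population std, which StandardScaler uses)
print(col_means)
print(col_stds)
```

Note that pandas defaults to the sample standard deviation (ddof=1), so `ddof=0` is needed to match the scaler.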

Dummy variable

get_dummies


import pandas as pd

# pandas.get_dummies() converts categorical columns into dummy (one-hot) variables
pd_sample = pd.get_dummies(pd_sample)
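A small sketch of the result, assuming a hypothetical frame with one numeric column and one categorical column. The `columns`, `dtype`, and `drop_first` arguments are optional; they are shown here only to make the output easier to read and to illustrate dropping one redundant level.

```python
import pandas as pd

# Hypothetical sample data; "city" is the categorical column
pd_sample = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Tokyo", "Osaka", "Tokyo"],
})

# One dummy column per category; numeric columns are left unchanged
encoded = pd.get_dummies(pd_sample, columns=["city"], dtype=int)
dummy_columns = list(encoded.columns)

# drop_first=True removes the first category to avoid a redundant column
encoded_drop = pd.get_dummies(pd_sample, columns=["city"], drop_first=True, dtype=int)

print(encoded)
print(encoded_drop)
```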

Splitting into training data and evaluation data

train_test_split


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
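With no extra arguments, train_test_split holds out 25% of the rows and shuffles differently on every run. The sketch below, on made-up arrays, shows the optional arguments that are commonly added: `test_size` to set the split ratio, `random_state` to make the split reproducible, and `stratify` to keep the class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 features, balanced binary labels
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,   # fraction of rows held out for evaluation
    random_state=0,   # fixed seed so the split is reproducible
    stratify=y,       # keep the class ratio similar in both parts
)

print(X_train.shape, X_test.shape)
```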

Unsupervised learning

Clustering

KMeans


from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, random_state=0)  # Define the k-means model
clusters = kmeans.fit(pd_sample)               # Run the clustering (fit returns the fitted model)
pd_sample['cluster'] = clusters.labels_        # Store the cluster label of each row
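The sketch below runs the same steps on two well-separated groups of made-up points. It selects the feature columns explicitly, because once the label column has been written back into the frame it should not be fed into a later fit. `n_init=10` is set explicitly only because its default has changed between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data: three points near (1, 1) and three near (8, 8)
pd_sample = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 8.0, 8.2, 7.8],
    "y": [1.0, 0.9, 1.1, 8.0, 8.1, 7.9],
})

features = pd_sample[["x", "y"]]  # fit only on the feature columns
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)  # fit and return the labels in one call

pd_sample["cluster"] = labels
cluster_sizes = pd_sample["cluster"].value_counts().sort_index()
print(pd_sample)
print(cluster_sizes)
```

The label numbers themselves (0 or 1) are arbitrary; only the grouping is meaningful.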

Dimensionality reduction

PCA


from sklearn.decomposition import PCA

pca = PCA(n_components=2)         # Define the PCA model
pca.fit(pd_sample)                # Fit the principal components
x_pca = pca.transform(pd_sample)  # Transform the data (returns a NumPy array by default)
x_pca = pd.DataFrame(x_pca)       # Convert back to a DataFrame

# fit and transform can also be run in a single call:
# x_pca = pca.fit_transform(pd_sample)
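A minimal sketch on synthetic data, assuming three hypothetical columns where `b` is almost a multiple of `a`. Besides the transformed data, the fitted model exposes `explained_variance_ratio_`, which shows how much of the variance each component retains and is a common way to judge whether two components are enough.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): "b" is strongly correlated with "a"
rng = np.random.default_rng(0)
base = rng.normal(size=50)
pd_sample = pd.DataFrame({
    "a": base,
    "b": 2.0 * base + rng.normal(scale=0.1, size=50),
    "c": rng.normal(scale=0.1, size=50),
})

pca = PCA(n_components=2)
x_pca = pd.DataFrame(
    pca.fit_transform(pd_sample),
    columns=["PC1", "PC2"],
    index=pd_sample.index,
)

ratio = pca.explained_variance_ratio_  # share of variance kept by each component
print(x_pca.head())
print(ratio)
```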

Supervised learning

Regression model

LinearRegression


from sklearn.linear_model import LinearRegression

model = LinearRegression()   # Initialize the model
model.fit(X_train, y_train)  # Train the model

# Check the score (coefficient of determination, R^2) on the training and evaluation data
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

# Output the coefficient of each explanatory variable (its contribution to the prediction)
coef = pd.DataFrame({"feature_names":X.columns, "coefficient":model.coef_})
print(coef)

# Predict regression values for unseen data
print(model.predict(x_pred))
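To make the meaning of the coefficients concrete, here is a small sketch on noise-free toy data generated from y = 3*x1 - 2*x2 + 5 (the data and column names are assumptions for illustration). On such data the fitted coefficients and intercept recover the generating formula and the R^2 score is 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Noise-free toy data generated from y = 3*x1 - 2*x2 + 5
X = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "x2": [1.0, 0.0, 2.0, 1.0, 3.0],
})
y = 3.0 * X["x1"] - 2.0 * X["x2"] + 5.0

model = LinearRegression()
model.fit(X, y)

coef = pd.DataFrame({"feature_names": X.columns, "coefficient": model.coef_})
r2 = model.score(X, y)  # coefficient of determination

# Unseen data must have the same columns as the training data
x_pred = pd.DataFrame({"x1": [5.0], "x2": [2.0]})
prediction = model.predict(x_pred)  # 3*5 - 2*2 + 5 = 16

print(coef)
print(model.intercept_, r2, prediction)
```

On real data the coefficients are only comparable across features if the features are on the same scale, which is one reason to standardize first.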

Classification model

DecisionTreeClassifier


from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(random_state=0)  # Initialize the model
model.fit(X_train, y_train)                     # Train the model

# Check the accuracy on the training and evaluation data
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))

# Output the importance of each explanatory variable (its contribution to the splits)
importance = pd.DataFrame({"feature_names":X.columns, "importance":model.feature_importances_})
print(importance)

# Predict class labels for unseen data
print(model.predict(x_pred))

# Output the predicted probability of each class (0 and 1)
print(model.predict_proba(x_pred))
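A small sketch on made-up data with one informative column and one constant column, to show how the three outputs relate. The constant column can never be used for a split, so its importance is 0. The columns of `predict_proba` follow the order of `model.classes_`. The `max_depth=2` setting is an optional addition that limits tree depth.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: "signal" separates the classes, "constant" carries no information
X = pd.DataFrame({
    "signal": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    "constant": [1.0] * 8,
})
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

importance = pd.DataFrame({
    "feature_names": X.columns,
    "importance": model.feature_importances_,
})

x_pred = pd.DataFrame({"signal": [0.15, 0.85], "constant": [1.0, 1.0]})
pred = model.predict(x_pred)         # predicted class labels
proba = model.predict_proba(x_pred)  # one column per class, ordered as model.classes_

print(importance)
print(model.classes_, pred, proba)
```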

Validation of classification model

# Predicted labels for the evaluation data; the metrics below compare them with y_test
y_pred = model.predict(X_test)

# Accuracy = (TP+TN)/(TP+FN+FP+TN)
model.score(X_test, y_test)

# Confusion matrix (arguments: true labels, then predicted labels)
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(y_test, y_pred)

# Heat map of the confusion matrix (rows: true labels, columns: predicted labels)
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(matrix, annot=True, cmap='Blues')
plt.xlabel('Prediction')
plt.ylabel('Target')
plt.show()

# Precision = TP/(TP+FP)
from sklearn.metrics import precision_score
precision_score(y_test, y_pred)

# Recall = TP/(TP+FN)
from sklearn.metrics import recall_score
recall_score(y_test, y_pred)

# F1 score = 2*(Precision*Recall)/(Precision+Recall)
from sklearn.metrics import f1_score
f1_score(y_test, y_pred)
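The formulas above can be checked by hand on a tiny example. The sketch below uses made-up label lists; the scikit-learn metric functions take the true labels first and the predicted labels second. For binary labels 0/1, `confusion_matrix(...).ravel()` yields the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Rows are true labels, columns are predicted labels: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP+TN)/(TP+FN+FP+TN) = 7/10
precision = precision_score(y_true, y_pred)  # TP/(TP+FP) = 3/5
recall = recall_score(y_true, y_pred)        # TP/(TP+FN) = 3/4
f1 = f1_score(y_true, y_pred)                # 2*P*R/(P+R)

print(tn, fp, fn, tp)
print(accuracy, precision, recall, f1)
```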
