[PYTHON] Preprocessing in machine learning 1 Data analysis process

Aidemy 2020/10/29

Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I want to share what I learned there, so I am summarizing it on Qiita. I am very happy that so many people read my previous summary articles. Thank you! This is the first post in the series on preprocessing in machine learning. I hope you find it useful.

What to learn this time
・Data analysis flow

Data analysis flow

・CRISP-DM and KDD have been proposed as flows (processes) for data analysis.

CRISP-DM

・CRISP-DM consists of the following steps.
① __Business understanding__: Clarify what the problem is and what the data analysis should accomplish.
② __Data understanding__: Check whether the data can be acquired and whether it can be analyzed.
③ __Data preparation__: Shape the data into a form that can be used in ④ modeling.
④ __Modeling__: Apply a model to the data and analyze it.
⑤ __Evaluation__: Evaluate whether the analysis result is sufficient.
⑥ __Deployment (application)__: Actually apply the analysis results to the problem or task.

・However, these steps do not necessarily run in one direction; you may go back to an earlier step when necessary.
・The data preprocessing covered in this series corresponds to ② and ③ of this process.

・Figure ![Screenshot 2020-10-29 14.36.04.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/d949cd2a-4ff7-833e-d8b1-df799dcf7741.png)
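To make the "not necessarily one-way" point concrete, here is a minimal Python sketch that models the six CRISP-DM phases as an ordered enum and walks through them, going back to data preparation whenever an evaluation fails. The names `Phase`, `PREPROCESSING`, and `run_process`, the list of evaluation outcomes, and the choice to always go back to data preparation are assumptions made for illustration; they are not part of CRISP-DM itself, which allows returning to other phases as well. The last phase is named Deployment here, as in the CRISP-DM literature.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Phases that this series treats as "preprocessing" (steps 2 and 3).
PREPROCESSING = {Phase.DATA_UNDERSTANDING, Phase.DATA_PREPARATION}


def run_process(evaluation_results):
    """Walk through the phases and return the list of phases visited.

    evaluation_results is a hypothetical list of booleans. Each time the
    Evaluation phase is reached, the next value decides whether to move on
    to Deployment (True) or to go back to Data preparation (False).
    When the list is exhausted, the evaluation is treated as passed.
    """
    results = iter(evaluation_results)
    visited = []
    phase = Phase.BUSINESS_UNDERSTANDING
    while True:
        visited.append(phase)
        if phase is Phase.DEPLOYMENT:
            return visited
        if phase is Phase.EVALUATION and not next(results, True):
            phase = Phase.DATA_PREPARATION  # result not sufficient: go back
        else:
            phase = Phase(phase + 1)  # otherwise advance to the next phase


# One failed evaluation followed by a successful one.
history = run_process([False, True])
print([p.name for p in history])
print("preprocessing visits:", sum(p in PREPROCESSING for p in history))
```

With one failed evaluation, data preparation and modeling are each visited twice before deployment, which is the loop the figure describes.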

KDD

・KDD consists of the following steps.
① __Data acquisition__: Set the problem and goals, and acquire data.
② __Data selection__: Select the data to be used for analysis (data mining) from the acquired data.
③ __Data cleansing__: Clean the data, for example by removing missing values and outliers.
④ __Data conversion__: Convert the cleansed data into a format that can be used for data mining.
⑤ __Data mining__: Perform regression or classification on the converted data for analysis and learning.
⑥ __Interpretation / Evaluation__: Interpret and evaluate the patterns found in the data mining results.

・Figure ![Screenshot 2020-10-29 14.36.19.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/78ad3832-30bf-33c2-4788-44865de4cc74.png)
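The KDD steps can be sketched as a small chain of functions, one per step. This is only an illustrative outline using the Python standard library: the records, the column names `hours` and `score`, the 0 to 100 validity range, the min-max scaling, and the helper names `select`, `cleanse`, `convert`, and `mine` are all assumptions made for this example, and a real project would more likely use libraries such as pandas or scikit-learn.

```python
from statistics import mean

# Step 1: data acquisition (hypothetical hard-coded records for illustration).
raw = [
    {"id": 1, "hours": 1.0, "score": 52.0},
    {"id": 2, "hours": 2.0, "score": 54.0},
    {"id": 3, "hours": None, "score": 60.0},  # missing value
    {"id": 4, "hours": 3.0, "score": 56.0},
    {"id": 5, "hours": 4.0, "score": 58.0},
    {"id": 6, "hours": 5.0, "score": 500.0},  # obvious outlier
]


def select(records):
    """Step 2: keep only the columns used for analysis."""
    return [(r["hours"], r["score"]) for r in records]


def cleanse(pairs, max_score=100.0):
    """Step 3: drop rows with missing values or out-of-range scores."""
    return [(x, y) for x, y in pairs
            if x is not None and y is not None and 0.0 <= y <= max_score]


def convert(pairs):
    """Step 4: rescale the feature to [0, 1] (min-max scaling).

    Assumes at least two distinct feature values, otherwise hi - lo is zero.
    """
    xs = [x for x, _ in pairs]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in pairs]


def mine(pairs):
    """Step 5: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


prepared = convert(cleanse(select(raw)))
slope, intercept = mine(prepared)

# Step 6: interpretation / evaluation, here reduced to inspecting the fit.
print(f"rows kept: {len(prepared)}, slope: {slope:.2f}, intercept: {intercept:.2f}")
```

On this toy data, cleansing removes the row with the missing value and the row with the outlier, and the four remaining rows lie on a straight line, so the fit is exact. Steps ② to ④ are the part that the rest of this series calls preprocessing.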

That is all for this time. Thank you for reading to the end.
