Python: Preprocessing in Machine Learning: Overview

Data analysis process and preprocessing

Data analysis process and preprocessing in CRISP-DM

First, let's understand the overall flow of data analysis. Two standard processes have been proposed for data analysis: CRISP-DM and KDD.

CRISP-DM (CRoss Industry Standard Process for Data Mining), proposed by Shearer et al., defines the process shown in the following figure.

image.png

In this process:

(1) Business understanding: clarify the business issues and plan the data analysis project.
(2) Data understanding: acquire the data and check whether it is ready for analysis.
(3) Data preparation: format the data into the form required for subsequent modeling.
(4) Modeling: apply analysis methods to the prepared data.
(5) Evaluation: evaluate the analysis results obtained by modeling.
(6) Deployment: if sufficient results are obtained, apply the analysis results to the business.

As shown in the figure, these steps are not a one-way street; you move back and forth between adjacent steps as needed.
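The back-and-forth between adjacent steps can be sketched in Python. This is only an illustration: the phase names follow the usual CRISP-DM naming, while the `Phase` enum and the `neighbors` helper are hypothetical names introduced here, not part of any library.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, numbered in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def neighbors(phase):
    """Return the phases directly before and after `phase`, if they exist.

    Simplifying assumption: you may only move to an adjacent phase.
    """
    return [Phase(n) for n in (phase - 1, phase + 1) if 1 <= n <= len(Phase)]


# From data preparation you can go back to data understanding
# or forward to modeling.
print(neighbors(Phase.DATA_PREPARATION))
```

The first and last phases each have a single neighbor, so `neighbors(Phase.BUSINESS_UNDERSTANDING)` contains only `Phase.DATA_UNDERSTANDING`.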

The preprocessing covered here corresponds to data understanding and data preparation in CRISP-DM.
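To make these two steps concrete, here is a minimal sketch using only the standard library. The toy records, the column names, and the helpers `count_missing` and `fill_missing_with_mean` are all hypothetical. Filling with the column mean is just one possible preparation step, chosen here for illustration.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "income": 300},
    {"age": None, "income": 420},
    {"age": 41, "income": None},
    {"age": 33, "income": 510},
]


def count_missing(rows):
    """Data understanding: count the missing values in each column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def fill_missing_with_mean(rows):
    """Data preparation: replace missing values with the column mean.

    Assumes every column has at least one non-missing value.
    """
    columns = {key for row in rows for key in row}
    means = {
        key: mean(row[key] for row in rows if row.get(key) is not None)
        for key in columns
    }
    return [
        {key: means[key] if row.get(key) is None else row[key] for key in columns}
        for row in rows
    ]


missing = count_missing(records)            # check whether the data is ready
prepared = fill_missing_with_mean(records)  # format the data for modeling
print(missing)
print(prepared)
```

In practice a library such as pandas is commonly used for this kind of work, but the division of labor is the same: inspect the data first, then reshape it for modeling.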

KDD is explained in the next section.

Data analysis process and preprocessing in KDD

Compared to CRISP-DM, which considers the entire data analysis project in a business context, KDD (Knowledge Discovery in Databases), proposed by Fayyad et al., focuses more on the data analysis part itself. The diagram below shows the KDD process.

image.png

image.png
