[PYTHON] A reader of Introduction to Machine Learning Theory for IT Engineers tried Kaggle

What is Kaggle

From Wikipedia:

Kaggle is a platform for predictive modeling and analytics, and the company that operates it, where companies and researchers post data and statisticians and data analysts from around the world compete to produce the best model.

Roughly speaking, it is the data scientist's version of TopCoder.

Background

As part of the basic grounding expected of engineers these days, I read through [Introduction to Machine Learning Theory for IT Engineers](https://www.amazon.co.jp/IT%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%8B%E3%82%A2%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E7%90%86%E8%AB%96%E5%85%A5%E9%96%80-%E4%B8%AD%E4%BA%95-%E6%82%A6%E5%8F%B8/dp/4774176982), but since I don't work in data analysis, I have no chance to use machine learning at work. You can't learn machine learning without getting hands-on, yet getting hands-on seemed difficult because of things like preparing datasets, and I didn't know what would be suitable study material (MNIST?). As a modern engineer, I also want to be able to make good use of online learning and communities, and to gain knowledge of them. I tried the Kaggle competitions course from Dataquest as a tutorial and liked it.

Goal of this article

Participate in the Kaggle competition Titanic: Machine Learning from Disaster and earn a place on the leaderboard.

Getting Started

Prerequisites

You have studied logistic regression and random forests in machine learning
You can write some Python code
You have a Python runtime environment

What you need

An environment where you can write Jupyter notebooks
Enough English that you don't mind reading English technical sites

Things you don't need this time

A GPU environment
Knowledge of deep learning

Dataquest Kaggle competitions tutorial

An online learning site that comes up when you search Google for "kaggle competition titanic".

Free Kaggle Tutorial - Getting Started with the Titanic Dataset

If you have a Google or Facebook account, you can start the tutorial as soon as you log in.

Based on the survival data of Titanic passengers, you create a parametric model that calculates the survival probability for passengers in the test data, and compete on the performance of that model. With this Kaggle competition as its theme, the tutorial walks through the basic process of machine learning, from data preprocessing and model creation through training on the training dataset to prediction on the test dataset, while you solve exercises that have you write Python code at key points.
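The flow described above (preprocessing, model creation, training, prediction) can be sketched roughly as follows. This is a minimal illustration, not the tutorial's code: it assumes pandas and scikit-learn are installed, the rows are made up rather than taken from the real dataset, and the `preprocess` helper and the `IsFemale` feature are hypothetical names chosen for this example. Only the column names `Survived`, `Pclass`, `Sex`, `Age`, and `PassengerId` follow the competition data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up rows in the shape of the Titanic data (not real passengers).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "Pclass":   [3, 1, 2, 3, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "male",
                 "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, None, 35.0, 54.0, 14.0, 20.0],
})
test = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass":      [3, 1, 2],
    "Sex":         ["male", "female", "male"],
    "Age":         [34.5, 47.0, None],
})


def preprocess(df, age_fill):
    """Hypothetical helper: turn raw columns into numeric features."""
    features = pd.DataFrame(index=df.index)
    features["Pclass"] = df["Pclass"]
    # Encode sex as 0/1 so the model can use it.
    features["IsFemale"] = (df["Sex"] == "female").astype(int)
    # Fill missing ages with a value computed from the training data only.
    features["Age"] = df["Age"].fillna(age_fill)
    return features


# 1. Preprocessing
age_median = train["Age"].median()
X_train = preprocess(train, age_median)
y_train = train["Survived"]
X_test = preprocess(test, age_median)

# 2. Model creation and training on the training dataset
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 3. Prediction on the test dataset
survival_prob = model.predict_proba(X_test)[:, 1]  # P(Survived = 1)
predictions = model.predict(X_test)                # hard 0/1 labels
print(list(zip(test["PassengerId"], predictions)))
```

Filling the test ages with the median of the training data keeps information from the test set out of the preprocessing step.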

data_quest_titanic_python.JPG

This is an example screen from the Dataquest tutorial. An example appears on the left; you write Python code on the right, press the run button, and if the result is correct, a "good work" message appears and you proceed to the next screen.

The tutorial even covers generating the file used to submit your model's predictions on the test dataset to Kaggle. Dataquest has two Kaggle competition courses: the first uses a simple logistic regression model so you can experience the overall flow, and the next one, Improving Your Submission, focuses on raising your score by improving accuracy with an ensemble learning model.

If you have the knowledge listed in the prerequisites, I think each course can be completed in about 2 hours.

After finishing the tutorial, I organized what I had learned in my own Python environment (although I really just copied and pasted the code I wrote in the tutorial) and generated the file to submit to Kaggle. The code is here.
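As a rough illustration of the submission file mentioned above: for the Titanic competition it is a two-column CSV with a `PassengerId` and a `Survived` column. The sketch below uses only the standard library; the `write_submission` helper and the three predictions are made up for this example and are not the tutorial's code.

```python
import csv
import io


def write_submission(passenger_ids, predictions, fileobj):
    """Hypothetical helper: write PassengerId,Survived rows as CSV."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("need exactly one prediction per passenger")
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id, survived in zip(passenger_ids, predictions):
        # Cast to int so values such as numpy integers serialize cleanly.
        writer.writerow([int(passenger_id), int(survived)])


# Made-up predictions for three test passengers.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To produce a real file, the same helper can be called with a file opened via `open("submission.csv", "w", newline="")` instead of the in-memory buffer.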

Submit this to Kaggle!

kaggle_titanic_complete.JPG

I placed 1231st out of 7071 teams!
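To put a rank like this in perspective, it can be converted into a "top N percent" figure. The small sketch below does that arithmetic; the `top_percent` function is a hypothetical helper written for this example, not part of Kaggle.

```python
def top_percent(rank, total_teams):
    """Return the share of the leaderboard at or above the given rank, in percent."""
    if not 1 <= rank <= total_teams:
        raise ValueError("rank must be between 1 and total_teams")
    return 100.0 * rank / total_teams


titanic_percent = top_percent(1231, 7071)
print(f"Top {titanic_percent:.1f}% of teams")  # about the top 17.4%
```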

Summary

You can try it in a short time without much overhead, so I think it makes good material for self-study or study sessions after reading an introductory machine learning book. This is only the starting line: from here you could devise your own models and study how to raise the score a little more, or take part in other competitions. In my case, I am also studying deep learning, so I went on to try the Dogs vs. Cats competition while referring to various sites. For the time being I got a score of 0.14160, which would correspond to about 697th out of 1314 teams, but the competition had already closed, and apparently submissions to a closed competition are not registered in the ranking, which was a bit disappointing.

Digression

The author of Introduction to Machine Learning Theory for IT Engineers is the same age as me and, like me, a graduate of a Faculty of Science, Department of Physics (although at a different university). I would like to back-propagate through the neural network of my brain with the author as the training data and learn the neuron weights, to find out where such a difference came from...
