[PYTHON] Study method for learning machine learning from scratch (March 2020 version)

Introduction

Data science and machine learning look interesting, but have you ever felt that you don't know how to study them? I was one of those people. In this article, based on my own experience, I introduce ***a way to study that takes even beginners who have heard the word "AI" but don't know the details (what is machine learning in the first place?) from the basics to the knowledge and experience needed to work on machine learning***. (I think the materials introduced here are also useful as a review of the basics for those at an intermediate level or above.)

Overview (basically these 3 steps)

  1. Acquire basic knowledge (understand the words and terms)
  2. Understand how to use the libraries
  3. Take on an actual competition (Kaggle; see "What is Kaggle" below)

1. A textbook that clearly understands the mechanism and technology of machine learning & deep learning in this one book


[amazon link](https://www.amazon.co.jp/%E5%9B%B3%E8%A7%A3%E5%8D%B3%E6%88%A6%E5%8A%9B-%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92-%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E3%81%97%E3%81%8F%E3%81%BF%E3%81%A8%E6%8A%80%E8%A1%93%E3%81%8C%E3%81%93%E3%82%8C1%E5%86%8A%E3%81%A7%E3%81%97%E3%81%A3%E3%81%8B%E3%82%8A%E3%82%8F%E3%81%8B%E3%82%8B%E6%95%99%E7%A7%91%E6%9B%B8-%E6%A0%AA%E5%BC%8F%E4%BC%9A%E7%A4%BE%E3%82%A2%E3%82%A4%E3%83%87%E3%83%9F%E3%83%BC/dp/429710640X)

When you study a field for the first time, the first wall you hit is that you "don't understand the language" (that is, the technical terms). I think this is common to all disciplines, but it is especially true of machine learning. This book is recommended for getting past that stage; it really is a "textbook" that teaches you from the basics.

Brief content introduction

- Chapter 1: Basic knowledge of artificial intelligence (a good summary of the basics at the start, such as AI, machine learning, deep learning, and their historical background)
- Chapters 2-4: Machine learning (from the basic knowledge that often appears in machine learning literature through to algorithms)
- Chapters 5-7: Deep learning (processes and algorithms, starting from the basics of deep learning)
- Chapter 8: System and development environments (from choosing a programming language to machine learning libraries and frameworks and deep learning frameworks)

Recommended reading

  1. Read it through once (skim, skipping over words you don't understand) → Even just doing this means that when you read other articles and literature, you will meet words you have seen before rather than completely unknown ones, which makes learning easier!

  2. Reread the parts that cover basic knowledge, plus the final chapter, which gives you the flow of development (Chapters 1, 2, 5, and 8)

  3. Once you have read it to some extent, move on to the next step and use this book as a reference to look things up when you come across a word you do not understand.

2. Introduction to Python for Data Science

Kame @ US Data Scientist's blog

Kame-san's blog as a whole has a lot to learn from, so it is worth finding time to read it, but this course is especially recommended. It is very easy to understand because it systematically covers the libraries that you use almost every time you do machine learning. (I think the Kaggle Startbook, which I introduce later, is easier to understand after going through this course.)

Brief content introduction

(The "Purpose of this course" section on the blog is written in an easy-to-understand way, so I quote from it here.) The purpose of this course is to master the environment setup needed for data science in Python, the basics of Python, the basics of the Python libraries used in data science, and the "basic" usage of the Python modules that appear frequently in data science.

What is the goal of this course?

Master the basics of using the tools, libraries, and modules needed to process data for data science in Python.

- You can process data without using spreadsheet tools such as Excel
- You can process data files such as image files
- You can automate daily data processing (Excel work, etc.) with Python

That is the kind of goal. The course also touches on some statistics, but please note that it is not a "course for learning data science" but a "course for learning Python for data science".

However, since the course includes abundant "techniques that can be used in the field" and "frequent techniques of data science", you can learn data science in a broad sense.

Above all, I have tried to write it in an easy-to-understand way. I use few difficult words and explain things very thoroughly, so I don't think you will get stuck partway through.

Also, rather than teaching in textbook style, I include "how it is actually used in the field" throughout. As a result, the content is reasonably systematic and comprehensive while still being usable in real work. (End of quote.)
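To make the quoted goal of "processing data without a spreadsheet tool" more concrete, here is a minimal sketch using pandas. It is not taken from the course; the column names and the sales figures are made up for illustration, and the in-memory text stands in for a real CSV file.

```python
import io

import pandas as pd

# Hypothetical sales log. In practice this would be pd.read_csv("some_file.csv").
raw = io.StringIO(
    "date,store,amount\n"
    "2020-03-01,Tokyo,1200\n"
    "2020-03-01,Osaka,800\n"
    "2020-03-02,Tokyo,1500\n"
    "2020-03-02,Osaka,\n"
)
df = pd.read_csv(raw, parse_dates=["date"])

# Fill the missing amount with 0 instead of fixing the cell by hand.
df["amount"] = df["amount"].fillna(0)

# A pivot-table style summary: total amount per store.
totals = df.groupby("store")["amount"].sum()
print(totals)
```

Once a task like this is written as a script, the same summary can be rerun on new data every day, which is the kind of automation the course description mentions.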

The course covers:

- The basics of Python
- NumPy (used for numerical calculation)
- Pandas (for manipulating and analyzing data; it can do spreadsheet work like Excel's, but faster)
- matplotlib (for drawing graphs)
- Seaborn (draws graphs like matplotlib, but more cleanly and easily)
- Other useful libraries, modules, etc.

Recommended reading

  1. Read it through first (an iron rule!)

  2. This can be done at the same time as 1: type the code yourself and check the results and behavior. (I think there are many things you can't learn just by reading.) → The course uses the Kaggle competition called Titanic as its subject, which is also the tutorial in the Kaggle Startbook introduced later, so I think working through both together deepens your understanding.

  3. Read repeatedly, and reread from time to time
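For the hands-on step above, a first experiment might look like the following. This is only a sketch of the kind of thing you could type and run, not code from the course: it uses NumPy for a vectorized calculation and matplotlib to draw the result. The `Agg` backend and the in-memory buffer are choices made here so that the snippet runs without a display or a file on disk.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# NumPy: apply sin to a whole array at once, with no explicit loop.
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)
print(y.min(), y.max())

# matplotlib: draw the curve with axis labels and a legend.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Save the figure as PNG bytes; pass a file name such as "sine.png"
# to savefig instead if you want a file on disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Changing a value (for example the number of points in `np.linspace`) and rerunning is a quick way to see how each call behaves.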

3. Practical Data Science Series Beginning with Python Kaggle Startbook (KS Information Science Specialized Book)

[amazon link](https://www.amazon.co.jp/%E5%AE%9F%E8%B7%B5Data-Science%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-Python%E3%81%A7%E3%81%AF%E3%81%98%E3%82%81%E3%82%8BKaggle%E3%82%B9%E3%82%BF%E3%83%BC%E3%83%88%E3%83%96%E3%83%83%E3%82%AF-KS%E6%83%85%E5%A0%B1%E7%A7%91%E5%AD%A6%E5%B0%82%E9%96%80%E6%9B%B8-%E7%A5%A5%E5%A4%AA%E9%83%8E/dp/4065190061)

This is a Kaggle tutorial book written as a team by the authors of two earlier works: "What to do next after registering with Kaggle ~ You can fight enough if you do this! Introduction to Titanic 10 Kernel ~" and "kaggle Tutorial". (Both of those underlying works are popular and easy to understand!)

The two steps introduced so far are a little light on practice, and I think there is a lot to gain from learning while actually using what you have studied. Even so, the honest truth is that it is hard to know where to start. In that situation, taking part in "Titanic", Kaggle's tutorial competition for beginners, with this book at your side is a good first step!

Brief content introduction

- Chapter 1: What is Kaggle? Covers everything up to creating an account (ideal as an introduction)
- Chapter 2: Titanic tutorial
- Chapter 3: How to handle multiple tables and image / text data (also introduces competitions other than the Titanic one tried in Chapter 2)
- Chapter 4: Tips for learning further (it doesn't end with just the tutorial; there is content that leads on to what comes next)

Recommended point ➀: Sample code

The sample code is published, so if you follow it you can practice the whole process with almost no coding (you do need to register a Kaggle account). (Even if you don't understand the details yet, if you want an overview it may be good to run it once and then work out what the code does.)
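To give a feel for the overall shape of a Titanic submission, here is a minimal sketch. It is not the book's sample code. It uses a simple rule (predict that women survived and men did not) rather than a trained model, and the rows below are made up to mimic a few columns of the competition's `test.csv`, so that the snippet runs without downloading anything.

```python
import pandas as pd

# Made-up rows that mimic some columns of Kaggle's Titanic test.csv.
# With the real competition data this would be pd.read_csv("test.csv").
test = pd.DataFrame(
    {
        "PassengerId": [892, 893, 894, 895],
        "Pclass": [3, 3, 2, 3],
        "Sex": ["male", "female", "male", "male"],
        "Age": [34.5, 47.0, 62.0, None],
    }
)

# Rule-based baseline: 1 (survived) for women, 0 for men.
submission = pd.DataFrame(
    {
        "PassengerId": test["PassengerId"],
        "Survived": (test["Sex"] == "female").astype(int),
    }
)

# The Titanic competition expects a CSV with these two columns and no
# index column. Pass a file name to to_csv to write an actual file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Replacing the rule with a trained model changes only how the `Survived` column is computed; the load, predict, and write-CSV flow stays the same.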

Recommended point ➁: Dialogue pages

These pages are dialogues between the two authors. They cover not only the viewpoints that advanced users take for granted, but also why the authors started Kaggle and what they gained from it, so there is a lot in them that helps your study.

Recommended point ➂: Notes

Knowledge around Kaggle and extra "+α" knowledge is summarized in column form, which makes it easy to understand. (You could probably find it by looking it up yourself, but it is clearly explained, which beginners will appreciate.)

Recommended reading

(Since there are some parts that I haven't done yet, this is my current study plan.)

  1. Read Chapter 1 to get an overview and a picture of Kaggle
  2. Work through the Titanic competition while reading Chapter 2
  3. Read Chapter 3 (read through the material on multi-table, image, and text competitions, then reread it when you actually tackle that kind of competition)
  4. Actually enter a competition. Chapter 4 introduces how to choose the competition to enter after Titanic and approaches recommended for beginners, so I think it is good to follow it!

"I tried to organize Kaggle's Competition Categories" ← This article explains the Kaggle competition categories in an easy-to-understand way!

What is Kaggle

Kaggle is the world's largest data science community, with powerful tools and resources to help you reach your data science goals.

(The image here was quoted from Mr. Harada of DeNA's material at Devsumi 2018 Summer, because one of its pages made it easy to understand intuitively what Kaggle is.)

4. Summary

I have not been doing machine learning for long and still consider myself a beginner, but thanks to my environment and the learning materials, I am gradually coming to understand it. I summarized these basic study methods in the hope that they will be useful for people who are interested in machine learning but do not know how to study on their own. As I continue studying, I expect to find better ways of doing things and approaches that might also help intermediate and advanced learners. I hope to update this article from time to time or write them up in a separate article!

By the way, as a continuation of the 3 steps in this article (although it may overlap with the competition practice in step 3), I think you can improve your skills further by referring to articles such as the following: "Summary of recommended materials for machine learning beginners to finish Kaggle's 'introduction' at high speed", an article by Mr. Murata (Curry-chan), one of the authors of the Kaggle Startbook.

Other reference materials

regonn & curry podcast

Recommended (A podcast that talks about Kaggle themes and new competitions every week)

[Use of Japanese Machine Learning Learned with Kaggle (Ogoshi)](https://www.slideshare.net/ssuserafaae8/for-manabiya?ref=http://kaggler-ja-wiki.herokuapp.com/kaggle%E5%88%9D%E5%BF%83%E8%80%85%E3%82%AC%E3%82%A4%E3%83%89)

kaggler-ja Slack (a Slack community for Japanese Kagglers)

9 steps to become a machine learning expert in the shortest time [Completely free]
