[PYTHON] I summarized one year of self-taught data science.

1 Introduction

In this article, I, a university student interested in data science, look back on one year of self-study.

Let's get down to the main topic!

2 Purpose of this article

**① Organize my own learning** After about a year of study, I have started to forget some of what I learned, so I want to organize it once and use it for future study. As a result, the article contains many subjective expressions and may be hard to read in places. If you have any questions or feedback, please leave a comment.

**② A reference for beginners** As the book Shin Nihon also says, I think more and more people will start learning data science in the future. After reading that book, I wanted to play an active role in this field, so I started studying in earnest. I hope this article can serve as a reference for beginners.

3 What I was aware of when studying

**① Acquire skills that can be proven**

I aimed for qualifications because they make the goal easy to see and help me stay motivated. Referring to the article below, I wanted to acquire the minimum necessary knowledge, so I passed the Statistical Test Level 2 and the G test in my first six months. In the future, I would like to win a medal on Kaggle.

Explanation of qualifications that data scientists want to acquire, outlines and difficulty levels, and courses where they can learn necessary skills

**② Learn the theory**

Data science is especially easy to treat as a black box. Studying the theory takes time, but I was careful not to move on while something was still completely unclear to me.

**③ Always be aware of the roadmap**

I think beginners in particular run a high risk of giving up, so I made a roadmap, worked backwards from the goal, and studied systematically.

[[2020 version for beginners] Recommended study method for those who want to become an AI engineer by themselves](https://qiita.com/tani_AI_Academy/items/e47bf4d1316b66a0402b#%E6%9C%AC%E6%B0%97%E3%81%A7ai%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%8B%E3%82%A2%E3%82%92%E7%9B%AE%E6%8C%87%E3%81%97%E3%81%A6%E3%81%84%E3%82%8B%E6%96%B9%E3%81%B8)

2020 Edition: Skill requirements for data scientists and machine learning engineers required in the field

[Self-study] Learning Roadmap for Machine Learning for Python Beginners

From the three articles above, I learned that data science requires five major areas of knowledge.

  1. Programming skills: Basic python grammar and machine learning implementation
  2. Mathematics: Calculus / Statistics / Linear Algebra
  3. Machine learning theory
  4. kaggle: For understanding a series of data preprocessing, feature engineering, and analysis flows
  5. Others: Knowledge of SQL / cloud, etc.

I grouped SQL and cloud knowledge under "Others" because I have not started studying them yet. I plan to study them seriously in the future.

Below, I look back on my studies, organized mainly by fields 1 to 4.

4 Looking back

0. The first step

For beginners, here are three books to start with.

① Does artificial intelligence exceed humans?

It may be the most famous book related to machine learning, and it is also a recommended book for the G test. If you find yourself asking "What is artificial intelligence?", read this one first.

[② A book that understands mathematics for artificial intelligence programming](https://www.amazon.co.jp/%E4%BA%BA%E5%B7%A5%E7%9F%A5%E8%83%BD % E3% 83% 97% E3% 83% AD% E3% 82% B0% E3% 83% A9% E3% 83% 9F% E3% 83% B3% E3% 82% B0% E3% 81% AE% E3 % 81% 9F% E3% 82% 81% E3% 81% AE% E6% 95% B0% E5% AD% A6% E3% 81% 8C% E3% 82% 8F% E3% 81% 8B% E3% 82 % 8B% E6% 9C% AC-% E7% 9F% B3% E5% B7% 9D-% E8% 81% A1% E5% BD% A6-ebook / dp / B079TLRZ8K / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords =% E4% BA% BA% E5% B7% A5% E7% 9F% A5% E8% 83% BD% E3 % 83% 97% E3% 83% AD% E3% 82% B0% E3% 83% A9% E3% 83% 9F% E3% 83% B3% E3% 82% B0% E3% 81% AE% E3% 81 % 9F% E3% 82% 81% E3% 81% AE% E6% 95% B0% E5% AD% A6% E3% 81% 8C% E3% 82% 8F% E3% 81% 8B% E3% 82% 8B % E6% 9C% AC & qid = 1589274169 & s = digital-text & sr = 1-1)

It is easy to read even for people who are not good at mathematics, and it explains in a minimal number of pages how mathematics is used in artificial intelligence. When I studied for university entrance exams, I did not know what the math was for, so I was impressed when I first understood that it is actually used in machine learning.

Even as a beginner, if you do not have this level of math down, you will almost certainly get stuck on machine learning theory, so I recommend reading it.

[③ Book for those who want to become a machine learning engineer](https://www.amazon.co.jp/%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92 % E3% 82% A8% E3% 83% B3% E3% 82% B8% E3% 83% 8B% E3% 82% A2% E3% 81% AB% E3% 81% AA% E3% 82% 8A% E3 % 81% 9F% E3% 81% 84% E4% BA% BA% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E6% 9C% AC-AI% E3 % 82% 92% E5% A4% A9% E8% 81% B7% E3% 81% AB% E3% 81% 99% E3% 82% 8B-% E7% 9F% B3% E4% BA% 95-% E5 % A4% A7% E8% BC% 94-ebook / dp / B07GWM4J7H / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords = % E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92% E3% 82% A8% E3% 83% B3% E3% 82% B8% E3% 83% 8B% E3 % 82% A2% E3% 81% AB% E3% 81% AA% E3% 82% 8A% E3% 81% 9F% E3% 81% 84% E4% BA% BA% E3% 81% AE% E3% 81 % 9F% E3% 82% 81% E3% 81% AE% E6% 9C% AC & qid = 1589274212 & s = digital-text & sr = 1-1)

If you have read the two books above and want to study further, please read this one. It explains the necessary elements in an easy-to-understand way.

1. Programming

I learned two things.

**python** The minimum basic grammar needed to work with Python (control flow such as if statements and loops, class and method definitions, etc.)

**Implementation of machine learning** Implementing models based on the theory, using a library called scikit-learn
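To give a feel for what "minimum basic grammar" means here, the following is a minimal sketch that uses an if statement, a loop, and a class with a method. It is not taken from any of the materials in this article; the `Student` class, the `passed_students` function, and the scores are made up for illustration.

```python
class Student:
    """A tiny class, to show class and method definitions."""

    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

    def average(self):
        # Guard against an empty list to avoid dividing by zero.
        if not self.scores:
            return 0.0
        return sum(self.scores) / len(self.scores)


def passed_students(students, threshold):
    """Return the names of students whose average reaches the threshold."""
    names = []
    for student in students:                  # loop over a list
        if student.average() >= threshold:    # conditional branch
            names.append(student.name)
    return names


# Hypothetical data, invented for this example.
students = [
    Student("Aoi", [80, 90]),
    Student("Ren", [50, 60]),
    Student("Yui", []),
]
result = passed_students(students, 70)
print(result)
```

If you can read and write code at roughly this level, that is probably enough grammar to move on to data analysis libraries.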

python

① [Self-study programmer](https://www.amazon.co.jp/%E7%8B%AC%E5%AD%A6%E3%83%97%E3%83%AD%E3%82%B0%E3 % 83% A9% E3% 83% 9E% E3% 83% BC-Python% E8% A8% 80% E8% AA% 9E% E3% 81% AE% E5% 9F% BA% E6% 9C% AC% E3 % 81% 8B% E3% 82% 89% E4% BB% 95% E4% BA% 8B% E3% 81% AE% E3% 82% 84% E3% 82% 8A% E6% 96% B9% E3% 81 % BE% E3% 81% A7-% E3% 82% B3% E3% 83% BC% E3% 83% AA% E3% 83% BC% E3% 83% BB% E3% 82% A2% E3% 83% AB% E3% 82% BD% E3% 83% 95-ebook / dp / B07BKVP9QY / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords =% E7% 8B% AC% E5% AD% A6% E3% 83% 97% E3% 83% AD% E3% 82% B0% E3% 83% A9% E3% 83% 9E% E3% 83% BC & qid = 1589274239 & s = digital-text & sr = 1-1)

I think it is a good book for learning the basics of programming, but it is too difficult for beginners who cannot yet judge how much they need to learn. I would like to read it again when I really need Python.

② progate

progate is the go-to site for programming beginners. I recommend going through it twice to get a rough overall picture.

③ PyQ

Among the various courses, I went through the 82-hour "Python Grammar from Inexperienced" course. It is nice for beginners that you can start studying immediately, because there is no need to set up an environment.

However, there are many free materials for learning basic Python grammar, so I felt that 3000 yen a month was a little expensive. Also, looking back, it is a pity that quite a lot of the grammar it covered was grammar I rarely used.

④ Tommy's blog

The blog of Tommy, an obstetrician and gynecologist, has a collection called Learning Roadmap for Python Beginners [Self-study on the Blog]. I was impressed that something this clear is free. I think it is ideal for beginners to cover this first and learn more as needed.

[Kikagaku style] Algorithm theory learned with Python for improving programming ability (Part 1)

This is a Udemy course by Kikagaku. Other courses are introduced below, but Kikagaku's courses are all wonderful. This one addresses the beginner's question, "Where do I even start when writing the code for a program?"


Implementation of machine learning

① PyQ

Above I introduced the "Python grammar from inexperienced" course. In addition, I spent two months on two more courses, the "data analysis" course and the "machine learning" course.

It is easy to give up on implementing machine learning because many of the reference books are thick, but with PyQ you can learn it online. The explanations are aimed at beginners, so it is ideal material to study after grasping the basic grammar of Python.

② Kame-san's data science blog

This is the blog of Kame-san, who works as a data scientist in the United States: US Data Scientist Blog

Its series of articles on getting started with Python for data science is excellent for beginners.

It is remarkable that you can learn basic Python grammar, pandas, numpy, and even data visualization on this one blog.
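For readers who have not seen pandas and numpy yet, here is a minimal sketch of the kind of operations these libraries are used for. It is my own illustration, not an excerpt from the blog; the study-log table and its values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical study log; the values are invented for illustration.
log = pd.DataFrame({
    "topic": ["python", "statistics", "python", "kaggle"],
    "hours": [2.0, 1.5, 3.0, 2.5],
})

# numpy: compute on a whole array at once instead of writing a loop.
hours = log["hours"].to_numpy()
mean_hours = float(np.mean(hours))

# pandas: group rows by topic and sum the hours within each group.
hours_by_topic = log.groupby("topic")["hours"].sum()

print(mean_hours)
print(hours_by_topic)
```

The same pattern (load a table, select a column, aggregate) comes up constantly in data analysis, so it is worth getting comfortable with it early.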

[③ Machine learning starting with python](https://www.amazon.co.jp/Python%E3%81%A7%E3%81%AF%E3%81%98%E3%82%81%E3%82 % 8B% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92-% E2% 80% 95scikit-learn% E3% 81% A7% E5% AD% A6% E3% 81% B6% E7% 89% B9% E5% BE% B4% E9% 87% 8F% E3% 82% A8% E3% 83% B3% E3% 82% B8% E3% 83% 8B% E3% 82% A2% E3% 83% AA% E3% 83% B3% E3% 82% B0% E3% 81% A8% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92% E3% 81% AE% E5% 9F% BA% E7% A4% 8E-Andreas-C-Muller / dp / 4873117984 / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82 % AB% E3% 83% 8A & dchild = 1 & keywords = python% E3% 81% A7% E5% A7% 8B% E3% 82% 81% E3% 82% 8B% E6% A9% 9F% E6% A2% B0% E5 % AD% A6% E7% BF% 92 & qid = 1589274453 & sr = 8-1)

It is one of the O'Reilly Japan books that are often recommended for learning machine learning.

I can appreciate this book now, but honestly I think it is boring if you do not yet understand the machine learning theory introduced below.


2. Mathematics

Understanding mathematics is very important in learning data science. I think the three things that are especially important are differentiation, statistics, and linear algebra.

However, because studying mathematics makes beginners more likely to give up, many materials explain machine learning without using much mathematics. For beginners, therefore, mathematics is not such a high priority, and I think it is better to study it when it becomes necessary.

Since I had already learned differentiation in college entrance exams, I learned linear algebra and statistics after finishing the above programming learning.

linear algebra

[Linear Algebra Campus Seminar](https://www.amazon.co.jp/%E3%82%B9%E3%83%90%E3%83%A9%E3%82%B7%E3%82%AF% E5% AE% 9F% E5% 8A% 9B% E3% 81% 8C% E3% 81% A4% E3% 81% 8F% E3% 81% A8% E8% A9% 95% E5% 88% A4% E3% 81% AE% E7% B7% 9A% E5% BD% A2% E4% BB% A3% E6% 95% B0% E3% 82% AD% E3% 83% A3% E3% 83% B3% E3% 83% 91% E3% 82% B9% E3% 83% BB% E3% 82% BC% E3% 83% 9F% E2% 80% 95% E5% A4% A7% E5% AD% A6% E3% 81% AE% E6% 95% B0% E5% AD% A6% E3% 81% 8C% E3% 81% 93% E3% 82% 93% E3% 81% AA% E3% 81% AB% E5% 88% 86% E3% 81% 8B% E3% 82% 8B-% E5% 8D% 98% E4% BD% 8D% E3% 81% AA% E3% 82% 93% E3% 81% A6% E6% A5% BD% E3% 81 % AB% E5% 8F% 96% E3% 82% 8C% E3% 82% 8B-% E9% A6% AC% E5% A0% B4-% E6% 95% AC% E4% B9% 8B / dp / 4866150874 / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords =% E7% B7% 9A% E5% BD% A2% E4% BB % A3% E6% 95% B0% E3% 82% AD% E3% 83% A3% E3% 83% B3% E3% 83% 91% E3% 82% B9% E3% 82% BC% E3% 83% 9F & qid = 1589274514 & sr = 8-1)

It is written for university students who need to earn credits in linear algebra, so it is well suited to learning the basics of the subject.


statistics

Statistical test level 2 study

I passed the exam about half a year after I started studying data science. I think studying for it is the best way to learn the basics of statistics systematically. I summarized my study method in another article; if you are interested, please see it below.

Only this! How to pass the second grade of statistical test in the shortest time

3. Machine learning theory

I prioritized understanding the basic machine learning algorithms other than deep learning.

① [Kikagaku style] Artificial intelligence / machine learning de-black box course

This is a Udemy course by Kikagaku. I took two courses, beginner and intermediate.

Starting from a review of linear algebra and differentiation, you learn the whole sequence of steps up to optimization by applying them to real problems such as rent prediction. There are also exercises in Python.
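To show how linear algebra and differentiation meet in a rent prediction problem, here is a minimal sketch of least-squares linear regression. It is my own illustration and not the course's code; the floor areas and rents are made-up numbers, and the use of `np.linalg.solve` on the normal equations is my choice of method.

```python
import numpy as np

# Hypothetical data: floor area (square meters) and monthly rent (10,000 yen).
# The rents were generated as rent = 0.2 * area + 1, so the fit should recover that.
area = np.array([20.0, 30.0, 40.0, 50.0])
rent = np.array([5.0, 7.0, 9.0, 11.0])

# Design matrix: a column of ones for the intercept, then the area.
X = np.column_stack([np.ones_like(area), area])

# Differentiating the squared error ||X w - y||^2 with respect to w and
# setting the gradient to zero gives the normal equations (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ rent)
intercept, slope = w

# Predict the rent of a 35 square meter room with the fitted line.
predicted = intercept + slope * 35.0
print(intercept, slope, predicted)
```

Differentiation tells you where the error is smallest, and linear algebra lets you solve for that point in one step; seeing both at work in a few lines is what made the theory click for me.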

This course is aimed at beginners and is highly rated on Udemy, so I recommend it.

② Coursera Machine Learning

This video course is often recommended for beginners. It was a valuable experience to listen to top-level overseas lectures for free. It did not use many difficult formulas, and the simple explanations made it easy to understand. Japanese subtitles are available, but many parts still require understanding English. Also, it was a pain to have to submit the assignments using an unfamiliar numerical computing tool called Octave.

③ University of Tsukuba OpenCourseWare

This is the site where the University of Tsukuba publishes class videos. You can watch the lectures on YouTube. I watched them because I wanted to learn deeper content than Coursera covers. I found the course easier to follow because it is in Japanese. Knowledge of mathematics (especially linear algebra and differentiation) is a prerequisite, and working through the derivations of the formulas let me understand the details. However, it was a pity that there were no explanations for the assignments.

④ G test

For more information on the G test, see [here](https://www.jdla.org/certificate/general/). I think it is a perfect qualification for beginners to set as their first goal.

Since I had already learned machine learning through Coursera, I studied deep learning with the official textbook and practiced with a problem collection known as Kuromoto (the "black book") and a practice test that can be taken on the web.

[1. Deep Learning G Test (Generalist) Official Text](https://www.amazon.co.jp/%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92 % E6% 95% 99% E7% A7% 91% E6% 9B% B8-% E3% 83% 87% E3% 82% A3% E3% 83% BC% E3% 83% 97% E3% 83% A9% E3% 83% BC% E3% 83% 8B% E3% 83% B3% E3% 82% B0-G% E6% A4% 9C% E5% AE% 9A% EF% BC% 88% E3% 82% B8% E3% 82% A7% E3% 83% 8D% E3% 83% A9% E3% 83% AA% E3% 82% B9% E3% 83% 88% EF% BC% 89-% E5% 85% AC% E5 % BC% 8F% E3% 83% 86% E3% 82% AD% E3% 82% B9% E3% 83% 88-% E6% B5% 85% E5% B7% 9D-ebook / dp / B07H2ZR6M2 / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords = g% E6% A4% 9C% E5% AE% 9A & qid = 1589349248 & sr = 8-1 )

This is probably the reference book that almost everyone who takes the exam buys. Buy it as soon as you decide to take the exam. It is concise, so I still look things up in it often, even after passing.

[2. Thorough Strategy Deep Learning G Test Generalist Problem Collection](https://www.amazon.co.jp/%E5%BE%B9%E5%BA%95%E6%94%BB%E7%95%A5 -% E3% 83% 87% E3% 82% A3% E3% 83% BC% E3% 83% 97% E3% 83% A9% E3% 83% BC% E3% 83% 8B% E3% 83% B3% E3% 82% B0G% E6% A4% 9C% E5% AE% 9A-% E3% 82% B8% E3% 82% A7% E3% 83% 8D% E3% 83% A9% E3% 83% AA% E3 % 82% B9% E3% 83% 88-% E5% 95% 8F% E9% A1% 8C% E9% 9B% 86-% E5% BE% B9% E5% BA% 95% E6% 94% BB% E7 % 95% A5% E3% 82% B7% E3% 83% AA% E3% 83% BC% E3% 82% BA-ebook / dp / B07NDVCN99 / ref = reads_cwrtbar_1 / 355-6465569-1212101? _Encoding = UTF8 & pd_rd_i = B07NDVCN99 & pd_rd_r = e79bfdee-02fd-4930-9b9b-3fbee6373ba2 & pd_rd_w = ysTuv & pd_rd_wg = 1zuc8 & pf_rd_p = 64c49d12-7012-452e-9a49-e43c9513f9fc & pf_rd_r = YSZZQ7T9XGP3MR6H04W

This book is also known as the black book. When I took the exam, I bought it because there was no other reference book with practice problems. It served as one yardstick for judging whether I would pass, but I also felt that it differed from the actual exam in many ways.

3. G test practice test

This is a mock test (past questions) of the G test published by the artificial intelligence study group "Study-AI".

I felt that this mock test was closer to the actual exam than Kuromoto.

4. kaggle

After learning the basics, I found many articles saying that you should work with real data, so I decided to take on Kaggle. It is a competition site for data scientists; most of its content is in English, and it hosts competitions sponsored by overseas companies.

In Japan, SIGNATE is well known. However, at the moment Kaggle is more famous and its code and solutions are public, so I feel that many beginners start with Kaggle.

There are still few articles on how to tackle Kaggle. I relied especially on "Summary of recommended materials for complete beginners to finish Kaggle's "Introduction" at high speed (December 2019 version)". If you follow it, you will have no trouble getting started with Kaggle.

A more recent version is now available: Summary of recommended materials for machine learning beginners to finish Kaggle's "Introduction" at high speed (March 2020 version)

① kaggle tutorial

Using the Titanic competition, the Kaggle tutorial for beginners walks you through the whole sequence of steps up to submitting your analysis results.
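The flow that the tutorial teaches (preprocess, train, predict, build a submission table) can be sketched as follows. This is a simplified illustration of my own, not the tutorial's code: the rows are made up and only imitate a few Titanic columns, the `preprocess` helper is hypothetical, and the decision tree is just one model you could choose. A real submission would read the competition's CSV files and write the result to a file.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up rows that imitate a few columns of the Titanic data.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
    "Survived": [0, 1, 1, 1, 0, 0],
})
test = pd.DataFrame({
    "PassengerId": [7, 8],
    "Pclass": [3, 1],
    "Sex": ["male", "female"],
    "Age": [None, 30.0],
})


def preprocess(df, age_fill):
    """Hypothetical helper: turn raw columns into numeric features."""
    return pd.DataFrame({
        "Pclass": df["Pclass"],
        "is_female": (df["Sex"] == "female").astype(int),
        "Age": df["Age"].fillna(age_fill),  # fill missing ages
    })


# Use the training median for both tables so no test information leaks in.
age_median = train["Age"].median()
X_train = preprocess(train, age_median)
X_test = preprocess(test, age_median)

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X_train, train["Survived"])

# The submission table has one prediction per test passenger.
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": model.predict(X_test),
})
print(submission)
```

Most of the work in a real competition goes into the `preprocess` step, which is why the tutorial spends so much time on it.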

② What to do next after registering with Kaggle-If you do this, you can fight enough! Getting Started with Titanic 10 Kernel ~

Some of the content overlapped with the Kaggle tutorial, but doing both was the right choice. I enjoyed it because it also touches on some text data and image data.

[③ Python practice data analysis 100 knocks](https://www.amazon.co.jp/Python%E5%AE%9F%E8%B7%B5%E3%83%87%E3%83%BC%E3 % 82% BF% E5% 88% 86% E6% 9E% 90100% E6% 9C% AC% E3% 83% 8E% E3% 83% 83% E3% 82% AF-% E4% B8% 8B% E5% B1% B1% E8% BC% 9D% E6% 98% 8C-ebook / dp / B07ZSGSN9S / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83 % 8A & dchild = 1 & keywords = Python% E5% AE% 9F% E8% B7% B5% E3% 83% 87% E3% 83% BC% E3% 82% BF% E5% 88% 86% E6% 9E% 90100% E6 % 9C% AC% E3% 83% 8E% E3% 83% 83% E3% 82% AF & qid = 1589274688 & sr = 8-1)

If you do the two things above, you can get a feel for Kaggle to some extent. However, the most time-consuming part of the data analysis process is data preprocessing. You may have learned an impressive algorithm from machine learning theory, but it is wasted if you cannot preprocess the data. This book is not specific to Kaggle, but I think it is very valuable because few books focus on preprocessing tabular data in Python.
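As an illustration of what preprocessing tabular data involves, here is a minimal sketch with pandas. It is not an exercise from the book; the two tables, their column names, and their values are invented, and the choice of median filling and one-hot encoding is only one reasonable option.

```python
import pandas as pd

# Hypothetical tables; names and values are invented for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["basic", "premium", None],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "amount": [1000.0, None, 3000.0, 500.0, 500.0],
})

# 1. Remove exact duplicate rows (the second order of customer 3 here).
orders = orders.drop_duplicates()

# 2. Fill missing numeric values with the column median.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# 3. Aggregate to one row per customer, then join onto the customer table.
total = orders.groupby("customer_id", as_index=False)["amount"].sum()
table = customers.merge(total, on="customer_id", how="left")

# 4. Fill missing categories, then one-hot encode them for a model.
table["plan"] = table["plan"].fillna("unknown")
features = pd.get_dummies(table, columns=["plan"])

print(features)
```

Each step is simple on its own, but real data needs many of them in the right order, which is where the time goes.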

[④ Data analysis technology that wins with kaggle](https://www.amazon.co.jp/Kaggle%E3%81%A7%E5%8B%9D%E3%81%A4%E3%83%87%E3 % 83% BC% E3% 82% BF% E5% 88% 86% E6% 9E% 90% E3% 81% AE% E6% 8A% 80% E8% A1% 93-% E9% 96% 80% E8% 84% 87-% E5% A4% A7% E8% BC% 94-ebook / dp / B07YTDBC3Z / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords = kaggle% E3% 81% A7% E5% 8B% 9D% E3% 81% A4% E3% 83% 87% E3% 83% BC% E3% 82% BF% E5% 88% 86% E6% 9E% 90% E3% 81% AE% E6% 8A% 80% E8% A1% 93 & qid = 1589274722 & sr = 8-1)

When I was about to enter a competition, this was the only book dealing with Kaggle, so I bought it. I think it is too difficult for beginners who have only done the Titanic competition, but I am working on competitions with the mindset that the only way to improve is by doing.


⑤ Participate in the actual competition

Many people say that actually participating in competitions is the best way to learn. I still lack knowledge, but I want to participate actively and win medals.

5 Summary

Good points

**1. I kept my motivation high and kept studying** For the first two months there was a lot I did not understand, but as I got used to it, I found studying fun. I also considered a programming school, but I chose self-study because I did not think it made sense to attend one just to avoid giving up.

**2. I planned and studied flexibly instead of fixing a rigid schedule from the start** As I felt while writing this article, new reference books appear at a fairly fast pace in the field of data science. There is also so much to learn that I do not think things will go as first planned. So I kept the big picture of the roadmap in mind and made a plan at the start of each month.

Reflections

**1. I tried too hard to perfect each topic** In particular, I spent too much time on the basic grammar of Python. Trying to be too perfect has drawbacks such as: ① it takes time, and you forget the knowledge you do not use anyway; ② the risk of giving up increases.

**2. Theory and practice were out of balance** I felt that I would have enjoyed studying more if I had put each piece of theory into practice (output) right after taking it in (input). Beginners tend to do too much input, so I want to be careful about this.

A little impression

**1. It is hard to keep going** Especially in the first six months, I often studied about 10 hours a day, but my strongest impression is that I did not improve as much as I expected. I felt that motivations such as "AI is the future!" or "deep learning seems cool" are not enough to keep learning for a long time. The book for those who want to become a machine learning engineer also says that you should expect to study for a long period of 1 to 3 years, so I think you should go in knowing that this is not a field where results come quickly.

**2. Make use of the great inexpensive materials** All the blogs introduced above are free, and Qiita has a lot of content that is useful for learning. On Udemy, you can buy high-quality courses for 1,500 yen during sales. The AI and data science courses at programming schools are quite expensive, so for anyone considering one, I felt it makes sense to decide whether to attend after learning the basics on your own.

6 Future aspirations

  1. Earn medals with kaggle
  2. Output what you have learned so far

Going forward, I would like to focus my efforts on these two goals. Thank you for reading to the end!
