Introducing books on data science

I started studying statistics in my third year of university, and this is now my fourth year of studying it.

Since I have the chance, I thought it might help someone if I introduced the books I read while looking back on how I studied. Surprisingly few book introductions say what level the reader was at when they read each book, and how it felt.

Motivation for this article

Over the last few years, many good books on statistics and machine learning have been published in Japanese, and many communities seem to have formed. When I was job hunting and asked Google about "jobs in statistics," I found little useful information. I remember having a hard time just finding out that a job called "data scientist" seemed to exist.

That was only three years ago, which shows how rapidly the field has spread. Even so, during my job hunt I remember hearing the word "data scientist" at events I hadn't attended to learn about it.

Talking with current job hunters, I get the sense that this momentum hasn't slowed and that companies are still looking for people with strong data analysis skills. Partly because the field is so popular, many good books are now written in Japanese.

On the other hand, with so many good books, it can be hard to decide which one to read.

So this time, I'd like to introduce the books I read along the way while looking back on how my study of statistics progressed. Each book image links to Amazon, so the reviews there may also help. Here, I'll try as much as possible to write things the Amazon reviews don't cover.

Why did I start studying statistics in the first place?

Everyone's reason for starting is surprisingly interesting in its own way.

In my case, I simply found statistics "interesting" when I studied it in class.

What I found interesting was **hypothesis testing**. For a long time I had wondered: "When I measure two things and compare them, how big a difference still counts as 'the same', and where does 'different' begin?" For example, when you compare heights with a friend and talk about who is taller, is a 0.1 mm difference "the same"? What about 1 mm? If a 1 mm difference counts as the same, why not 1.1 mm? It's the kind of question people find annoying.

In everyday life it's fine to leave this to each person's sense of things, but I had long wondered what to do if you want to evaluate it objectively. I didn't know what to search for on Google, and I didn't ask anyone because I thought they'd find it annoying; then the answer suddenly appeared in class. I remember being a little moved.

So I started studying statistics because I wanted to know the reasoning by which such differences are evaluated.
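Here is a concrete version of the height question. It is a minimal sketch of Welch's two-sample t statistic in plain Python. The measurements are made-up numbers, and the function name `welch_t` is my own. The key idea is that a difference is judged against the measurement noise, not by its raw size. A full test would compare |t| with a quantile of the t distribution at the computed degrees of freedom. For a two-sided 5% test with 8 degrees of freedom, that quantile is about 2.31.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical repeated height measurements in cm (made-up numbers for illustration).
me = [170.1, 170.3, 169.9, 170.2, 170.0]
friend_close = [170.2, 170.4, 170.0, 170.3, 170.1]  # about 1 mm taller on average
friend_far = [171.1, 171.3, 170.9, 171.2, 171.0]    # about 1 cm taller on average

t_close, df_close = welch_t(me, friend_close)
t_far, df_far = welch_t(me, friend_far)
print(f"1 mm gap: t = {t_close:.2f}, df = {df_close:.1f}")
print(f"1 cm gap: t = {t_far:.2f}, df = {df_far:.1f}")
```

With these numbers, the 1 mm gap gives |t| of about 1, which is within the noise of the measurements. The 1 cm gap gives |t| of about 10. So whether "1 mm is the same" depends on how precisely you can measure, and testing is a way of making that judgment explicit.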

Failure at first

Since I'm from the Department of Management Systems Engineering, I had already touched on statistics in classes such as quality control. I felt I understood it at about the level of Grade 2 of the Japan Statistical Society Certificate (統計検定).

Starting from that level, I wasn't sure what to study, so the first thing I did was study the syllabus of Grade 1 of the certificate. The reason was simple: it has the widest scope, with an impressive-looking list of topics, so it seemed I could gain comprehensive knowledge.

To be honest, I think this was a failure in terms of improving my ability.

Despite the wide scope, there is no single textbook for it, so I had to read one textbook per field, skipping the proofs and taking in only the outline. I was struggling to find time amid a mountain of experiments and assignments at school. Even so, I tried to grasp the whole picture, all the while so frustrated I felt ready to burst.

Because I studied that way, I only gained knowledge as shallow as a puddle...

Having made that mistake, I realized I needed to understand things properly, so I decided to study in a way that would steadily build my abilities.

Starting over from the basics (1st year)

Although it was a failure, I did gain something. First, I learned that unless you study with your feet on the ground, you won't even be able to use what you learn. Second, it became clear that statistics presupposes knowledge of probability distributions, and I was badly lacking it. That's why I started with this book.

1. Probability and statistics for university students

[Probability and statistics for university students (Amazon)](https://www.amazon.co.jp/%E5%BC%B1%E7%82%B9%E5%85%8B%E6%9C%8D%E5%A4%A7%E5%AD%A6%E7%94%9F%E3%81%AE%E7%A2%BA%E7%8E%87%E3%83%BB%E7%B5%B1%E8%A8%88-%E8%97%A4%E7%94%B0-%E5%B2%B3%E5%BD%A6/dp/4489020694)

Choosing this book was the right call.

The nice thing about this book is that you build knowledge by working through exam-style exercises.

In addition to the basic probability distributions, you learn how to use the central limit theorem and moment generating functions, how to change variables in multivariate distributions, and how to compute conditional expectations. You can also practice the basics of stochastic processes such as Markov chains and Poisson processes.

You learn all of this by working through calculations by hand. Unless you move on to esoteric books, I think it gives you enough probability to get by for quite a while. I'm very grateful that it taught me the basic calculation techniques that statistics books use so routinely that they skip over them.
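A quick simulation makes the central limit theorem tangible. This sketch is my own illustration, not taken from the book. It standardizes the means of Exponential(1) samples, which have mean 1 and variance 1. It then checks that roughly 95% of them land within ±1.96, as draws from a standard normal would. The sample size n = 50, the 5000 repetitions and the seed are arbitrary choices.

```python
import math
import random
import statistics


def standardized_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(1) and standardize each sample mean.

    Exponential(1) has mean 1 and variance 1, so by the central limit theorem
    sqrt(n) * (mean - 1) is approximately standard normal for large n.
    """
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        m = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        zs.append(math.sqrt(n) * (m - 1.0))
    return zs


zs = standardized_means(n=50, reps=5000)
coverage = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"mean of z = {statistics.fmean(zs):.3f}, sd of z = {statistics.stdev(zs):.3f}")
print(f"share within +/-1.96: {coverage:.3f} (about 0.95 if z is roughly N(0, 1))")
```

The underlying distribution is strongly skewed, yet the standardized means already behave very much like a standard normal at n = 50.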

By the way, the last chapter covers actuarial science, but I don't think you need to do it at this point.

2. Statistics for the natural sciences

[Statistics for the natural sciences (Amazon)](https://www.amazon.co.jp/%E8%87%AA%E7%84%B6%E7%A7%91%E5%AD%A6%E3%81%AE%E7%B5%B1%E8%A8%88%E5%AD%A6-%E5%9F%BA%E7%A4%8E%E7%B5%B1%E8%A8%88%E5%AD%A6-%E6%9D%B1%E4%BA%AC%E5%A4%A7%E5%AD%A6%E6%95%99%E9%A4%8A%E5%AD%A6%E9%83%A8%E7%B5%B1%E8%A8%88%E5%AD%A6%E6%95%99%E5%AE%A4/dp/4130420674)

Now that I had basic knowledge of probability, I chose this book to study statistics.

This is the so-called Blue Book. Many people start with the **[Red Book](https://www.amazon.co.jp/%E7%B5%B1%E8%A8%88%E5%AD%A6%E5%85%A5%E9%96%80-%E5%9F%BA%E7%A4%8E%E7%B5%B1%E8%A8%88%E5%AD%A6%E2%85%A0-%E6%9D%B1%E4%BA%AC%E5%A4%A7%E5%AD%A6%E6%95%99%E9%A4%8A%E5%AD%A6%E9%83%A8%E7%B5%B1%E8%A8%88%E5%AD%A6%E6%95%99%E5%AE%A4/dp/4130420658)** instead. I considered doing the Red Book. I skipped it because I expected the probability book above and my coursework to cover most of it. It does look like a good book, though, so I think starting with it is also a good idea.

Of course, choosing the Blue Book was also the right call. First, it was good to gain some knowledge of linear models. Introductory machine learning books somehow tend to skip the error distribution and never get to hypothesis testing, so it was valuable to study these here.

Also, although the treatment of testing is not very advanced, it is written relatively well, so I learned a lot. I like the flow from maximum likelihood estimation to testing. If you are looking for an answer to a question like "Why is the XX test a good test?", though, you may be disappointed.
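Here is a small illustration of the flow from maximum likelihood to testing. It is a standard textbook case, not necessarily how the Blue Book presents it: the likelihood ratio test for a normal mean $\mu$ with known variance $\sigma^2$, based on i.i.d. observations $X_1, \dots, X_n$ and the null hypothesis $H_0: \mu = \mu_0$.

$$
\begin{aligned}
\ell(\mu) &= -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2 + \text{const}, \qquad \hat\mu = \bar X, \\
2\{\ell(\hat\mu)-\ell(\mu_0)\} &= \frac{1}{\sigma^2}\Big\{\sum_{i=1}^{n}(X_i-\mu_0)^2-\sum_{i=1}^{n}(X_i-\bar X)^2\Big\} = \frac{n(\bar X-\mu_0)^2}{\sigma^2}.
\end{aligned}
$$

Under $H_0$, $\sqrt{n}(\bar X-\mu_0)/\sigma \sim N(0,1)$, so the statistic follows $\chi^2_1$ exactly. Rejecting when it exceeds about 3.84 is the same as the familiar two-sided 5% $z$-test, which rejects when $|z| > 1.96$.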

While studying this book, I also realized how important linear algebra is, so I reviewed it as well.

Kernel multivariate analysis


I had heard that SVMs were amazing, so I asked my professor, who recommended this book.

At the time I had no real notion of "spaces" in mathematics. I remember being confused by the many spaces involved, such as feature space, sample space, and reproducing kernel Hilbert space. High school and first-year university mathematics mostly teaches properties that hold in general. Here, the approach is to work with objects chosen precisely because they have such properties. I remember often stumbling over rudimentary points like that.

It's a bit tough without a certain level of math, but I think the content itself is fairly easy to understand. I don't recommend reading it at this stage, but I do recommend it to anyone who wants to learn about kernel methods someday.

Multivariate statistical analysis method

[Multivariate statistical analysis method (Amazon)](https://www.amazon.co.jp/%E5%A4%9A%E5%A4%89%E9%87%8F%E7%B5%B1%E8%A8%88%E8%A7%A3%E6%9E%90%E6%B3%95-%E7%94%B0%E4%B8%AD-%E8%B1%8A/dp/476870154X)

It's a fairly old book (published in 1983), but I like it a lot. It covers the basic multivariate analysis methods: regression analysis, principal component analysis, discriminant analysis, quantification methods, factor analysis, graphical methods, and cluster analysis. Most books explain the theory in full first and then give concrete examples. This one opens each topic with concrete data and the motivation of what you want to learn from it. The theory and the calculations on that data then run in parallel. So you study with the same feel as when you are actually analyzing data.

Looking at it now, the discussion and calculations feel overly specific, but at the time I was very grateful for that concreteness.

For multivariate analysis, "[Data Analysis in Practice](https://www.amazon.co.jp/%E3%83%87%E3%83%BC%E3%82%BF%E8%A7%A3%E6%9E%90%E3%81%AE%E5%AE%9F%E9%9A%9B%E2%80%95%E5%A4%9A%E6%AC%A1%E5%85%83%E5%B0%BA%E5%BA%A6%E6%B3%95%E3%83%BB%E5%9B%A0%E5%AD%90%E5%88%86%E6%9E%90%E3%83%BB%E5%9B%9E%E5%B8%B0%E5%88%86%E6%9E%90-%E5%A5%A5-%E5%96%9C%E6%AD%A3/dp/4863451814)" is also recommended for beginners. I don't know many books that deal with multidimensional scaling, so I think it is valuable.

1st year summary

In my first year, after making that mistake and reflecting on it, I focused on basic probability, statistical thinking, and practical methods such as multivariate analysis. I had heard of measure theory, but I think not touching it was the right call, since I judged it was beyond my level.

I also realized that the linear algebra and calculus I learned in my first year of college were important, and it was good that I reviewed them properly instead of cutting corners. I believe the most important thing in studying is to keep going. So I'm glad I built the stamina to keep studying; it has stayed with me ever since.

Introduction to machine learning (2nd year)

In my second year of study, I was a fourth-year university student and joined the Applied Statistics Laboratory. For financial reasons I didn't plan to go on to graduate school, so I wanted to study things that would stay useful after I got a job. When choosing a lab, I asked the professor, "I'd like to focus on studying rather than research; is that okay?", and he gladly accepted. After actually doing research, I found this idea was wrong: what I gained from research keeps being useful. Still, I'm grateful to the professor for deliberately agreeing instead of correcting me right away.

I had heard that what people call AI is largely machine learning, so I decided to study machine learning first.

Basics of statistical learning

[Basics of Statistical Learning (Amazon)](https://www.amazon.co.jp/%E7%B5%B1%E8%A8%88%E7%9A%84%E5%AD%A6%E7%BF%92%E3%81%AE%E5%9F%BA%E7%A4%8E-%E2%80%95%E3%83%87%E3%83%BC%E3%82%BF%E3%83%9E%E3%82%A4%E3%83%8B%E3%83%B3%E3%82%B0%E3%83%BB%E6%8E%A8%E8%AB%96%E3%83%BB%E4%BA%88%E6%B8%AC%E2%80%95-Trevor-Hastie/dp/432012362X)

This was the book my professor recommended when I asked how to study machine learning. It is the Japanese translation of "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman.

It is one of my very favorite books. It comprehensively covers the basic methods of machine learning and explains their theoretical background in detail from a statistical viewpoint. I still often use it as a reference.

Because it is thick, it covers the connections between methods, and comparisons among them, in detail. Personally, I like the line of ideas from bagging to random forests, and the progression from piecewise polynomials to splines.
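Bagging is the starting point of that progression from bagging to random forests. It fits the same base learner to bootstrap resamples (drawn with replacement) and averages their predictions. Below is a minimal pure-Python sketch. It assumes one-split regression stumps on 1-D data as the base learner, and all function names are hypothetical. A random forest additionally picks a random subset of features at each split, which a single-feature example like this cannot show.

```python
import math
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression stump by least squares: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # every x is identical: predict the overall mean everywhere
        m = statistics.fmean(ys)
        return (math.inf, m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_models=50, seed=0):
    """Bagging: fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n indices sampled with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Average the stumps' predictions, which smooths out the single hard split."""
    return statistics.fmean(predict_stump(m, x) for m in models)


# Usage on made-up step-shaped data: y jumps from 0 to 1 at x = 2.5.
xs = [i / 10 for i in range(50)]
ys = [0.0 if x < 2.5 else 1.0 for x in xs]
models = fit_bagged_stumps(xs, ys)
for x in (0.5, 2.45, 4.5):
    print(x, round(predict_bagged(models, x), 3))
```

Far from the jump, every stump agrees. Near x = 2.45, the bootstrap resamples place the split slightly differently, so the averaged prediction typically falls between 0 and 1. Reducing the variance of an unstable learner in this way is what bagging is for.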

Few books cover unsupervised learning and regularization this widely, so I recommend it for that too. It is also the only book I know of that properly explains boosting from the basics. Boosting is popular on Kaggle and elsewhere, which makes this another reason to recommend it. (I wrote an article that just fills in the omitted steps of the boosting derivations, so I'll only post the **link** here.)

It was very heavy (in many ways). Filling in the omitted steps of the derivations was especially hard. I have many memories of bringing my filled-in derivations to my professor, being told, "You've filled in the steps, but you don't understand it," and going back to work.

The book has a large number of references, each introduced with a note on what it covers, so I also read the ones relevant to parts I didn't understand. Only good papers are cited, so it was good practice in reading papers.

Concept of probabilistic model


I didn't read the whole thing; I only read the part on stochastic processes.

I think it was a relatively easy-to-understand book. I never had occasion to use stochastic processes afterwards, though, so I don't remember much of it... I do remember feeling that I understood the basics of stochastic processes.

Robust Statistics: How to Deal with Outliers

![Robust Statistics: How to Deal with Outliers](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/296113/49d3b28e-075f-6771-3025-f76f172969d1.jpeg)

My graduation research was on robust statistics, so I first studied with this book. In fact, it is probably nearly the only Japanese book on robust statistics. (The only other one I know of is "[Robust Regression Estimation (Minotani)](https://www.amazon.co.jp/%E9%A0%91%E5%81%A5%E5%9B%9E%E5%B8%B0%E6%8E%A8%E5%AE%9A-%E7%B5%B1%E8%A8%88%E3%83%A9%E3%82%A4%E3%83%96%E3%83%A9%E3%83%AA%E3%83%BC-%E8%93%91%E8%B0%B7-%E5%8D%83%E5%87%B0%E5%BD%A6/dp/4254128371)".)

This book by Professor Fujisawa is remarkably easy to understand, and I highly recommend it. There was essentially only one book I could read in Japanese, and it happened to be this clear; I remember feeling it was a gift from the gods.

Besides that, the other books on robust statistics I read are in English: "Robust Statistics: Theory and Methods" (Maronna, Martin, Yohai) and "[Robust Statistics](https://www.amazon.co.jp/Robust-Statistics-Wiley-Probability/dp/0470129905)" (Huber, Ronchetti). Beyond those, I mostly dug through papers. I have some thoughts on robust statistics, but I'll skip them because it would get long.
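To see what "dealing with outliers" means in practice, here is a minimal sketch of the Huber M-estimator of location, the estimator associated with Huber's work above. I wrote it from the standard definition, not from any of these books. The tuning constant c = 1.345 and the fixed MAD-based scale are conventional choices assumed here. The name `huber_location` and the data are made up.

```python
import statistics


def huber_location(xs, c=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location, computed by iteratively reweighted averaging.

    The scale is held fixed at the normalized MAD (MAD / 0.6745), a common choice.
    Points with scaled residual |r| <= c get weight 1; farther points get weight c / |r|,
    so an outlier's influence is capped instead of growing with its distance.
    """
    mu = statistics.median(xs)
    mad = statistics.median(abs(x - mu) for x in xs)
    scale = mad / 0.6745 if mad > 0 else 1.0
    for _ in range(max_iter):
        weights = [min(1.0, c * scale / abs(x - mu)) if x != mu else 1.0 for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]  # made-up measurements with one gross outlier
print("mean  :", round(statistics.fmean(data), 3))
print("median:", round(statistics.median(data), 3))
print("huber :", round(huber_location(data), 3))
```

Here the single outlier drags the mean up to about 16.7. The median (10.05) and the Huber estimate (about 10.06) stay with the bulk of the data. Unlike the median, the Huber estimate still uses the actual values of the well-behaved points.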

Basics of mathematical statistics

[Basics of Mathematical Statistics (Amazon)](https://www.amazon.co.jp/%E6%95%B0%E7%90%86%E7%B5%B1%E8%A8%88%E5%AD%A6%E3%81%AE%E5%9F%BA%E7%A4%8E-%E9%87%8E%E7%94%B0-%E4%B8%80%E9%9B%84/dp/4320014529)

I bought it wanting to build basic statistical knowledge through my graduation research. The content is not light, but I'm glad I gained a solid, full-fledged knowledge of statistics. Of all the books on mathematical statistics, it is my favorite so far.

In terms of content, the probability part is kept brief, and the book covers statistical models, convergence, testing, and estimation in detail. I think it covers the standard topics of mathematical statistics, and the phrase "a full-fledged introductory book" fits it nicely.

If you can't fill in the omitted steps, you can refer to "Theory of Point Estimation" (Lehmann and Casella). Although it is in English, I liked it because the explanations and writing style were easy to follow. The part on asymptotic theory in particular is written in considerable detail. I'd like to read the whole thing someday, but I doubt I will...

(I found Theory of Point Estimation either through my professor's recommendation or by reading this book review (direct PDF link); I don't remember which. The review is detailed and worth reading.)

(I believe a PDF of Theory of Point Estimation was available online.)

I was also reading papers

As I mentioned under Basics of Statistical Learning, I also read quite a few papers. I don't mean to boast about the number; I just thought it might help readers to know at what stage I started reading papers...

Back to the story: the papers cited in Basics of Statistical Learning were fortunately easy to read, so they were good practice. At first I read papers cited as references in books, then traced the references in those papers in turn. It was interesting to deepen my understanding of the ideas behind each method and theory, and of the problems they solved. Reading the papers cited in each paper also showed me how the field spreads out and connects.

The papers I read for my graduation research were not always easy. But I didn't feel I had to understand every part, and I allowed myself some leeway, so it went quite well. Needless to say, my professor helped me a lot. Even so, I think it was great that I could read papers relatively stress-free after graduating.

Second year summary

That's how my second year of study went. Roughly speaking, I studied machine learning in the first half, and robust statistics and mathematical statistics through my graduation research in the second half. By the way, I used MATLAB in the lab. I really liked it.

I grew enormously that year, becoming far stronger than I could have imagined when I was studying in my first year.

This was not due to my own efforts but to my professor in the lab, who educated me properly so that I could truly understand. I'll go into the details another time, but it made me feel that this is what education really is. Beyond academic guidance, he also taught me the basic ways of thinking and attitudes needed as a person and as a working adult, which is a huge advantage at work. (Saying "I was educated" feels a little odd, but "guidance" and "training" don't go far enough; "education" is exactly the right word.)

Becoming a working adult (3rd year)

My third year was my first year as a working adult: I started working as a data scientist. I'm not a machine learning engineer but a business-side data scientist, and I do a lot of consulting work. My first project was marketing consulting. As a student, my classes focused on applications, since management engineering deals heavily with them. My own study, meanwhile, focused on theory. After starting work, I kept studying with the aim of maintaining that balance.

At work, I had to work hard on domain knowledge rather than statistics, so my statistics study slowed down. This was a little disappointing, but I think it will pay off if I stay in this line of work.

Pattern recognition and machine learning (commonly known as PRML)

[PRML vol. 1 (Amazon)](https://www.amazon.co.jp/%E3%83%91%E3%82%BF%E3%83%BC%E3%83%B3%E8%AA%8D%E8%AD%98%E3%81%A8%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92-%E4%B8%8A-CM-%E3%83%93%E3%82%B7%E3%83%A7%E3%83%83%E3%83%97/dp/4621061224) ![PRML vol. 2](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/296113/4c68e87f-4c0e-57b9-7193-fe43033127b7.jpeg)

Having read Basics of Statistical Learning, I moved on to PRML, the other must-read book. It is thoroughly Bayesian, and following the formula derivations was hard. However, as you'd expect from such a famous book, searching online turns up many useful sites, blogs, and slides. I struggled with Basics of Statistical Learning because such resources are scarce, but PRML was easier to study in this respect.

I especially recommend this page. The PDF is distributed free of charge, but I bought the [paper version](http://ankokudan.org/d/d.htm?detail093-detailread-j.html). I'm very grateful to get this content for 556 yen. Truly a godsend.

Personally, I prefer Basics of Statistical Learning to PRML.

As with Basics of Statistical Learning, I think the key when studying books like this is to honestly work through things by hand. Of course, that is from my own level; I can't speak for people stronger than me...

Text analytics

[Text Analytics (Amazon)](https://www.amazon.co.jp/%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%82%A2%E3%83%8A%E3%83%AA%E3%83%86%E3%82%A3%E3%82%AF%E3%82%B9-%E7%B5%B1%E8%A8%88%E5%AD%A6One-Point-%E9%87%91-%E6%98%8E%E5%93%B2/dp/432011261X)

I bought it because I expected to deal with text data often at work. To be honest, it wasn't what I expected, but the content itself was good.

Rather than explaining each method in depth, it gave me a bird's-eye view of the approaches to analyzing text data.

Missing data processing


In practice (well, not always), you run into missing data constantly, so you need to know how to handle it. That's why I picked up this book.

The book mainly focuses on multiple imputation and applies it across a variety of analysis methods, so I was satisfied with the content. The final chapter, "Conclusion", was especially good.

Beyond that, the explanations of the underlying methods were also very good. For example, the chapter on multiple imputation in logistic regression explains logistic regression itself well, including why it is used.
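Since the book centers on multiple imputation, here is a minimal sketch of the final pooling step that multiple imputation relies on. Each of the m imputed datasets is analyzed separately, and the results are combined with Rubin's rules. The imputation step itself is not shown, the name `pool_rubin` is my own, and the numbers are made up.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine results from m imputed datasets with Rubin's rules.

    estimates: the point estimate obtained from each completed (imputed) dataset
    variances: the squared standard error of that estimate in each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b           # total variance of q_bar
    return q_bar, total


# Made-up results from analysing m = 5 imputed datasets (e.g. one regression coefficient each).
est = [1.0, 1.2, 0.8, 1.1, 0.9]
var = [0.04] * 5
q, t = pool_rubin(est, var)
print(f"pooled estimate = {q:.3f}, pooled SE = {t ** 0.5:.3f}")
```

With these inputs, the pooled estimate is 1.0 and the total variance is 0.04 + 1.2 × 0.025 = 0.07. The spread between imputations therefore inflates the standard error from 0.2 to about 0.26. That inflation is how multiple imputation accounts for the uncertainty caused by the missing values.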

Deep Learning from scratch


In this line of work, people say things to me like "You know deep learning, right?" I'm ashamed to admit I wasn't very interested in it and hadn't studied it until now. Still, I thought it wasn't good to keep saying I knew nothing about it, so I decided to study it with this book for the time being.

I think it's a great introduction to seeing how it works inside.

Feature engineering for machine learning

[![Feature engineering for machine learning](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/296113/17cbecdf-c49d-c9e9-f1d9-346f944c6797.jpeg)](https://www.amazon.co.jp/%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E7%89%B9%E5%BE%B4%E9%87%8F%E3%82%A8%E3%83%B3%E3%82%B8%E3%83%8B%E3%82%A2%E3%83%AA%E3%83%B3%E3%82%B0-%E2%80%95%E3%81%9D%E3%81%AE%E5%8E%9F%E7%90%86%E3%81%A8Python%E3%81%AB%E3%82%88%E3%82%8B%E5%AE%9F%E8%B7%B5-%E3%82%AA%E3%83%A9%E3%82%A4%E3%83%AA%E3%83%BC%E3%83%BB%E3%82%B8%E3%83%A3%E3%83%91%E3%83%B3-Alice-Zheng/dp/4873118689)

Until then, I had been creating features from the handful of techniques I knew. I picked up this book because I felt that wasn't good enough.

The book focuses mainly on techniques for creating features.

Honestly, I had only fragmentary knowledge of the topic, so reading it all organized in one book helped me sort things out in my head. I also found that much of what I had wanted to learn when studying text data analysis was actually covered in this book.

I still refer to this book often when analyzing data.

Statistical causal discovery


I think many people would like to study statistical causal inference. I thought so too, looked into various books, and found that this one looked good.

Starting from the basics of statistical causal inference, the book goes on to explain LiNGAM, a method devised by the author. The explanations were easy to understand, and I enjoyed reading it, often thinking "this is exactly the explanation I wanted." I highly recommend it.

Unfortunately, rigorous causal analysis is rarely requested in practice, but I hope its importance will be recognized one day. (Though explaining it to non-analysts would be difficult...)

Introductory design of experiments

[Introductory design of experiments (Amazon)](https://www.amazon.co.jp/%E5%85%A5%E9%96%80-%E5%AE%9F%E9%A8%93%E8%A8%88%E7%94%BB%E6%B3%95-%E6%B0%B8%E7%94%B0-%E9%9D%96/dp/4817103825)

This is a book by Professor Nagata, who won the Deming Prize last year.

I had long wanted to study design of experiments, because correct data analysis starts with collecting the right data. But I kept putting it off because it seemed like a hassle... After I started working, it became harder to find study time the way I could as a student. I wanted a topic I could understand by reading lightly, without carefully working through calculations by hand and thinking for a long time. Design of experiments turned out to be a perfect fit. It is quite interesting to study. Come to think of it, much of what it says is common sense, but if you don't know it, you'll overlook it.

The book also covers how to study the subject and how it connects with other fields, so I think its structure is very kind to self-learners. The derivations are quite thorough, so some readers may find them redundant, but I think many will appreciate them.

The last chapter, about 50 pages, is a Q & A, which is also very readable.

Unfortunately, even though I think experiments should be planned properly, that is hard to do in practice, at least in business... Of course, it depends on the field.

The essence of machine learning


I bought it because it was a hot topic, and it ended up helping me a lot. It is a rare book that truly requires nothing beyond high school mathematics. The explanations are also very easy to understand, and it is clearly written with beginners in mind.

Its structure is quite unusual. About 270 pages are devoted to the knowledge needed before starting machine learning: setting up an environment, Python basics, linear algebra, calculus, and numerical computation. Only the remaining 100 pages cover machine learning itself.

Before this book came out, I often heard of people who started studying machine learning and gave up while learning the prerequisites, so I felt a really good book had arrived. If you read it, you can get started with machine learning smoothly. Whenever someone asks me, "I want to study machine learning; where should I start?", I recommend this book without exception.

I wish I had been able to start with this book myself.

3rd year summary

Honestly, after starting work I was busy studying marketing, management, domain knowledge and so on, and had little time for statistics, yet I still read quite a lot. I think that's partly because the foundation I built in my second year let me read moderate-level books fairly smoothly. It also helped that, apart from PRML, none of these books were difficult, and all of them explained things carefully. There seem to be many good books at this level.

Basics of Statistical Learning and PRML are still popular, but should you start with them? With so many good books available, you may not need to read them early. I think it's fine to turn to them once you have some strength and want to understand how the methods connect, or want to study a set of methods from one consistent viewpoint. They are good books, but I think there is a right time to read them.

My second year as a working adult (4th year)

This is about this year. This year was upended by the novel coronavirus, which I could never have imagined at this time last year. Work went into crisis mode and I poured all my time into it, leaving no room to study. Recently, though, I realized that I have to take good care of myself, and that working that way could pressure others into doing the same. Partly for that reason, I went back to a normal pace and was able to secure some study time again...

Anyway, enough about that. This year I studied somewhat more fundamental topics than before.

Probability theory for statistics, and beyond

[Probability theory for statistics, and beyond (Amazon)](https://www.amazon.co.jp/%E7%B5%B1%E8%A8%88%E5%AD%A6%E3%81%B8%E3%81%AE%E7%A2%BA%E7%8E%87%E8%AB%96%E3%80%81%E3%81%9D%E3%81%AE%E5%85%88%E3%81%B8%E2%80%95%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E3%81%AE%E6%B8%AC%E5%BA%A6%E8%AB%96%E7%9A%84%E7%90%86%E8%A7%A3%E3%81%A8%E6%BC%B8%E8%BF%91%E7%90%86%E8%AB%96%E3%81%B8%E3%81%AE%E6%9E%B6%E3%81%91%E6%A9%8B-%E6%B8%85%E6%B0%B4-%E6%B3%B0%E9%9A%86/dp/4753601250)

Many books state that "knowledge of measure-theoretic probability is not required." Reading that, I kept wondering, "So is it actually needed or not?", and I suspect many people have tried to learn measure theory only to crash and burn over and over. I certainly have, many times.

This book is perfect for such people.

By studying the probability theory needed to understand mathematical statistics, you also pick up the key results of measure theory along the way. The book suits readers whose goal is not to master measure-theoretic probability for its own sake, but to build enough foundational strength to handle the applied side of mathematical statistics.

Passages such as **"For many statistical problems, it is almost always fine to set $\Omega = \mathbb{R}^d$"**, and the one that skips the proof of Hopf's extension theorem, **"This theorem is essential for constructing probability measures, and proving it still requires studying measure theory. However, the claim of the theorem should not be hard to understand. If you accept it, many steps in the early stages of measure theory can be skipped. If you want to understand probability theory as applied to statistics and the like, I think it is better to accept this theorem and move on."**, capture the character of this book well.
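For reference, here is roughly what that theorem claims. This is my own paraphrase of the standard textbook statement, not a quote from the book, and the notation is mine: $\mathcal{A}$ is an algebra of subsets of $\Omega$ (closed under complements and finite unions), $P_0$ is a set function on it, and $F$ is a distribution function. The last two lines show the typical use, written for $d = 1$ to keep it short, which is why taking $\Omega = \mathbb{R}^d$ is enough for most statistics (checking countable additivity in that example relies on the right-continuity of $F$).

```math
\begin{aligned}
&\text{If } P_0 : \mathcal{A} \to [0,1],\ P_0(\Omega) = 1, \text{ satisfies } P_0\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P_0(A_n) \\
&\qquad \text{for all pairwise disjoint } A_1, A_2, \ldots \in \mathcal{A} \text{ with } \bigcup_{n} A_n \in \mathcal{A}, \\
&\text{then there is a unique probability measure } P \text{ on } \sigma(\mathcal{A}) \text{ with } P(A) = P_0(A) \ \ (A \in \mathcal{A}). \\[4pt]
&\text{Example } (\Omega = \mathbb{R},\ \mathcal{A} = \text{finite disjoint unions of intervals } (a, b]):\ P_0\big((a, b]\big) = F(b) - F(a) \\
&\qquad \Longrightarrow\ P_0 \text{ extends uniquely to a probability measure on } \sigma(\mathcal{A}) = \mathcal{B}(\mathbb{R}).
\end{aligned}
```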

The explanations are pedagogical, written in a lecture-like tone rather than the terse style typical of specialized books. The book devotes plenty of space to questions such as what happens if an assumption is dropped and where a given result leads, which I was very grateful for as someone studying on my own.

Functional analysis for engineering

[Functional analysis for engineering.jpg](Https://www.amazon.co.jp/%E5%B7%A5%E5%AD%A6%E3%81%AE%E3%81%9F%E3%82%81%E3%81% AE% E9% 96% A2% E6% 95% B0% E8% A7% A3% E6% 9E% 90-% E5% B7% A5% E5% AD% A6% E3% 81% AE% E3% 81% 9F % E3% 82% 81% E3% 81% AE% E6% 95% B0% E5% AD% A6-% E5% B1% B1% E7% 94% B0-% E5% 8A% 9F / dp / 4901683624 / ref = sr_1_1? __ mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & crid = 1CEDL53GI3D3Z & dchild = 1 & keywords =% E5% B7% A5% E5% AD% A6% E3% 81 % AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E9% 96% A2% E6% 95% B0% E8% A7% A3% E6% 9E% 90 & qid = 1606095842 & sprefix =% E5% B7% A5% E5% AD% A6% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% 2Caps% 2C291 & sr = 8-1)

It's not a statistics book, but I learned a lot from it, so I'll include it. In particular, I think working through this book before or alongside "[Probability theory for statistics, beyond](#Probability theory for statistics, beyond)" will deepen your understanding.

The title says "for engineering", but is there really an engineering faculty that teaches this? The content is intimidating. (To be fair, it only says "for engineering", not "taught in engineering faculties".)

Note that "for engineering" is misleading in another way too: it does not mean the treatment is non-rigorous or purely intuitive, but that it covers the functional analysis used in engineering. That is why I think reading this book will deepen your understanding of mathematical statistics, optimization, and probability theory.

That said, and thankfully for engineers, the book omits almost all proofs yet skips almost no steps in its explanations, which you could call truly "for engineering". A godsend!

Another nice feature is that every entry in the reference list comes with a comment. I haven't followed up on them myself, but the comments are enthusiastic, so check them out if you have the time!

Introduction to effect verification

[Introduction to effect verification.jpg]( https://www.amazon.co.jp/%E5%8A%B9%E6%9E%9C%E6%A4%9C%E8%A8%BC%E5%85%A5%E9%96%80%E3 % 80% 9C% E6% AD% A3% E3% 81% 97% E3% 81% 84% E6% AF% 94% E8% BC% 83% E3% 81% AE% E3% 81% 9F% E3% 82 % 81% E3% 81% AE% E5% 9B% A0% E6% 9E% 9C% E6% 8E% A8% E8% AB% 96-% E8% A8% 88% E9% 87% 8F% E7% B5% 8C% E6% B8% 88% E5% AD% A6% E3% 81% AE% E5% 9F% BA% E7% A4% 8E-% E5% AE% 89% E4% BA% 95-% E7% BF% 94% E5% A4% AA / dp / 4297111179 / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords =% E5% 8A% B9% E6% 9E% 9C% E6% A4% 9C% E8% A8% BC% E5% 85% A5% E9% 96% 80 & qid = 1606097182 & sr = 8-1)

I bought it because everyone was talking about it, and it lived up to the hype. It is written with business practitioners rather than students in mind, so the content felt very practical.

Book reviews about the content are written in various places, so I will omit them here.

Personally, I loved the preface and the afterword, and kept nodding along like an akabeko (the nodding red-cow toy). Reading just those parts is enough to tell it's a good book.

Practical machine learning with Scikit-learn and TensorFlow

[Tokage book.jpg](https //www.amazon.co.jp/%E5%8A%B9%E6%9E%9C%E6%A4%9C%E8%A8%BC%E5%85%A5%E9%96%80%E3% 80% 9C% E6% AD% A3% E3% 81% 97% E3% 81% 84% E6% AF% 94% E8% BC% 83% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E5% 9B% A0% E6% 9E% 9C% E6% 8E% A8% E8% AB% 96-% E8% A8% 88% E9% 87% 8F% E7% B5% 8C % E6% B8% 88% E5% AD% A6% E3% 81% AE% E5% 9F% BA% E7% A4% 8E-% E5% AE% 89% E4% BA% 95-% E7% BF% 94 % E5% A4% AA / dp / 4297111179 / ref = sr_1_1? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & dchild = 1 & keywords =% E5% 8A% B9 % E6% 9E% 9C% E6% A4% 9C% E8% A8% BC% E5% 85% A5% E9% 96% 80 & qid = 1606097182 & sr = 8-1)

At work I only had Excel and few chances to use R or Python. I had used R as a student, so I use it for some statistical analysis, but I had barely used Python. I had touched Python in [Essence of Machine Learning](# Essence of Machine Learning) and [Deep Learning from Zero](# Deep Learning from Zero), but both of those implement everything from scratch, so I bought this book to get comfortable with the tools actually used in practice.

In the first half, I learned that Scikit-Learn is quite simple and convenient, but that was about all I got out of it.
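To show what I mean by "simple and convenient", here is a minimal sketch of the common scikit-learn workflow. It is my own example, not code from the book, and the dataset (the built-in iris data), the two models, and their parameters are arbitrary choices for illustration. The point is that every estimator exposes the same `fit` / `predict` interface, so swapping models is a one-line change.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, so the example runs without any downloads.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Arbitrary example models; both share the same fit/predict interface.
models = {
    # Scaling + logistic regression bundled into a single estimator.
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # learn from the training split only
    y_pred = model.predict(X_test)   # predict on the held-out split
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Because the object returned by `make_pipeline` is itself an estimator, the scaler and the model are fit together on the training split only, which keeps information from the test data out of preprocessing.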

The opening lays out a natural (and ideal) workflow, which made me expect the whole book to be practice-oriented. But that is only the beginning. After that it turns into explanations of algorithms and the like, and I found myself wondering, "Where did the practical part go?" Focusing on algorithms is fine, but I would have liked to read about the problems and errors that commonly come up when actually using each algorithm.

I also wondered whether the theory explanations were needed at all. If they are included, I wanted them to be a bit more thorough. But since theory is not the book's purpose, I would rather it gave only the ideas and an outline (leaving readers to look up the rest themselves) and spent the space on the practical problems mentioned above.

As for the code, I think the version published on GitHub is quite good, and that alone makes it valuable.

Also, this book uses TensorFlow 1, while the current version is TensorFlow 2...

Still, learning the basic usage of Scikit-Learn was a big win. I have some complaints, but I'm satisfied because I achieved my goal.

4th year summary

I put most of my time into work this year and didn't have much left for study, but I think I still improved. My results at work were good too, so overall I'd give myself a passing grade.

Looking back over the year, I'm glad I was able to study relatively fundamental material. I also want to remember to be grateful that there are now books covering the basics that you can work through on your own; the learning environment has become much better.

I also think it's progress that I finally started studying Python. I used to make excuses like "I don't use it at work" or "I don't understand it even when I write code", but Python is used so widely that I can't say that anymore. (In graduate school, classes used R and MATLAB, so this was my first real contact with Python, but it should be easy to pick up.)

Going forward, I'd like to keep learning Python while getting into time series analysis, reinforcement learning, and Bayesian statistics. Beyond that, I need to think about my career.

Summary

Writing this up book by book, I think I was able to sort out how much I knew at each stage and what impression each book left on me. I hope it helps someone.

Bonus: Books I'm currently reading

** ① Theory and method of Bayesian statistics ** [Theory and method of Bayesian statistics.jpg](Https://www.amazon.co.jp/%E3%83%99%E3%82%A4%E3%82%BA%E7%B5%B1%E8%A8%88%E3%81% AE% E7% 90% 86% E8% AB% 96% E3% 81% A8% E6% 96% B9% E6% B3% 95-% E6% B8% A1% E8% BE% BA-% E6% BE% 84% E5% A4% AB / dp / 4339024627 / ref = sr_1_1? adgrpid = 88754979296 & dchild = 1 & gclid = CjwKCAiAtej9BRAvEiwA0UAWXvyWzwLZGNMu1qfEKd3yn16KyW57h-5alyEHjckVORcdNo6Ip_wfdBoCPpYQAvD_BwE & hvadid = 410416604105 & hvdev = c & hvlocphy = 1009255 & hvnetw = g & hvqmt = e & hvrand = 7720280795536605024 & hvtargid = aud-990383411460% 3Akwd-889135189364 & hydadcr = 27491_11564684 & jp-ad- ap = 0 & keywords =% E3% 83% 99% E3% 82% A4% E3% 82% BA% E7% B5% B1% E8% A8% 88 +% E3% 81% AE +% E7% 90% 86% E8% AB% 96 +% E3% 81% A8 +% E6% 96% B9% E6% B3% 95 & qid = 1606113928 & sr = 8-1 & tag = googhydr-22)

I bought this to get started with Bayesian statistics. It is as difficult as rumored: it's hard to connect with what I currently know, and it skips quite a few steps between lines, so honestly I haven't made much progress. I think it's fine to take my time with this one, so I'll go slowly. By the way, I was advised to look at Chapter 7 early, and doing so really did help. I think reading Chapter 1 and then Chapter 7 before moving on to Chapter 2 is a good approach.


** ② Python machine learning programming ** [![Python Machine Learning Programming.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/296113/92c772d8-a9f9-302f-11ed-14a326909f64.jpeg)](https://www.amazon.co.jp/%E7%AC%AC3%E7%89%88-Python%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF% 92% E3% 83% 97% E3% 83% AD% E3% 82% B0% E3% 83% A9% E3% 83% 9F% E3% 83% B3% E3% 82% B0-% E9% 81% 94 % E4% BA% BA% E3% 83% 87% E3% 83% BC% E3% 82% BF% E3% 82% B5% E3% 82% A4% E3% 82% A8% E3% 83% B3% E3 % 83% 86% E3% 82% A3% E3% 82% B9% E3% 83% 88% E3% 81% AB% E3% 82% 88% E3% 82% 8B% E7% 90% 86% E8% AB % 96% E3% 81% A8% E5% AE% 9F% E8% B7% B5-impress-gear / dp / 4295010073 / ref = sr_1_3_sspa? __mk_ja_JP =% E3% 82% AB% E3% 82% BF% E3% 82% AB% E3% 83% 8A & crid = 1LFZPTNOVY9RI & dchild = 1 & keywords = python +% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92 & qid = 1606116134 & sprefix = Python% 2Caps% 2C278 & sr = 8- 3-spons & psc = 1 & spLa = ZW5jcnlwdGVkUXVhbGlmaWVyPUEzVjg3REVPUlY4MzNUJmVuY3J5cHRlZElkPUEwOTQ4NDk4M0ZSN0hDTjdCRUlIWSZlbmNyeXB0ZWRBZElkPUEzQVdCWUY5Q0xCUVpBJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ ==)

I wanted a book that covers TensorFlow 2, and I was about to buy the second edition of the [Lizard Book](#Practical machine learning with Scikit-learn and TensorFlow), but someone else had already bought the same book, so I looked for a different one to avoid a duplicate Lizard Book. Meanwhile, I saw that the third edition of Python Machine Learning Programming had come out, so I bought it. Both the 1st and 2nd editions had a good reputation, so I wasn't too worried, and since it emphasizes how to use the tools over theory, I think it fits my needs.

I haven't gotten very far yet, but here is what I like about what I've read so far:

- The code is simple and easy to understand.
- Each piece of code comes with comments.
- The annotations cover exactly the points you want explained.
- If you have used numpy or scikit-learn a bit, you can follow it without difficulty.

So I'll keep at it. I hope to work through both of these books by the end of this year.
