[PYTHON] Knowledge and study methods required for future data analysts

Target audience and content of the article

This article is aimed at people who are going to analyze data on companies' user behavior and sales, and at researchers whose data analysis has become a little too cumbersome to do in Excel.

Background to the need for "deep" data analysts

In recent years, it has become extremely important for IT companies to analyze data and turn the results into detailed UI and UX improvements and growth-hacking measures. Even in university research, it is becoming hard to stand out from other researchers without making full use of advanced data analysis.

In this article, therefore, I would like to write about the data analysis skills that will be required in the future, and how to study them, for beginners in data analysis and programming who want to do data analysis.

What are the abilities required of a data analyst?

  1. Logical thinking ability to see data correctly
  2. Advanced analytical methods such as statistics and machine learning
  3. Programming ability to format and aggregate data
  4. Design ability to plan product development and experiments in anticipation of data analysis

To become a truly valuable, top-notch analyst, you need all of the above abilities. The most important of them, and the main subject of this article, is number 4: "Design ability to plan product development and experiments in anticipation of data analysis".

1. Logical thinking ability to see data correctly and how to study it

What is the logical thinking ability to see data correctly?

This is quite difficult. I get the impression that many people think they can do it, but surprisingly few actually can.

For example, suppose you are analyzing whether taking action A in the app contributes to the retention rate. Compare the following three figures:

  1. Among people who took action A, the percentage who continued using the app
  2. The ratio of people who took action A to people who did not
  3. Among people who did not take action A, the percentage who continued using the app

These mean different things, of course, but which indicator matters in your analysis? Depending on their values, what further analysis should you perform, and what measures should you take next?

[Figure: analysis.jpg]

In the example in the figure, the retention rate among people who took action A (1) was **sufficiently high**, and the numbers of people who did and did not take action A (2) were almost the same. However, because the retention rate among people who did not take action A (3) is even higher, it may be better not to push users toward action A.

With a single action A this is still simple, but it quickly becomes complicated once you also analyze whether action B was taken at the same time. Can you grasp the situation correctly and come up with improvement measures without getting confused?
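To make the three indicators concrete, here is a small sketch in plain Python (standard library only, so pandas is not assumed). The `User` record, its fields and the ten sample users are hypothetical, invented so that the numbers follow the pattern described above; "retention" here simply means the share of users flagged as retained. The last part shows how adding action B splits every group in two.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple, Tuple


class User(NamedTuple):
    took_a: bool    # did the user take action A?
    took_b: bool    # did the user take action B?
    retained: bool  # did the user keep using the app?


# Hypothetical sample data, invented purely for illustration.
USERS: List[User] = [
    User(True, True, True), User(True, True, True), User(True, False, True),
    User(True, False, False), User(True, False, False),
    User(False, True, True), User(False, True, True), User(False, False, True),
    User(False, False, True), User(False, False, False),
]


def retention_by_group(
    users: List[User], key: Callable[[User], Hashable]
) -> Dict[Hashable, Tuple[int, float]]:
    """Return {group: (number_of_users, retention_rate)} for groups given by key(user)."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])
    for u in users:
        bucket = counts[key(u)]
        bucket[0] += 1                # users in this group
        bucket[1] += int(u.retained)  # retained users in this group
    return {g: (n, kept / n) for g, (n, kept) in counts.items()}


by_a = retention_by_group(USERS, key=lambda u: u.took_a)
share_a = by_a[True][0] / len(USERS)
print(f"(1) retention among users who took A:         {by_a[True][1]:.2f}")
print(f"(2) share of users who took A:                {share_a:.2f}")
print(f"(3) retention among users who did not take A: {by_a[False][1]:.2f}")

# Adding a second action B splits every group in two, which is where it gets confusing.
by_ab = retention_by_group(USERS, key=lambda u: (u.took_a, u.took_b))
for (a, b), (n, rate) in sorted(by_ab.items()):
    print(f"A={a!s:<5} B={b!s:<5} users={n} retention={rate:.2f}")
```

On this made-up data, (1) prints 0.60, (2) prints 0.50 and (3) prints 0.80. Keep in mind that this is only a correlation: users who skip A may differ from those who take it in other ways.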

How to acquire logical thinking ability to see data correctly

This can only be learned by looking at many patterns, thinking them through, and discussing them with others. When you see news and various numbers every day, it is important to ask yourself, "Is that really true?"

A while ago, the phrase "Numbers don't lie, but liars use numbers" was much talked about.

The ability to read the truth from data and logic is arguably indispensable for people today.

- ["Factfulness" - Hans Rosling (Author)](https://www.amazon.co.jp/dp/4822289605)
- ["The Power of Data Analysis: Thinking Methods for Approaching Causal Relationships" - Koichiro Ito (Author)](https://www.amazon.co.jp/dp/4334039863)
- ["Logic Training 101 Problems" - Shigeki Noya (Author)](https://www.amazon.co.jp/dp/478280136X)

It is also important to read a book like the third one and sharpen your command of language, because people use words to manipulate numbers and deceive others. Let's thoroughly learn how to handle numbers and logical language, and survive the coming big-data era.

2. Advanced analytical methods such as statistics and machine learning and their study methods

What mathematical ability do you need to master statistics and machine learning?

How much mathematical ability and knowledge you need depends on your position.

I don't think you need it if you only run simple A/B tests, cohort analyses, or funnel analyses. By making full use of these methods, you can already come up with growth-hacking measures like the following.

- Find the magic number and increase the retention rate of the app
- What is Growth Hack? Articles that can be understood from case studies to successful methods in 5 minutes
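Funnel analysis, for example, really does need only counting and division. A minimal sketch, assuming a strict funnel in which a user counts at a step only if they also reached every earlier step; the step names and user IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical funnel: step names and the user IDs that reached each step.
STEPS: List[str] = ["visit", "signup", "first_post", "purchase"]
REACHED: Dict[str, Set[int]] = {
    "visit": set(range(1, 101)),             # users 1..100
    "signup": set(range(1, 41)),             # users 1..40
    "first_post": set(range(1, 21)) | {95},  # user 95 appears without signing up (e.g. a tracking gap)
    "purchase": set(range(1, 6)),            # users 1..5
}


def funnel(steps: List[str], reached: Dict[str, Set[int]]) -> List[Tuple[str, int, float, float]]:
    """Strict funnel: a user counts at a step only if they reached every earlier step.

    Returns (step, users, rate_from_previous_step, rate_from_first_step) per step.
    """
    cohort = set(reached[steps[0]])
    top = prev = len(cohort)
    rows = []
    for step in steps:
        cohort &= reached[step]  # keep only users who also reached this step
        n = len(cohort)
        rows.append((step, n, n / prev if prev else 0.0, n / top if top else 0.0))
        prev = n
    return rows


for step, n, step_rate, overall in funnel(STEPS, REACHED):
    print(f"{step:<11} users={n:>3} from_prev={step_rate:.0%} from_top={overall:.0%}")
```

On this data the funnel narrows from 100 to 40, 20 and finally 5 users; user 95 is dropped at `first_post` because they never signed up.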

But can you make a sound judgment with the amount of data you currently have? Are you collecting data in a way that lets you make a sufficiently reliable decision? Even slightly deeper questions like these require some statistical knowledge and a feel for numbers.
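One way to put a number on "is this enough data?" is a significance test. The sketch below uses the textbook two-proportion z-test with a normal approximation; that choice, the function name and the sample counts are assumptions for illustration, not something this article prescribes.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates (pooled variance).

    Relies on the normal approximation, which is reasonable when each group has
    plenty of conversions and non-conversions.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Same observed rates (10% vs 12%), different amounts of data.
z_small, p_small = two_proportion_z_test(50, 500, 60, 500)
z_large, p_large = two_proportion_z_test(500, 5000, 600, 5000)
print(f"n=500 per group:  z={z_small:.2f}, p={p_small:.3f}")
print(f"n=5000 per group: z={z_large:.2f}, p={p_large:.4f}")
```

With 500 users per group, the 2-point gap is well within noise (p is roughly 0.31); with 5,000 per group the same gap gives p of roughly 0.001. The observed difference is identical; only the amount of data changed.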

In his book ["Strategic Theory of Probabilistic Thinking: The Power of Mathematical Marketing Demonstrated at USJ"](https://www.amazon.co.jp/dp/404101422), Tsuyoshi Morioka, the marketer known for the V-shaped recoveries of USJ and Marugame Seimen, gains deep insight into consumer behavior through "mathematical thinking."

Furthermore, doing business today requires a solid understanding of AI and deep learning. Unfortunately, many people have only a vague picture of AI, and much of what they say is off the mark.

When a competitor says they "use AI on customer data," can you picture what they are actually doing? Can you respond, "We can do that too," or "We can use that data for something more valuable"?

In fact, you don't need that much mathematical ability to understand AI. Anyone who vaguely remembers high school math can understand it.
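To back up the claim that high school math is enough, here is an illustrative sketch of the smallest possible "neural network": a single sigmoid neuron learning logical OR by gradient descent. It uses only the exponential function, derivatives and the chain rule. The dataset, learning rate and epoch count are arbitrary choices for the demo.

```python
import math
from typing import List, Tuple

# Toy dataset: logical OR of two binary inputs.
DATA: List[Tuple[Tuple[int, int], int]] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def sigmoid(z: float) -> float:
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float, float]:
    """Fit one sigmoid neuron with per-example gradient descent on cross-entropy loss."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            p = sigmoid(w1 * x1 + w2 * x2 + b)  # predicted probability of output 1
            grad = p - y                        # d(loss)/dz for sigmoid + cross-entropy
            w1 -= lr * grad * x1                # chain rule: dz/dw1 = x1
            w2 -= lr * grad * x2                # chain rule: dz/dw2 = x2
            b -= lr * grad                      # chain rule: dz/db = 1
    return w1, w2, b


w1, w2, b = train_neuron(DATA)
for (x1, x2), y in DATA:
    print(x1, x2, "->", round(sigmoid(w1 * x1 + w2 * x2 + b), 3), "target", y)
```

Deep learning stacks many such units and computes the same kind of gradients automatically, but the underlying math is what you see here.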

I would like to write "Introduction to AI" in another article.

How to study advanced analytical methods such as statistics and machine learning

The only way to study this is hands-on. To study statistics and machine learning hands-on, I think it is more efficient if you can do some simple programming. I do all three myself, and this is one reason I think analysts should also learn programming. After all, things are easier to understand when you play with the data and experience them yourself.

Again, programming may not be necessary if you only want to do in BI tools what you could already do in Excel. (Make sure you can at least do that much first.)

But anyone can reach that level with training. If you want to learn somewhat more advanced methods that put you a step ahead of such analysts, I think it is better to study hands-on with programming.
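As an example of learning statistics by playing with data, the sketch below checks by simulation that the spread of sample means shrinks like σ/√n, even for a skewed distribution. The distribution (exponential with rate 1), the sample size and the number of trials are arbitrary assumptions; `statistics.fmean` needs Python 3.8 or later.

```python
import math
import random
import statistics
from typing import Callable, List


def sample_means(draw: Callable[[random.Random], float], n: int, trials: int,
                 rng: random.Random) -> List[float]:
    """Repeat `trials` times: draw a sample of size n and record its mean."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50
# Exponential(1) is skewed, with mean 1 and standard deviation 1.
means = sample_means(lambda r: r.expovariate(1.0), n=n, trials=2000, rng=rng)

observed_se = statistics.stdev(means)
theoretical_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
print(f"mean of sample means:   {statistics.fmean(means):.3f} (population mean 1.0)")
print(f"spread of sample means: {observed_se:.4f} vs sigma/sqrt(n) = {theoretical_se:.4f}")
```

Changing `n` and rerunning is exactly the kind of hands-on experiment that makes formulas like the standard error stick.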

- [Basics of Statistical Analysis Understood with Python](https://www.amazon.co.jp/dp/4297100495)
- [Machine Learning Starting with Python: Feature Engineering and Machine Learning Basics Learned with scikit-learn](https://www.amazon.co.jp/dp/4873117984)
- [Deep Learning from Scratch: Theory and Implementation of Deep Learning Learned with Python](https://www.amazon.co.jp/dp/4873117585)

3. Programming ability to format and aggregate data

Do you need programming skills to format and aggregate data?

The most controversial point is probably whether analysts should acquire programming skills, and people hold strong opinions on both sides. (Strictly speaking, SQL is a query language rather than a general-purpose programming language, but let's ignore the finer distinctions, lol.)

- Necessary camp
  - SQL (Introduction) and Tableau for data analysis that anyone can understand
- Unnecessary camp
  - [Column] Is programming necessary for data analysis?

Both positions are reasonable and understandable. (Each article also starts from different assumptions.)

As I mentioned at the beginning, I am in the "necessary" camp. I will explain in detail in section 4, but more precisely, I think programming is necessary if you want to become an "analyst who goes one step ahead of others and creates truly valuable products and experiments."

In short, there are several reasons:

BI tools are by no means all-purpose

There are many useful BI tools in the world. They are excellent products and very valuable because they help democratize data. If every Japanese company mastered BI tools and ran PDCA cycles with them, the Japanese economy would grow explosively.

However, there are currently limits to what you can do without programming or SQL. Many BI tools require you to write the SQL queries for data aggregation and integration yourself. In fact, BI tools that claim to handle advanced analysis often mean, "You can write and run Python code inside the tool."

According to the article "The story of Mercari's data science team and analytics ecosystem":

At Mercari, many people in non-analyst positions can write SQL and do simple analyses by themselves. Recently, rumor has it that even people in finance, accounting, and legal, as well as designers, are studying it. (It's true.)

In other words, everyone is studying hard because they need to run SQL themselves to do their analyses.
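For a feel of the kind of SQL that non-analysts end up writing, here is a typical aggregation (daily active users and daily buyers) run against an in-memory SQLite database through Python's standard `sqlite3` module. The `events` table, its columns and its rows are hypothetical; real data warehouses differ in dialect, although `COUNT(DISTINCT ...)` combined with `CASE` is standard SQL.

```python
import sqlite3

# Hypothetical event table; column names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id INTEGER, event TEXT, event_date TEXT);
    INSERT INTO events VALUES
      (1, 'open', '2019-12-01'), (1, 'purchase', '2019-12-01'),
      (2, 'open', '2019-12-01'), (2, 'open', '2019-12-02'),
      (3, 'open', '2019-12-02'), (3, 'purchase', '2019-12-02'),
      (3, 'purchase', '2019-12-02'), (4, 'purchase', '2019-12-02');
    """
)

# Daily active users and daily buyers. COUNT(DISTINCT ...) skips the NULLs that
# the CASE expression yields for non-purchase rows, and counts each buyer once.
QUERY = """
SELECT event_date,
       COUNT(DISTINCT user_id) AS active_users,
       COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) AS buyers
FROM events
GROUP BY event_date
ORDER BY event_date
"""
rows = conn.execute(QUERY).fetchall()
for event_date, active_users, buyers in rows:
    print(event_date, active_users, buyers, f"{buyers / active_users:.0%}")
conn.close()
```

User 3 purchases twice on the same day but is counted as one buyer, which is the kind of detail that is easy to get wrong in a spreadsheet.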

And, more importantly:

Data will not fall into your lap if you just sit and wait for it.

Of course, to analyze data you first have to collect it. Analysts at super-strong, thoroughly data-driven companies with a complete analytics foundation may not need to worry about this. But analysts at startups that don't have an analytics foundation yet, or at small businesses where the very idea of analytics is absent, have no choice but to collect the data themselves.
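Collecting data yourself can start very small. A minimal sketch, assuming a JSON Lines event log in which each line records who did what and when; the `log_event` helper and its field names are hypothetical, not any particular product's schema.

```python
import io
import json
import time
from collections import Counter
from typing import IO, Any, Dict, Optional


def log_event(out: IO[str], user_id: str, event: str,
              props: Optional[Dict[str, Any]] = None, ts: Optional[float] = None) -> None:
    """Append one event as a JSON line: who did what, when, and with which attributes."""
    record = {
        "ts": time.time() if ts is None else ts,  # Unix timestamp in seconds
        "user_id": user_id,
        "event": event,
        "props": props or {},
    }
    out.write(json.dumps(record, ensure_ascii=False) + "\n")


# In production `out` might be a file or a log shipper; StringIO keeps the demo self-contained.
buf = io.StringIO()
log_event(buf, "u1", "app_open", ts=1_600_000_000.0)
log_event(buf, "u1", "action_a", {"screen": "home"}, ts=1_600_000_060.0)
log_event(buf, "u2", "app_open", ts=1_600_000_120.0)

# Reading the log back is all it takes to start counting.
events = [json.loads(line) for line in buf.getvalue().splitlines()]
print(Counter(e["event"] for e in events))
```

Deciding up front to record a user ID, an event name and a timestamp on every event is what later makes retention, funnel and A/B analyses possible.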

Analyze the data you collected yourself, plan smart measures, decide how to evaluate them after they are implemented, and even plan the next actions. Then deliver results. This is the easiest way to show the value of "speaking with data".

In addition, analysts play a very important role in building the data analysis infrastructure. Infrastructure engineers and database engineers who are also good at data analysis are not always available in-house.

Will a data analysis platform built entirely by others really collect the data you want? What will you do if you outsource your data analysis infrastructure and then can't do what you really need?

And, as I will explain in detail in section 4, I think "future" analysts will need to be able to discuss upstream design, such as product development and experimental design, with management and engineers.

Simple analysis is becoming more and more automated, and useful tools keep appearing, so humans are increasingly unnecessary for that kind of work.

How to study the programming needed to format and aggregate data

Here too, let's get hands-on.

- [Introduction to Data Analysis with Python, 2nd Edition: Data Processing Using NumPy and pandas](https://www.amazon.co.jp/dp/487311845X)
- [Introduction to SQL Data Analysis and Utilization: Techniques for Opening the Door to Data Science (MySQL / PostgreSQL compatible)](https://www.amazon.co.jp/dp/4802612265)
- [The Easiest Growth Hack Textbook: A Fast-Growth Marketing Strategy Taught by a Popular Instructor](https://www.amazon.co.jp/dp/4844339923)

The quickest way to learn is to replace what you currently do in Excel with a programming language such as Python while reading a book like the ones above. Then, following a book, prepare some sample data and play around with free BI tools and SQL. I recommend it. (Admittedly, as programming study methods go, this asks a lot of people with no IT background, lol.)
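Replacing an Excel task with Python might start with a pivot table. A minimal standard-library sketch, assuming a hypothetical sales sheet saved as CSV; `pivot_sum` is an illustrative helper that sums amounts by month and product, like an Excel pivot table with SUM.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, Iterable, Mapping

# Hypothetical sales sheet, as it might look after saving it as CSV from Excel.
SALES_CSV = """date,product,amount
2019-11-03,apple,120
2019-11-15,banana,80
2019-11-20,apple,200
2019-12-01,apple,150
2019-12-02,banana,90
"""


def pivot_sum(rows: Iterable[Mapping[str, str]],
              row_key: Callable[[Mapping[str, str]], str],
              col_key: Callable[[Mapping[str, str]], str],
              value: str) -> Dict[str, Dict[str, float]]:
    """Sum `value` into a row_key x col_key table, like an Excel pivot table with SUM."""
    table: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[row_key(r)][col_key(r)] += float(r[value])
    return {k: dict(v) for k, v in table.items()}


rows = list(csv.DictReader(io.StringIO(SALES_CSV)))
monthly = pivot_sum(rows, row_key=lambda r: r["date"][:7],  # "YYYY-MM"
                    col_key=lambda r: r["product"], value="amount")
for month, by_product in sorted(monthly.items()):
    print(month, by_product)
```

With pandas, which the first book above covers, roughly the same table comes from `pd.pivot_table(df, index="month", columns="product", values="amount", aggfunc="sum")` once a `month` column has been added.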

And try actually reproducing the analyses in books like the third one with Python or SQL!

4. Design ability to plan product development and experiments in anticipation of data analysis

**What is really important is to develop products and plan experiments in anticipation of data analysis.**

This is the "conclusion" of this article.

Importance of product development and experimental design in anticipation of data analysis

BI tools will keep getting more convenient, to the point where there is almost nothing they can't do. Indeed, anyone can already run advanced analyses such as machine learning.

Furthermore, basic data analysis will be automated, and tools will find the important indicators for you.

However, what is really important, and difficult, is deciding which hypotheses to analyze, what insights to draw from them, and what measures to take next. It is important to think about these **at the planning stage of product design and experiments**.

The article "The reason why we decided to introduce the new dashboard 'Looker' before the product was released" states that Mercari had its data analysis infrastructure in place before the product was released.

In fact, if you don't think about data analysis **at the product design and experiment planning stages**, you may not get the data you want, or the analysis results may be unreliable. For example, what kind of logs do you need in order to analyze the data you want? Even for an A/B test, you need to decide in advance how long to run it and how many people to include.
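"How many people and how long" can be estimated before the test starts. A sketch using the standard normal-approximation sample-size formula for comparing two proportions; the baseline rate of 10%, the target of 12%, α = 0.05, power 0.8 and the 1,000 daily users are all hypothetical inputs, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_base: float, p_target: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per group to detect p_base -> p_target with a two-sided test.

    Normal-approximation formula:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.8
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_target - p_base) ** 2)


n = sample_size_per_group(0.10, 0.12)
daily_users = 1000  # hypothetical traffic entering the experiment per day
days = math.ceil(2 * n / daily_users)  # the two groups share the daily traffic
print(f"{n} users per group, about {days} days at {daily_users} users/day")
```

With these inputs the formula asks for roughly 3,800 users per group, or about 8 days of traffic; halving the lift you want to detect roughly quadruples the requirement, which is why this belongs in the planning stage.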

It will be a while before even this is automated.

That is why, in today's age of data democratization, I believe we truly need "analysts" and "designers" who know not only the fundamentals of data analysis and business, but also data collection and product development.

How to study product development and experimental design in anticipation of data analysis

In the end, this too comes down to practice, lol. Many startups are gradually building out their data analysis infrastructure but lack analysts who can work at the upstream level.

Put the other way around, most startups that have achieved results and are planning to expand rapidly are looking for analysts who "can handle everything from collecting and shaping data to analysis by themselves, take measures based on the results, and plan the PDCA cycle."

At such a company, you produce results yourself, involving many people and running the PDCA cycle. If you can become that kind of analyst, you will be able to add value wherever you are.

If you have been doing simple marketing or growth-hack-style data analysis, it is a good idea to build up your engineering skills in the ways described in this article. It would also be great for people currently working as PMs to dive into data analysis.

Become a data analysis pro, help the people around you make their data analysis debut, and throw a party night with data!!
