[Python] [Surprisingly little known?] Introducing a real day in the data analysis department

Introduction

This article is written for people who want to work as a data analyst or data scientist, especially in the data department of a business company, and also for the many people who simply don't know what kind of day the people in a corporate data department actually spend. Because of the nature of the work, the so-called data department is often inconspicuous inside a company, so I'd be happy if you took this opportunity to learn about it.

What you can get from this article

**① You can learn what a real day looks like for someone who makes a living from data analysis.**
**② You will be able to sympathize with the worries of people in a corporate data department.**

Why write this article

Just before turning 30, I changed careers from a non-IT industry to data analyst, but until I changed jobs I knew very little about the concrete work and daily routine of jobs that handle data.

Of course, if you google it, you will find model-case articles like "This is how I work; here is my schedule for the day." What I wanted to know was the concrete work content and how the person actually works, but those model cases lacked reality, so they weren't very helpful.

So I thought that if a working data analyst wrote out a real day honestly, it might be of some help to someone.

Now, let's dive in!

What is a data analyst in the first place?

A data analyst is a profession that specializes in phases such as data processing and current-state analysis. We analyze the collected big data to uncover user behavior, patterns, future needs, and so on. From there, formulating hypotheses, proposing ways to solve problems, and using them to improve the services we provide is the main job of a data analyst. It is also the data department's role to make the whole company aware of the value of data utilization and to build a data-driven culture.

In practice, there are days when we do nothing but aggregation, days when we just build dashboards, and sometimes days when we do machine learning modeling.

So what kind of day do we actually spend? Let me introduce a day in the life of a data analyst.

A day in the life of a data analyst

08:30 Commute

When I have the spare energy, this is reading time. On days when I don't, I commute while listening to music. I keep social media checking to the bare minimum: if you cram a lot of information into your head here, your morning work becomes inefficient, so everything in moderation.

09:00 Check Slack notifications

I check whether the daily tables in BigQuery updated correctly. We have a mechanism that notifies Slack every day whether each table update succeeded or failed, and when one fails a mention comes flying in. "No abnormalities today!" In the unlikely event that a table update fails, we investigate the cause immediately. In that case the whole day's schedule changes, so it always makes my heart race. That's why checking this before starting work is part of my daily routine.
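As an aside, the notification mechanism itself is nothing elaborate. The sketch below is not our actual implementation, just a minimal illustration of the idea, assuming a hypothetical Slack incoming webhook URL and table name.

```python
# Minimal sketch of a daily table-update notification
# (hypothetical webhook URL and table name; not my team's actual mechanism).
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # hypothetical

def notify_table_update(table_name: str, succeeded: bool) -> None:
    """Post the result of the daily table update to a Slack channel."""
    if succeeded:
        text = f":white_check_mark: {table_name} updated successfully"
    else:
        # Mention the channel so a failure never goes unnoticed
        text = f"<!channel> :x: {table_name} update FAILED, please investigate"
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)

notify_table_update("daily_user_activity", succeeded=True)
```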

10:00 Check the SLO dashboard

This is my routine morning task: checking the SLO dashboard.

An SLO (Service Level Objective) is the target value that serves as the standard for whether a service provider can actually deliver the service to its users.
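As a toy illustration (the numbers and target below are made up), the dashboard essentially compares a measured success rate against the agreed target:

```python
# Toy example of an SLO check: compare the measured success rate
# against the agreed target. All numbers here are made up.
total_requests = 120_000
successful_requests = 118_000

success_rate = successful_requests / total_requests  # measured indicator
slo_target = 0.99                                     # agreed target value

print(f"success rate: {success_rate:.2%} (target {slo_target:.0%})")
if success_rate < slo_target:
    print("Below target: investigate before the error budget runs out")
```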

Today, the success rate of a certain service was extremely low, so I decided to investigate.

"Somehow the success rate is very low today, why ?!"

11:00 Morning meeting

My team has three members, so the usual three of us check the team's tasks for the day. First, I share that an abnormal value appeared on the dashboard. The team leader will report it to the manager. By the way, today's tasks are:

① Investigate the dashboard outlier
② Create a dataset for analysis (today's main task)
③ Brainstorm ideas for policy planning

"I have to finish the high-priority ① as soon as possible. I'm the one writing the queries for the analysis dataset, so let's tackle that in the afternoon."

11:30 Dashboard outlier investigation

I investigate why the outlier is appearing on the dashboard, hitting queries one after another. After about 30 minutes of digging, it turned out that the number was 0 only for a specific version of a specific service. When I immediately checked with the relevant department, they said the log data specification had recently changed. Since the cause was identified, I summarized a response plan and reported it to the manager. "What a relief; the cause is identified and we can expect recovery in the afternoon."
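For the curious, the drill-down looks roughly like the sketch below. The project, table, and column names are made up for illustration; the point is to group the success counts by service and version so that the combination stuck at 0 stands out.

```python
# Hedged sketch of a drill-down query in BigQuery (made-up table/columns).
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
      service_name,
      app_version,
      COUNTIF(status = 'success') AS success_count,
      COUNT(*) AS total_count
    FROM `my_project.my_dataset.request_logs`
    WHERE DATE(request_timestamp) = CURRENT_DATE()
    GROUP BY service_name, app_version
    ORDER BY success_count
"""

# Rows where success_count is 0 point at the specific service/version combination
for row in client.query(query).result():
    print(row.service_name, row.app_version, row.success_count, row.total_count)
```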

If you neglect this kind of maintenance and leave wrong numbers showing, the dashboard loses credibility and gradually stops being looked at by people in the company. That's why dashboard maintenance and operation deserve real attention.

12:30 Start creating the dataset for analysis

Dataset creation is today's main task. The analysis dataset I'm creating this time aggregates the usage status of a certain service and will be used for service improvement.

The work itself is just SQL aggregation from BigQuery, but with about 20 aggregation items it is quite a heavy task. A mistake in this aggregation affects all of the subsequent analysis, so I proceed carefully. There is a lot to think about when aggregating usage: What is the definition of "use"? Is the aggregation per user or per device? Is there a time window? What happens if the user resets partway through?

After carefully checking each item and drawing up an aggregation plan, I take my lunch break.

14:00 Lunch break

15:00 Meeting to plan data-driven measures

A meeting within the data team to think about measures to improve service usage.

We discuss hypotheses and a rough image of the measures:

・Identify user groups whose usage is poor
・Identify the timing and factors that cause usage to drop significantly
・Organize the feasibility, issues, and schedule of measures for improvement

We will propose the summarized plan to the business side next week.

16:00 Dataset creation for analysis

I already made the aggregation plan earlier, so now all that's left is to write the SQL.

The more complex the aggregation, the more carefully you need to check things one by one: whether the aggregation logic is appropriate, whether the aggregated figures are correct, and whether there are any gaps in the verification method.
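To give a feel for it, here is a simplified sketch of a single aggregation item out of the roughly 20. The project, table, and column names are hypothetical, and the definition of "use" shown in the comment is just one possible choice, not the actual one.

```python
# Simplified sketch of one aggregation item pulled into pandas
# (hypothetical schema; the real dataset has ~20 such items).
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
      user_id,                                            -- aggregation per user
      DATE(event_timestamp) AS usage_date,
      COUNTIF(event_name = 'service_open') AS open_count  -- "use" = opening the service
    FROM `my_project.my_dataset.service_logs`
    WHERE DATE(event_timestamp) BETWEEN '2020-05-01' AND '2020-05-31'
    GROUP BY user_id, usage_date
"""

df = client.query(query).to_dataframe()  # aggregated result as a DataFrame
print(df.head())
```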

18:00 Dataset review

I have the team leader review the dataset, and if there are no problems, the plan is to visualize the aggregated usage status on a dashboard with a BI tool. This time, the review found one omission in the aggregation logic, so I fixed it on the spot. When I checked the raw logs, it turned out I had failed to account for an irregular pattern. Even if it's caught in review, an incorrect aggregation is mortifying for the person who made it. It's like a restaurant chef failing to cook according to the recipe. If I compare a fatal miscount to cooking, it's on the level of adding salt by mistake where the dessert recipe calls for sugar. I try to work with that mindset (honestly, I can't quite bring myself to feel it that strongly yet).

19:00 Update the task management tool with work progress

Work progress is visualized with a task management tool so it can be shared within the team. For someone like me who came from another industry, it's a revolutionary system. Three years ago, when I was still in the education industry, we managed everything on a huge whiteboard; that feels nostalgic now.

19:15 End of the workday

Tomorrow, I plan to analyze the dataset created today using Python and pandas.
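As a rough idea of what that analysis might look like (the column names follow the hypothetical dataset sketched above, so treat this as an assumption, not tomorrow's actual code):

```python
# Rough sketch of tomorrow's pandas analysis on the usage dataset
# (hypothetical columns: user_id, usage_date, open_count).
import pandas as pd

df = pd.read_csv("usage_dataset.csv", parse_dates=["usage_date"])

# Daily active users and average opens per active user
daily = (
    df.groupby("usage_date")
      .agg(active_users=("user_id", "nunique"),
           opens_per_user=("open_count", "mean"))
)
print(daily.head())
```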

Summary

Cutting out a single day like this, I am often just busy with the work right in front of me, but in my case, what the company expects of the data department is:

**① Creating value by using data**
**② Fostering a data-driven culture**

That is what I believe it comes down to.

If you can picture yourself finding enjoyment in that, I think you can work in a data department with a real sense of fulfillment.

I hope the above will be helpful for those aiming to become data analysts.
