I tried the same data analysis with a Kaggle notebook (Python) and Power BI side by side ①

Background

Many services are available as data analysis tools. I work in the manufacturing industry, and the two approaches to data analysis and visualization that I hear about most are the following.

- Environment ①: Data analysis using Python + NumPy + pandas + α
- Environment ②: Data analysis using BI (Business Intelligence) tools

To compare the two, I decided that the best approach was to run the same analysis in both tools, and actually did so.

System configuration

Common environment

PC OS used: Microsoft Windows 10 Pro 64-bit
Browser used: Microsoft Edge

Environment ① "Data analysis using Python + NumPy + pandas + α": operating environment

・Usage environment: Kaggle Notebook. Because it is a cloud service, the library versions cannot be pinned; the environment described here is as of August 24, 2020.

Kaggle is a community and competition website for data analysts. Some competitions offer prize money, and engineers compete on the accuracy of their data analysis. The article "Kaggle Tutorial Part 1: What is Kaggle? What does it mean to participate?" explains the details clearly. I opened my account by following "Introduction to Kaggle for Beginners! From opening an account to submitting Titanic". With an account, you can use Kaggle's data analysis services, including Kaggle Notebook, free of charge.

Environment ② "Data analysis with a BI tool": operating environment

- Usage environment: Microsoft Power BI Desktop Ver 2.84.802.0 64-bit. Power BI Desktop is available from the Microsoft Store, and most of its features, including everything used in this work, are free of charge.

What I did (summary)

The analysis flow borrows part of a Udemy data science course. The course "[180,000 people in the world] Practical Python Data Science" covers everything from the basics to simple hands-on exercises, and I recommend it to anyone who wants to learn data science systematically.

Following that Udemy curriculum, I use both methods to perform the analysis below on the well-known Titanic disaster data, as a first step in data analysis.

・What kind of people were the Titanic's passengers? (gender, age, etc.)
・How these characteristics, individually and in combination, relate to the survival rate

What I did (details): Python + NumPy + pandas + α edition

This article describes the processing common to both tools and the results obtained with Python + NumPy + pandas + α. The results of the analysis with the BI tool will be covered in the next article.

1. Download the data locally (to your own PC): get the passenger data from the Titanic disaster from Kaggle as a CSV file.


Opening the actual passenger data in Excel shows one row per passenger, with columns such as PassengerId, Survived, Pclass, Sex, and Age.

2. Load the local data into the environment and display a summary. Only the first 5 rows are shown, to get a bird's-eye view of the data.
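A minimal pandas sketch of this step follows. It assumes the competition file sits at /kaggle/input/titanic/train.csv (a typical Kaggle mount point, not stated in the original); to stay runnable anywhere, it reads a tiny hypothetical six-row sample with a subset of the real columns instead of the real file.

```python
import io

import pandas as pd

# Typical location of the competition file in a Kaggle notebook (assumption).
TRAIN_PATH = "/kaggle/input/titanic/train.csv"

# Tiny hypothetical sample using a subset of the real columns, so the
# snippet runs without the Kaggle file. On Kaggle: pd.read_csv(TRAIN_PATH)
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
    "3,1,3,female,8\n"
    "4,1,1,male,35\n"
    "5,0,3,male,\n"
    "6,0,3,male,2\n"
)
titanic_df = pd.read_csv(sample_csv)

# First 5 rows for a bird's-eye view of the data.
preview = titanic_df.head()
print(preview)

# Column dtypes and non-null counts; the empty Age field shows up as a missing value.
titanic_df.info()
```

On the real file, `info()` is a quick way to spot columns with missing values before going further.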

3. Check the ratio of men to women.
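One way to get these numbers in pandas is `value_counts`, sketched below on a hypothetical six-passenger sample (on Kaggle, the DataFrame would come from train.csv). The original notebook showed a chart; whether it used this exact call is not stated.

```python
import pandas as pd

# Hypothetical sample; on Kaggle this would be the full DataFrame read from train.csv.
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "male"],
})

counts = titanic_df["Sex"].value_counts()                # passengers per sex
shares = titanic_df["Sex"].value_counts(normalize=True)  # fractions summing to 1.0
print(counts)
print(shares)
```

To draw a bar chart instead, seaborn's `sns.countplot(x="Sex", data=titanic_df)` is one option.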

4. Check the ratio of men to women for each room grade. The Pclass column indicates the room grade (1 = first class, 3 = third class). You can see that there are many men in the third-class rooms.
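A cross-tabulation gives the same breakdown as a table. This is a sketch on a hypothetical sample with only the two columns involved; `pd.crosstab` is my choice here, not necessarily what the original notebook used.

```python
import pandas as pd

# Hypothetical sample with the two columns used here.
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3],
})

# Rows: room grade (Pclass), columns: Sex, cells: number of passengers.
sex_by_class = pd.crosstab(titanic_df["Pclass"], titanic_df["Sex"])
print(sex_by_class)
```

For a grouped bar chart, `sns.countplot(x="Pclass", hue="Sex", data=titanic_df)` in seaborn plots the same counts.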

The axes can also be swapped easily.
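In table form, swapping the axes just means swapping the two arguments to `pd.crosstab`, which yields the transpose of the original table. The sample below is hypothetical.

```python
import pandas as pd

# Hypothetical sample with the two columns used here.
titanic_df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3],
})

sex_by_class = pd.crosstab(titanic_df["Pclass"], titanic_df["Sex"])
# Swapping the arguments puts Sex on the rows and Pclass on the columns,
# which is the transpose of sex_by_class.
class_by_sex = pd.crosstab(titanic_df["Sex"], titanic_df["Pclass"])
print(class_by_sex)
```

With seaborn, the equivalent is exchanging `x` and `hue`, e.g. `sns.countplot(x="Sex", hue="Pclass", data=titanic_df)`.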

5. Create a new column, Person, that classifies each passenger as a man, a woman, or a child (under 16 years old), using the Age and Sex columns.

First, create the Person column.
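Below is a minimal sketch of the Person column. The labels "male", "female", and "child" and the vectorized `np.where` approach are my assumptions; the course may implement this differently (for example, with a row-wise function passed to `DataFrame.apply`). The data is a hypothetical sample.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; None becomes NaN (missing age) in the float Age column.
titanic_df = pd.DataFrame({
    "Age": [22, 38, 8, 35, None, 2],
    "Sex": ["male", "female", "female", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3],
})

# Under 16 -> "child", otherwise keep the Sex label.
# NaN < 16 evaluates to False, so passengers with a missing age keep their sex.
titanic_df["Person"] = np.where(titanic_df["Age"] < 16, "child", titanic_df["Sex"])
print(titanic_df[["Age", "Sex", "Person"]])
```

How to treat missing ages is a design choice; here they silently count as adults, which is worth keeping in mind when reading the child ratios.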

Next, check the distribution of Person by room grade. First class has few children (perhaps because its rooms were expensive?). The third-class rooms have a high ratio of adult men, and also a high ratio of children, so I imagine that third class held many single men as well as families.
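The distribution can be tabulated with `pd.crosstab`, both as counts and as per-grade shares (`normalize="index"` makes each row sum to 1.0). This sketch rebuilds the hypothetical sample and Person column from the previous step so it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, same as in the Person step.
titanic_df = pd.DataFrame({
    "Age": [22, 38, 8, 35, None, 2],
    "Sex": ["male", "female", "female", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3],
})
titanic_df["Person"] = np.where(titanic_df["Age"] < 16, "child", titanic_df["Sex"])

# Passenger counts per room grade and Person category.
person_by_class = pd.crosstab(titanic_df["Pclass"], titanic_df["Person"])
# Same table normalized per row, so each room grade's shares sum to 1.0.
person_share = pd.crosstab(titanic_df["Pclass"], titanic_df["Person"], normalize="index")
print(person_by_class)
print(person_share)
```

Row shares make grades of different sizes comparable, which matters here because third class had far more passengers than first.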

That is the basic analysis flow. In practice, the analysis would go on to survey these and related characteristics and their relationship with survival, but this article excerpts only the first stage.

Summary (impressions)

My impressions from this work:

- Look at some actual rows, such as the first few, to get a feel for the whole dataset
- Check the data summary to see whether the data is corrupted, and fix it
- Create your own columns to extract the information you intended
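For the second point, a common pandas pattern is to count missing values per column and then decide how to handle them. The sketch below uses a hypothetical sample; dropping rows without an age is only one possible fix and an assumption on my part, not the method used in this article.

```python
import pandas as pd

# Hypothetical sample; None marks a missing value.
titanic_df = pd.DataFrame({
    "Age": [22, 38, 8, 35, None, 2],
    "Sex": ["male", "female", "female", "male", "male", "male"],
    "Embarked": ["S", "C", "S", None, "S", "Q"],
})

# Count missing values per column to spot gaps in the data.
missing = titanic_df.isna().sum()
print(missing)

# One possible fix (assumption): drop rows with no age before an age-based
# analysis. dropna returns a new DataFrame, so titanic_df itself is unchanged.
with_age = titanic_df.dropna(subset=["Age"])
print(len(with_age))
```

Imputing (for example with `fillna`) is the usual alternative when dropping rows would discard too much data.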

I felt that a strength of Python-based notebooks such as Kaggle's is that you can proceed with the analysis while getting a grip on a large amount of data along various axes. (Obvious, perhaps...)

In the next article, I'll try the same thing with Microsoft's Power BI, a BI tool.
