I tried the same data analysis with a Kaggle notebook (Python) and Power BI at the same time ②

Background

The background is the same as in the previous post, [I tried the same data analysis with kaggle notebook (python) and Power BI at the same time ①](https://qiita.com/khigashimoto/items/80f0858b59379d461d97).

Last time I tried environment ① below; this time I will try environment ②.

・ Environment ①: data analysis using Python + NumPy + Pandas + α
・ Environment ②: data analysis using BI (Business Intelligence) tools

System configuration

Common environment

・ PC used: Surface LTE (Core i5-7300U, 8 GB memory)
・ OS used: Microsoft Windows 10 Pro 64-bit
・ Browser used: Microsoft Edge

Environment ②: operating environment for "data analysis with a BI tool"

・ Software used: Microsoft Power BI Desktop Ver 2.84.802.0 64-bit

You can get Power BI Desktop from the Microsoft Store. Most of its functions, including everything used in this work, are available free of charge.
image.png

What I did (summary)

As in the previous post, I will borrow part of the Udemy data science course below.

[180,000 people in the world] Practical Python data science

This time, as a first step in data analysis, I will use a BI tool to perform the following analyses of the famous sinking of the Titanic.

・ What kind of people were the passengers of the Titanic? (gender, age, etc.)
・ How do these characteristics, individually and in combination, relate to the survival rate?

What I did (details): BI tool (Power BI)

1. Get the data locally (on my own PC)

Download the Titanic passenger data from Kaggle as a CSV file. The steps are the same as in the previous article, so I will omit them, but when you open the actual passenger data in Excel, it looks like this.
image.png

2. Bring the local data into the environment and display a summary

Start Power BI Desktop and import the saved CSV file.

・ On the main screen after startup, select the "Get data" icon ⇒ "Text/CSV".
image.png

・ The preview screen looks like this. Press the "Load" button.
image.png

・ When you return to the main screen, nothing is displayed, so press the "Data" view icon.
image.png

・ You can now see the loaded data.
image.png

Since Power BI Desktop is a GUI-based application (of course...), unlike a CUI-based environment such as Python, you get an overview of the data directly on the screen.
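For comparison, the same "load and look at the data" step in the Python environment can be sketched roughly as follows. This is only a minimal sketch: the rows in SAMPLE_CSV are made up for illustration (they are not real passenger records), and load_passengers is a hypothetical helper name. With the real data, you would pass the path of the CSV file saved from Kaggle instead.

```python
import io

import pandas as pd

# Made-up sample rows using a few of the Titanic column names.
# These are NOT real passenger records; they only make the sketch runnable.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
4,0,2,male,8
"""


def load_passengers(source):
    """Read passenger data from a file path or a file-like object."""
    return pd.read_csv(source)


df = load_passengers(io.StringIO(SAMPLE_CSV))

print(df.head())        # first rows, similar to the Power BI data view
print(df.isna().sum())  # number of missing values per column
```

In Power BI the table appears on screen as soon as it is loaded, whereas in Python you have to ask for each view explicitly.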

3. Check the ratio of men and women

・ Press the "Report" view icon to go to the report screen.
image.png

・ Try a clustered bar chart. When you press its icon, the chart is displayed at the upper left of the screen.
image.png

・ Drag and drop the "Sex" field from the right side of the screen onto both "Axis" and "Value".
image.png

It is nice to be able to work graphically with Excel-like usability.

As an aside, Power BI Desktop felt a little sluggish during this work, so I checked Task Manager, shown below. Since it is a desktop application, I felt that about 16 GB of memory would be more comfortable on a Windows 10 PC.
image.png
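For reference, the count of men and women from step 3 corresponds roughly to the following in pandas. This is a sketch on a small made-up table, not the real Titanic data, so the numbers themselves mean nothing.

```python
import pandas as pd

# Made-up values for illustration only (not real passenger data).
df = pd.DataFrame({"Sex": ["male", "female", "female", "male", "male"]})

# Number of passengers per gender, which is what the bar chart displays.
counts = df["Sex"].value_counts()

# The same counts expressed as a share of all passengers.
ratios = df["Sex"].value_counts(normalize=True)

print(counts)
print(ratios)
```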

4. Check the ratio of men and women for each room class

The Pclass field indicates the class of the room. This was also easy to do in Power BI.

・ Select the clustered bar chart again. Drag and drop "Sex" onto "Axis" and "Value", and "Pclass" onto "Legend".
image.png

The axes can also be swapped easily.

・ Select the clustered bar chart again. Drag and drop "Sex" onto "Legend" and "Value", and "Pclass" onto "Axis".
image.png
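The breakdown by gender and room class in step 4 can be sketched in pandas with a cross-tabulation, and swapping the axis and legend corresponds to transposing the table. The rows below are made up for illustration and are not real passenger data.

```python
import pandas as pd

# Made-up values for illustration only (not real passenger data).
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male", "male", "female"],
        "Pclass": [3, 1, 3, 2, 3, 1],
    }
)

# Rows = Sex (the "Axis"), columns = Pclass (the "Legend").
by_sex = pd.crosstab(df["Sex"], df["Pclass"])

# Swapping axis and legend is just a transpose: rows = Pclass, columns = Sex.
by_class = by_sex.T

print(by_sex)
print(by_class)
```

If matplotlib is installed, calling `by_sex.plot.bar()` should draw a grouped bar chart similar to the Power BI one, although I have not compared the appearance in detail.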

5. Create a "Person" field with the values "Men", "Women", and "Children (under 16 years old)" from the "Age" and "Sex" fields

First, create a Person column. I tried hard, but so far I have not managed to generate the column. As far as I can tell, there are two ways to add a column.

・ Query editor
This is a dedicated function for processing the data to suit the analysis without changing the original data. However, when I tried to add a custom column from this function, as shown below, the result was an error.
image.png
image.png

・ Adding a column directly
Next, I tried to add a column directly on the data screen, but unfortunately this also resulted in an error.
image.png

Hmm... For now, this is where the verification ends.

The results of this series of analysis trials in Power BI are attached below. I think a strength of Power BI is that you can lay out the results like this and view them side by side.
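For reference, the Person column from step 5 can be sketched in pandas as follows. This is a minimal sketch under my own assumptions: the labels "male", "female", and "child" are chosen for illustration, the rows are made up, and a passenger with a missing age is simply left with the value of Sex.

```python
import pandas as pd

# Made-up values for illustration only (not real passenger data).
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, None, 8.0],
    }
)

# True for passengers under 16. A missing age compares as False,
# so those rows are not treated as children (an assumption of this sketch).
is_child = df["Age"] < 16

# Keep the value of Sex where the passenger is not a child, otherwise use "child".
df["Person"] = df["Sex"].where(~is_child, "child")

print(df)
```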

image.png

Summary (impression)

Here are my impressions of Power BI Desktop that I felt while doing the above work.


Benefits
・ The display options are rich (and beautiful).
・ Since it is a GUI-based application, even people who do not write programs can use it easily.
・ Multiple results can be viewed side by side.


Disadvantages
・ Processing such as missing-value handling and data generation can be confusing once the work gets complicated.
・ Since there are not many references for Power BI itself, you cannot casually search for answers when you are stuck.
・ Since it runs on the local PC, it may feel a little sluggish depending on the PC specifications.

Case A) Python: it has many references and lets you build detailed flows for complicated analysis such as preprocessing and generating additional data.
Case B) Power BI: it is suited to light analysis, and to serving as a display layer where practitioners view the results of analysis done in Python.

At this point, I think it is best to use the two for these separate purposes.
