Principal component analysis with Power BI + Python

Introduction

Programming languages such as Python and R offer abundant statistical analysis libraries and give free access to advanced statistical methods, but editing source code and working from the command line can be cumbersome. Some Power BI visuals for statistical analysis are published on AppSource, but they may not implement the feature you need. Power BI's Python and R visuals are another option, but because the plots are drawn in code, creating and tuning a readable visual takes a lot of time. Here, using principal component analysis as an example, we load data in a Power BI query, run the statistical analysis with Python inside the query, and then visualize the results on a Power BI dashboard.

Sample data

Following the article "Start multivariate analysis and principal component analysis with Pokemon! Linking R and Tableau" (https://qiita.com/bashiiiwa/items/d783150ff4299dda27f1), we use pokemon.csv from The Complete Pokemon Dataset published on Kaggle.

Edit Power BI query

  1. Load the downloaded pokemon.csv.

  2. Delete the columns that are not needed for the analysis, so that the table consists of a Name column followed by data column 1, data column 2, and so on.

  3. Add a Python script. The script treats the first column as Name and the second and subsequent columns as data, and performs principal component analysis with the scikit-learn library. The Python code for the analysis is based on "Meaningful Principal Component Analysis" (https://qiita.com/NoriakiOshita/items/460247bb57c22973a5f0).

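# 'dataset' holds the input data for this script

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Drop the first column (Name); the remaining columns are the numeric data.
dataset2 = dataset.drop(dataset.columns[0], axis=1)
X = dataset2.values

# Fit PCA and project every row onto the principal components.
pca = PCA()
pca.fit(X)
pca_point = pca.transform(X)

# Append the first two principal component scores to the input table.
dataset['PC1'] = pca_point[:, 0]
dataset['PC2'] = pca_point[:, 1]

# Contribution rate (explained variance ratio) of each component.
evr = pd.DataFrame(data=pca.explained_variance_ratio_, columns=['explained_variance_ratio'], dtype='float')
evr['PC No.'] = evr.index + 1

# Eigenvectors: one row per component, one column per input variable.
components = pd.DataFrame(data=pca.components_, columns=dataset2.columns, dtype='float')
components['PC No.'] = components.index + 1

# Remove the intermediate table so it does not appear as an extra output table.
del dataset2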
  4. evr holds the contribution rate (explained variance ratio) of each component; check its values. (They show how much influence each component has: the first principal component is 0.46 and the second is 0.19.)

  5. components holds the eigenvectors. (They show how heavily each data column is weighted in each component and serve as a reference when interpreting the principal components.)

  6. Principal component 1 (PC1) and principal component 2 (PC2) from the analysis are added to dataset; load this table.
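
The checks on evr and components can also be tried outside Power BI. The following is a minimal sketch, assuming a hand-made stand-in for dataset with made-up columns A, B and C (not the Pokemon data). The names cumulative_ratio and pc1_ranking are introduced here for illustration only and are not part of the original script.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the table Power BI passes in as `dataset`:
# the first column is Name, the remaining columns are numeric (made-up values).
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "Name": [f"item{i}" for i in range(50)],
    "A": rng.normal(50, 20, 50),
    "B": rng.normal(60, 10, 50),
    "C": rng.normal(40, 5, 50),
})

features = dataset.drop(columns="Name")
pca = PCA()
scores = pca.fit_transform(features.values)

# Contribution rate per component, plus the running total.
evr = pd.DataFrame({
    "PC No.": np.arange(1, pca.n_components_ + 1),
    "explained_variance_ratio": pca.explained_variance_ratio_,
})
evr["cumulative_ratio"] = evr["explained_variance_ratio"].cumsum()

# One row per component, one column per input variable.
components = pd.DataFrame(pca.components_, columns=features.columns)
components.insert(0, "PC No.", np.arange(1, pca.n_components_ + 1))

# Input variables ordered by the absolute size of their weight in PC1.
pc1_weights = components.loc[0, features.columns].astype(float)
pc1_ranking = pc1_weights.abs().sort_values(ascending=False)

print(evr)
print(pc1_ranking)
```

The cumulative column gives a quick view of how much of the total variance the first two components cover, which is what the PC1/PC2 scatter plot can actually show. Ranking by absolute weight is one simple way to read an eigenvector row; the sign of a weight only indicates direction along that component.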

Creating a dashboard

  1. Plot PC1 on the X-axis and PC2 on the Y-axis of a scatter chart. (Turn on the category data labels.)

  2. Create a tooltip page so that the value of each field is shown when you hover over a data point. Place a multi-row card on it. (In the page's format settings, turn on Tooltip and set the page size to Tooltip.)

  3. Enable the tooltip on the scatter chart (in the format settings, turn on Tooltip, set the type to Report page, and set the page to the tooltip page you created). Hover the mouse cursor over a data point to see that character's data.

Analysis results

Let's compare the result with "Pokemon Data Analysis.py". In the eigenvector of the first principal component (PC1), Sp. Atk (special attack power) has the largest weight, followed by Attack (attack power). Plotting Sp. Atk as the size of the circles confirms that it does tend to increase toward the right along PC1.
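
The link between an eigenvector weight and the trend seen in the plot can also be checked numerically. The sketch below is only an illustration on made-up data (the columns x_big, x_mid and x_small are hypothetical and are not Pokemon stats): it places each variable's PC1 weight next to its correlation with the PC1 score. Note that the sign of a principal component is arbitrary, so a heavily weighted variable may grow toward the left rather than the right in another run or another tool.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Made-up data: x_big carries most of the variance, x_small very little.
rng = np.random.default_rng(1)
base = rng.normal(0, 1, 200)
data = pd.DataFrame({
    "x_big": 30 * base + rng.normal(0, 3, 200),
    "x_mid": 10 * base + rng.normal(0, 8, 200),
    "x_small": rng.normal(0, 2, 200),
})

pca = PCA(n_components=2)
scores = pca.fit_transform(data.values)
pc1 = pd.Series(scores[:, 0], index=data.index, name="PC1")

# PC1 weights (eigenvector entries) next to each variable's correlation with PC1.
summary = pd.DataFrame({
    "pc1_weight": pca.components_[0],
    "corr_with_pc1": [data[c].corr(pc1) for c in data.columns],
}, index=data.columns)

print(summary)
```

A variable with a large weight and a strong correlation of the same sign is the kind of column that shows a visible trend along the PC1 axis, as Sp. Atk does in the scatter plot above.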
