[PYTHON] I tried using the frequently used seaborn method with as few arguments as possible [for beginners]

Purpose

While learning Python and writing code, have you ever had the following experience?

Because I lacked knowledge, I had to look up the arguments of each seaborn method and set them every time I processed data.

In this article, I explain the frequently used seaborn methods that even beginners should understand, together with the minimum arguments needed for a quick check of the data.

Rough flow

Notes

(If there are other parameters that you think should be set where possible, I would appreciate a comment.)

Target person

I am writing for such people.

What you can understand in this article

When you start data analysis, for example with a Kaggle tutorial, the first question is "What kind of data is in this dataset?" I think this article will let you grasp the basics of seaborn well enough to answer that question by visualizing the data yourself.

Environment

Premise

Graph drawing

Heat map

Official site: heatmap

A heat map visualizes the magnitude of numerical data as color, laid out as a matrix in which every item is paired with every other item.

This time, we will draw a graph restricted to the items whose correlation coefficient (absolute value) with the selling price of the house (SalePrice) is greater than 0.5.

import seaborn as sns

# Get the correlation coefficients
# (numeric columns only; recent pandas versions raise an error on string columns)
corr_mat = house_price.select_dtypes(include="number").corr()

# Select the items whose correlation coefficient (absolute value) with SalePrice is greater than 0.5
top_corr_features = corr_mat.index[abs(corr_mat["SalePrice"]) > 0.5]

# import matplotlib.pyplot as plt
# plt.figure(figsize=(11, 11))  # Enlarging the figure like this makes the plot easier to read.

# Check the correlation
sns.heatmap(data=house_price[top_corr_features].corr(), annot=True, cmap="RdYlGn")

The arguments are as follows.

| Argument | Contents |
| --- | --- |
| data | Target data |
| annot | Whether to display the value in each cell of the matrix |
| cmap | Color map to use |

If the only goal is to make the code run, you can omit the annot and cmap arguments. However, since the purpose is to visualize the features, I think it is better to set at least these parameters.

The results are as follows. heatmap_simle2.png

This is a correlation matrix restricted to the items whose correlation coefficient with SalePrice is greater than 0.5. The data is narrowed down because there are about 80 data items, and a matrix of all of them would be too large to read, which defeats the purpose of visualizing it.

In this example, the greener a cell is, the stronger the correlation. The following two items are particularly highly correlated with SalePrice.

| Item | Correlation coefficient |
| --- | --- |
| OverallQual (comprehensive evaluation of the house) | 0.79 |
| GrLivArea (living area) | 0.71 |

In this way, heatmaps can be used to determine the correlation of the data being analyzed.
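As a minimal sketch of the filtering step used above, the following runs on its own with a tiny made-up table. The `toy` DataFrame and all of its values are hypothetical stand-ins for `house_price`; only the column names are borrowed from the article. It shows that the 0.5 threshold keeps the strongly correlated columns and drops the weakly correlated one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for house_price (values invented for illustration only)
toy = pd.DataFrame({
    "SalePrice":   [100, 150, 200, 250, 300, 350],
    "OverallQual": [3, 4, 6, 6, 8, 9],
    "GrLivArea":   [800, 1000, 1100, 1500, 1600, 2000],
    "MoSold":      [6, 1, 6, 1, 6, 1],   # barely related to SalePrice
    "Street":      ["Pave"] * 6,         # non-numeric column
})

# Correlation matrix of the numeric columns only
corr_mat = toy.select_dtypes(include="number").corr()

# Keep the columns whose absolute correlation with SalePrice exceeds 0.5
top_corr_features = corr_mat.index[corr_mat["SalePrice"].abs() > 0.5]
print(list(top_corr_features))

# Draw the heat map for the selected columns
plt.figure(figsize=(6, 6))
ax = sns.heatmap(data=toy[top_corr_features].corr(), annot=True, cmap="RdYlGn")
```

SalePrice itself always passes the filter because its correlation with itself is 1, so it appears in the matrix together with the selected features.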

Supplement

Bar plot

barplot draws a bar chart that shows the mean of each category together with error bars.

Official site: barplot (http://seaborn.pydata.org/generated/seaborn.barplot.html)

Here we plot OverallQual, the feature that had the highest correlation with SalePrice on the heat map.

import seaborn as sns
sns.set()

sns.barplot(x=house_price.OverallQual, y=house_price.SalePrice)

The results are as follows. barplot.png

The horizontal axis is OverallQual (comprehensive evaluation of the house), and the vertical axis is SalePrice (house price).

You can see that the better the overall rating of a house, the higher the house price.
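To make "the bar shows the mean" concrete, here is a small sketch that compares the bar heights with a pandas group-by mean. The `toy` DataFrame and its values are hypothetical and only for illustration; the real `house_price` data is not used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: two houses for each quality score
toy = pd.DataFrame({
    "OverallQual": [5, 5, 6, 6, 7, 7],
    "SalePrice":   [100, 120, 150, 170, 200, 240],
})

# Mean SalePrice per OverallQual, computed directly with pandas
means = toy.groupby("OverallQual")["SalePrice"].mean()

# barplot aggregates with the mean by default and draws one bar per category
ax = sns.barplot(x="OverallQual", y="SalePrice", data=toy)
bar_heights = sorted(patch.get_height() for patch in ax.patches)

print(means.tolist())
print(bar_heights)
```

Both printed lists contain the same numbers, which is a quick way to confirm what the bars represent before reading the chart.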

Bar graph (count plot)

countplot draws a bar chart that shows the number of records in each category of a categorical variable.

import seaborn as sns
sns.set()

sns.countplot(x='MSSubClass', data=house_price)

The results are as follows. countplot.png

The horizontal axis shows MSSubClass (the type of dwelling), and the vertical axis shows the number of cases.
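As a rough check of what countplot counts, the sketch below compares the bar heights with `value_counts()` from pandas. The `toy` DataFrame is hypothetical, with invented values; only the column name MSSubClass comes from the article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: six houses with three different dwelling classes
toy = pd.DataFrame({"MSSubClass": [20, 60, 20, 50, 20, 60]})

# Number of rows per class, computed directly with pandas
counts = toy["MSSubClass"].value_counts().sort_index()

# countplot draws one bar per class; the bar height is the number of rows
ax = sns.countplot(x="MSSubClass", data=toy)
bar_heights = [int(patch.get_height()) for patch in ax.patches]

print(counts.to_dict())
print(sorted(bar_heights))
```

Unlike barplot, no y column is given: the vertical axis is always the row count.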

Histogram

A histogram divides the data into intervals (bins) and shows how many values fall into each one.

import seaborn as sns
sns.set()

sns.distplot(house_price['SalePrice'])

The results are as follows. distplot.png

Official site: distplot

Scatter plot

scatterplot draws a simple scatter plot of two variables.

import seaborn as sns
sns.set()

sns.scatterplot(data=house_price, x='GrLivArea', y='SalePrice')

The results are as follows. scatterplot.png

Official site: scatterplot
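A scatter plot is the picture behind a single cell of the heat map. As a small sketch with hypothetical data (the `toy` values are invented), the code below draws the plot and prints the matching correlation coefficient for the same two columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: living area and price for five houses
toy = pd.DataFrame({
    "GrLivArea": [800, 1000, 1200, 1500, 2000],
    "SalePrice": [100, 140, 150, 210, 260],
})

# One point is drawn per row
ax = sns.scatterplot(data=toy, x="GrLivArea", y="SalePrice")
n_points = len(ax.collections[0].get_offsets())

# Pearson correlation of the same two columns
r = toy["GrLivArea"].corr(toy["SalePrice"])

print(n_points)
print(round(r, 2))
```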

Multiple display of graphs (pair plot)

pairplot draws scatter plots for every pair of variables at once instead of one at a time. This is useful when you want to check the correlations comprehensively.

import seaborn as sns
sns.set()

#Display the graph by narrowing down the items that have a high correlation with SalePrice
sns.pairplot(data=house_price[top_corr_features])

The results are as follows. pairplot.png

Official site: pairplot

Finally

Reference site

Based on the following sites, I proceeded with learning while checking the official site.

This article is really minimal, so if you want to learn more, please take a look.
