[PYTHON] Adding "a twist" to the boxplot (boxen / swarm / violin)
Overview
- This is a memo of my thoughts on visualizing the data from Kaggle's Titanic competition.
- I want to visualize the distribution of "passenger age" for each "port of embarkation".
- In such cases it is common to use a box-and-whisker plot (boxplot in seaborn).
- However, **other** kinds of plot can add "a twist", so I have summarized them here.
- As alternatives to seaborn's boxplot, I consider the following:
  - boxenplot
  - swarmplot
  - violinplot
- I hope it helps someone, but it is only a working memo and my personal opinion.
Motivation
Boxplot
- In the Titanic data, the age of passengers at each port of embarkation looks like this (first, as a boxplot).
- For now, the following can be read from it:
  - The median age is around 25 to 30, whichever port the passengers boarded at.
  - The medians and the first and third quartiles do not differ much. (Queenstown is a little younger?)
  - Outliers (older passengers) stand out among those who boarded at Southampton.

![Download.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/183826/6c03f3ea-bd76-a621-d64d-ecd52849062e.png)
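A plot like the one above can be reproduced roughly as follows. This is a minimal sketch, assuming the Kaggle data has already been loaded into a pandas DataFrame with `Age` and `Embarked` columns; the function names `age_quartiles` and `plot_age_boxplot` are made up for this illustration and are not from the original memo.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def age_quartiles(df: pd.DataFrame) -> pd.DataFrame:
    """Return the 1st quartile, median and 3rd quartile of Age per port."""
    return df.groupby("Embarked")["Age"].quantile([0.25, 0.5, 0.75]).unstack()


def plot_age_boxplot(df: pd.DataFrame):
    """Draw a boxplot of Age for each port of embarkation."""
    # Drop rows without an age or a port so they cannot affect the plot.
    data = df.dropna(subset=["Age", "Embarked"])
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.boxplot(data=data, x="Embarked", y="Age", ax=ax)
    ax.set_title("Age by port of embarkation")
    return ax
```

For example, `plot_age_boxplot(pd.read_csv("train.csv"))` would draw the figure (the file name is an assumption), and `age_quartiles` gives the numbers behind the boxes, which makes it easy to check the readings listed above.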
Trying a swarmplot
- Switching to a swarmplot makes the quartile values harder to read, but it adds a nice "twist":
  - You become aware of the **number of data points in each series** (in fact, **Queenstown has only a few**).
  - It is easy to read even for people who **do not know what the box and whiskers mean**.
  - The **dense and sparse regions** of the data are easy to see.

![Download.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/183826/a56c0cee-8ac2-8617-c574-9d5d3b07f237.png)
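The swarmplot version can be sketched in the same way. This again assumes a DataFrame with `Age` and `Embarked` columns; `count_ages_by_port` and `plot_age_swarm` are hypothetical helper names, and `size=3` is just a choice that lets more points fit side by side.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def count_ages_by_port(df: pd.DataFrame) -> pd.Series:
    """Count the passengers with a known age at each port."""
    return df.groupby("Embarked")["Age"].count()


def plot_age_swarm(df: pd.DataFrame):
    """Draw one marker per passenger, grouped by port of embarkation."""
    data = df.dropna(subset=["Age", "Embarked"])
    fig, ax = plt.subplots(figsize=(6, 4))
    # size=3 shrinks the markers (seaborn's default is 5).
    sns.swarmplot(data=data, x="Embarked", y="Age", size=3, ax=ax)
    ax.set_title("Age by port of embarkation (swarmplot)")
    return ax
```

The counts returned by `count_ages_by_port` are the numbers that the swarm makes visible at a glance and that the boxplot hides.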
Add "a twist" to the boxplot
- By changing the function or its options, among other things, you can add "a twist".
Putting it all together (cheat sheet)
- **boxenplot does not have a split option.**
- Note that the meaning of the **split option** differs slightly between swarmplot and violinplot.
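| option | boxenplot | swarmplot | violinplot |
|---|---|---|---|
| Not specified | (plot image) | (plot image) | (plot image) |
| hue="Sex" | (plot image) | (plot image) | (plot image) |
| hue="Sex" split=True | None | (plot image) | (plot image) |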
"Which" should be used "when"?
- It is hard to say "this plot is for this purpose!", but ...
- Comparing them one by one reveals their characteristics.
Boxplot vs boxenplot
- The names differ by only two letters (en), so at first glance there seems to be little difference.
- The point is whether you want to show **quartiles** or **finer quantiles**, and whether you want to draw attention to **outliers**.
| | Box-and-whisker plot (boxplot) | boxenplot |
|---|---|---|
| Display | (plot image) | (plot image) |
| Feature | Shows the quartiles, maximum and minimum. You can also see the outliers. | Shows finer quantiles. Outliers are harder to see. |
Boxplot vs swarmplot
- Whereas boxplot summarizes the data by intervals (quantiles), swarmplot is aware of the individual data points and captures the data **continuously**.
- You can see the **number and density of the data points, and how they differ between series**, but the **plotting cost is high**, which makes it difficult to use on large amounts of data.
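| | Box-and-whisker plot (boxplot) | swarmplot |
|---|---|---|
| Display | (plot image) | (plot image) |
| Feature | Captures the data by intervals (quantiles). Plotting cost is low. | Aware of individual points and captures the data continuously. The number of data points and the differences between series can be seen. However, the plotting cost is high. |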
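The difference in plotting cost can be checked with a rough timing sketch such as the following. It uses randomly generated stand-in data rather than the Titanic data (the columns `group` and `value` and the helper name `time_plot` are made up), and the timings depend entirely on the machine and the seaborn version, so no figures are quoted here.

```python
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def time_plot(plot_func, n_points: int, seed: int = 0) -> float:
    """Return the seconds needed to draw n_points with plot_func."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in data: three groups with age-like values.
    df = pd.DataFrame({
        "group": rng.choice(["A", "B", "C"], size=n_points),
        "value": rng.normal(30, 14, size=n_points),
    })
    fig, ax = plt.subplots()
    start = time.perf_counter()
    plot_func(data=df, x="group", y="value", ax=ax)
    # Force rendering so that any work done at draw time is included.
    fig.canvas.draw()
    elapsed = time.perf_counter() - start
    plt.close(fig)
    return elapsed
```

Calling `time_plot(sns.boxplot, n)` and `time_plot(sns.swarmplot, n)` for growing `n` gives a feel for how quickly the swarmplot becomes impractical.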
swarmplot vs violinplot
- Like swarmplot, violinplot **handles the data continuously**, and it **keeps the plotting cost down**.
- In exchange, you **lose sight of the number of data points and of how it differs between series**.
| | swarmplot | violinplot |
|---|---|---|
| Display | (plot image) | (plot image) |
| Feature | Aware of individual points and captures the data continuously. The number of data points and the differences between series can be seen. However, the plotting cost is high. | Not aware of individual points, so the number of data points cannot be seen. However, it captures the overall trend continuously and keeps the plotting cost down. |
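To see the trade-off in this table directly, the two plots can be drawn next to each other. A minimal sketch, assuming a DataFrame with `Age` and `Embarked` columns; `compare_swarm_and_violin` is a hypothetical name, and writing the per-port counts into the violinplot title is my own addition to make up for the counts that the violins hide.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def compare_swarm_and_violin(df: pd.DataFrame):
    """Draw swarmplot and violinplot side by side on a shared y axis."""
    data = df.dropna(subset=["Age", "Embarked"])
    fig, (ax_swarm, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    sns.swarmplot(data=data, x="Embarked", y="Age", size=3, ax=ax_swarm)
    sns.violinplot(data=data, x="Embarked", y="Age", ax=ax_violin)
    ax_swarm.set_title("swarmplot")
    # The violins do not show how many points each port has,
    # so the counts are written into the title instead.
    counts = data["Embarked"].value_counts().to_dict()
    ax_violin.set_title(f"violinplot (n per port: {counts})")
    return ax_swarm, ax_violin
```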
Summary
- Each plot has advantages and disadvantages and should be chosen to suit the purpose, but a rough summary might look like this:
| Interval vs continuous | How to add "a twist" | Which visualization should I choose? |
|---|---|---|
| Treat the data by **intervals (quantiles)** | If you want to draw attention to outliers | Box-and-whisker plot (boxplot) |
| Treat the data by **intervals (quantiles)** | If you want a display more detailed than quartiles | boxenplot |
| Treat the data continuously | If you want to show the number and density of the data points | swarmplot |
| Treat the data continuously | If you want to keep the plotting cost down and show the overall trend | violinplot |