[PYTHON] Aggregation and visualization of accumulated numbers

We will use the following data.

x = [i for i in range(1,11)]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I want to create a cumulative distribution function of the values contained in this variable x.
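For comparison, the usual empirical CDF assigns each sorted value the fraction of observations at or below it. Below is a minimal NumPy sketch. The helper `ecdf` is a hypothetical name, not a library function. With tied values, the earlier copies of a tie get smaller fractions, which does no harm when the result is drawn as a step curve.

```python
import numpy as np

def ecdf(values):
    """Hypothetical helper: sorted values and the fraction of observations <= each one."""
    xs = np.sort(np.asarray(values, dtype=float))
    # The i-th smallest value (1-based) has i observations at or below it
    ps = np.arange(1, xs.size + 1) / xs.size
    return xs, ps

x = [i for i in range(1, 11)]
xs, ps = ecdf(x)
print(xs)
print(ps)
```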

Using the pandas cumsum function

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


x = [i for i in range(1,11)]

df = pd.DataFrame(x, columns=['x'])
df["cumsum"] = df.x.cumsum()  # cumulative sum: 1, 3, 6, 10, ...
df["cumsum_ratio"] = df.x.cumsum() / df.x.sum()  # cumulative sum as a fraction of the total (1.0 at the last row)

The resulting df looks like this (index omitted):

x   cumsum   cumsum_ratio
1        1       0.018182
2        3       0.054545
3        6       0.109091
4       10       0.181818
5       15       0.272727
...    ...            ...
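You can reproduce the table values with plain NumPy. This ratio divides each running total by the grand total (55 here), so the last entry is always 1.0.

```python
import numpy as np

x = np.arange(1, 11)
cumsum = np.cumsum(x)            # running totals: 1, 3, 6, 10, 15, ...
cumsum_ratio = cumsum / x.sum()  # divide by the grand total, 55
print(np.round(cumsum_ratio[:5], 6))
```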

The cumulative ratio in df can then be plotted:

fig, ax = plt.subplots(figsize=(4, 4))
ax.set_xlabel('Value')
ax.set_ylabel('Cumulative ratio')
ax.set_xlim(0, 11)  # leave a margin so the points at 1 and 10 are not clipped
ax.scatter(df.x, df.cumsum_ratio, color="blue", s=10)  # the data points
ax.plot(df.x, df.cumsum_ratio, color="blue", marker='o', markersize=1)  # connecting line
plt.show()
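The cumulative ratio only changes at the observed values, so you can also draw it as a step function with matplotlib's `Axes.step`. The `where="post"` option holds each level until the next x value. The sketch below recomputes the ratio so that it runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"x": list(range(1, 11))})
df["cumsum_ratio"] = df["x"].cumsum() / df["x"].sum()

fig, ax = plt.subplots(figsize=(4, 4))
# where="post": the value at x[i] is held until x[i+1]
ax.step(df["x"], df["cumsum_ratio"], where="post", color="blue")
ax.scatter(df["x"], df["cumsum_ratio"], color="blue", s=10)
ax.set_xlabel("Value")
ax.set_ylabel("Cumulative ratio")
ax.set_ylim(0, 1.05)
plt.show()
```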


Using scipy's stats.cumfreq function

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cumfreq.html

cumfreq returns a cumulative histogram (the running count of values per bin) rather than a cumulative distribution function, but it can be used as follows.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = [i for i in range(1,11)]

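res = stats.cumfreq(x, numbins=10)
# x positions for plotting; this linspace spacing is binsize * n / (n - 1),
# so it is slightly wider than binsize
x_ = res.lowerlimit + np.linspace(0, res.binsize*res.cumcount.size, res.cumcount.size)

# Lower edge of each bin, spaced exactly binsize apart
x_1 = np.arange(res.cumcount.size) * res.binsize + res.lowerlimit

fig, ax = plt.subplots(figsize=(4, 4))
ax.plot(x_, res.cumcount, 'ro')
ax.set_title('Cumulative histogram')
ax.set_xlim([x_.min(), x_.max()])
plt.show()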
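cumfreq returns a named tuple with the fields `cumcount`, `lowerlimit`, `binsize`, and `extrapoints`. The count in each bin covers every value up to that bin's upper edge. One option is therefore to place each cumulative count at its bin's upper edge and divide by the sample size. That gives a proportion between 0 and 1, on the same scale as the ratio in the pandas version. This is only a sketch of that idea, not something cumfreq does for you. The names `upper_edges` and `cum_prop` are illustrative.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = [i for i in range(1, 11)]
res = stats.cumfreq(x, numbins=10)

# Fields of the returned named tuple
print(res.cumcount)     # running count at the end of each bin
print(res.lowerlimit)   # lower edge of the first bin
print(res.binsize)      # width of each bin
print(res.extrapoints)  # values that fell outside the bin range

n_bins = res.cumcount.size
# Upper edge of bin i (1-based): lowerlimit + i * binsize
upper_edges = res.lowerlimit + res.binsize * np.arange(1, n_bins + 1)
# Divide by the number of observations to get a proportion in [0, 1]
cum_prop = res.cumcount / len(x)

fig, ax = plt.subplots(figsize=(4, 4))
# where="post" holds each level from its edge until the next edge
ax.step(upper_edges, cum_prop, where="post", color="red")
ax.set_xlabel("Upper bin edge")
ax.set_ylabel("Cumulative proportion")
ax.set_title("Normalized cumulative histogram")
plt.show()
```

For this data, the default range runs from 0.5 to 10.5, so each integer lands in its own bin and the proportion rises by 0.1 per bin.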

