We will use the following data.

x = [i for i in range(1,11)]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I want to build a cumulative distribution function for the values in this variable x.

Using the pandas cumsum function

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


x = [i for i in range(1,11)]

df = pd.DataFrame(x, columns=['x'])
df["cumsum"] = df.x.cumsum()  # add the cumulative sum
df["cumsum_ratio"] = df.x.cumsum() / df.x.sum()  # cumulative sum divided by the total (ends at 1)

As a result, df has the following structure (the index is not shown).

x	cumsum	cumsum_ratio
1	1	0.018182
2	3	0.054545
3	6	0.109091
4	10	0.181818
5	15	0.272727
...	...	...

You can plot it as follows.

fig, ax = plt.subplots(figsize=(4, 4))
ax.set_xlabel('Value')
ax.set_ylabel('Cumulative Frequency') 
ax.set_xlim(0,10)
ax.scatter(df.x, df.cumsum_ratio, color="blue",s=10) 
ax.plot(df.x, df.cumsum_ratio, color="blue", marker='o',markersize=1)
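
The ratio above weights each entry by its value, since it divides a running sum of the values by their total. If what is wanted is instead the share of observations less than or equal to each value (the usual empirical CDF), a minimal sketch could look like the following. It assumes numpy only and data without repeated values, as in x here; the names xs and ecdf are chosen for illustration.

```python
import numpy as np

x = list(range(1, 11))

# Sort the observations, then give the k-th smallest one the value k / n,
# i.e. the fraction of observations less than or equal to it.
xs = np.sort(np.asarray(x))
ecdf = np.arange(1, xs.size + 1) / xs.size

print(xs)
print(ecdf)
```

These two arrays can be passed to ax.plot in the same way as df.x and df.cumsum_ratio.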

Using the "stats.cumfreq" function from scipy

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.cumfreq.html

This is not a cumulative distribution function, but it can be used as follows.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = [i for i in range(1,11)]

res = stats.cumfreq(x, numbins=10)
x_ = res.lowerlimit + np.linspace(0, res.binsize*res.cumcount.size, res.cumcount.size)


# Alternative: left edge of each bin (not used in the plot below)
x_1 = res.lowerlimit + np.arange(res.cumcount.size) * res.binsize

fig, ax = plt.subplots(figsize=(4, 4))
ax.plot(x_, res.cumcount, 'ro')
ax.set_title('Cumulative histogram')
ax.set_xlim([x_.min(), x_.max()])
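
Since cumfreq returns cumulative counts rather than probabilities, one way to get closer to a distribution function is to divide the counts by the number of observations. The following is only a sketch under that assumption; cum_rel and upper_edges are names introduced here for illustration.

```python
import numpy as np
from scipy import stats

x = list(range(1, 11))

res = stats.cumfreq(x, numbins=10)

# Divide the cumulative counts by the number of observations so that
# the last bin reaches 1.0.
cum_rel = res.cumcount / len(x)

# Upper edge of each bin: the cumulative count of a bin covers
# everything up to that edge.
upper_edges = res.lowerlimit + res.binsize * np.arange(1, res.cumcount.size + 1)

print(upper_edges)
print(cum_rel)
```

Plotting cum_rel against upper_edges gives a curve that rises to 1 instead of to the total count.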
