In this article I try out, in Jupyter Lab, the material covered on Kame's (@usdatascientist) blog (https://datawokagaku.com/python_for_ds_summary/).
Summary of basic pandas operations
import pandas as pd
import numpy as np
Series
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
name John
sex male
age 22
dtype: object
array = np.array([10,20,30])
pd.Series(array)
0 10
1 20
2 30
dtype: int64
array = np.array([10,20,30])
labels = ['a','b','c']
pd.Series(array, labels)
a 10
b 20
c 30
dtype: int64
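As a minimal sketch of how such labels can be used afterwards, a labeled Series can be read by label or by position. The variable name `scores` and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the second argument of pd.Series is the index.
scores = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

by_label = scores.loc['b']      # look up by index label
by_position = scores.iloc[0]    # look up by integer position
subset = scores[['a', 'c']]     # a list of labels returns a new Series

print(by_label, by_position)
print(subset)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the labels themselves are integers.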
How to create a DataFrame
Create from an ndarray
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
print(john_s['age'])
name John
sex male
age 22
dtype: object
22
ndarray = np.random.randint(5, size=(5,4))
pd.DataFrame(data=ndarray)
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 4 | 1 | 0 | 0 |
| 2 | 3 | 2 | 1 | 0 |
| 3 | 3 | 1 | 1 | 3 |
| 4 | 4 | 0 | 1 | 3 |
columns = ['a','b','c','d']
index = np.arange(0,50,10)
pd.DataFrame(data=ndarray, index=index, columns=columns)
|    | a | b | c | d |
|----|---|---|---|---|
| 0  | 1 | 1 | 1 | 0 |
| 10 | 4 | 1 | 0 | 0 |
| 20 | 3 | 2 | 1 | 0 |
| 30 | 3 | 1 | 1 | 3 |
| 40 | 4 | 0 | 1 | 3 |
Create from dictionaries
data1 = {
'name':'John',
'sex':'male',
'age':22
}
data2 = {
'name':'Zack',
'sex':'male',
'age':30
}
data3 ={
'name':'Emily',
'sex':'female',
'age':32
}
pd.DataFrame([data1, data2, data3])
|   | name | sex | age |
|---|---|---|---|
| 0 | John | male | 22 |
| 1 | Zack | male | 30 |
| 2 | Emily | female | 32 |
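The example above passes a list of row dictionaries. As a hedged alternative sketch, the same kind of table can also be built from a single dictionary that maps each column name to a list of values. The name `columns_data` is only illustrative.

```python
import pandas as pd

# Hypothetical data organized by column instead of by row.
columns_data = {
    'name': ['John', 'Zack', 'Emily'],
    'sex': ['male', 'male', 'female'],
    'age': [22, 30, 32],
}
people = pd.DataFrame(columns_data)

print(people)
```

Both forms give a DataFrame with a default integer index; which one is more convenient depends on how the data arrives.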
df = pd.read_csv('train.csv')
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Show the first 5 rows with .head()
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Check summary statistics with .describe()
df.describe()
|       | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|-------|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean  | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std   | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min   | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25%   | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50%   | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75%   | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max   | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
type(df.describe())  # the type is DataFrame
pandas.core.frame.DataFrame
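Because the result of .describe() is itself a DataFrame, individual statistics can be read from it with ordinary indexing. The following is a small sketch on made-up numbers; the frame `sample` is hypothetical and not the Titanic data.

```python
import pandas as pd

# Made-up numeric data for illustration.
sample = pd.DataFrame({'Age': [20.0, 30.0, 40.0], 'Fare': [10.0, 20.0, 60.0]})

stats = sample.describe()            # rows: count, mean, std, min, 25%, 50%, 75%, max
mean_age = stats.loc['mean', 'Age']  # row label first, then column label

print(mean_age)
```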
List the columns with .columns
df.columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
type(df.columns)  # the type is Index
pandas.core.indexes.base.Index
df.index  # the rows have an index as well
RangeIndex(start=0, stop=891, step=1)
Get a specific column as a Series by putting its name in square brackets []
df['Age'].head()
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
Name: Age, dtype: float64
type(df['Age'])
pandas.core.series.Series
Put a list of column names inside [] to extract multiple columns at once
df[['Age','Parch','Fare']].head()
|   | Age | Parch | Fare |
|---|---|---|---|
| 0 | 22.0 | 0 | 7.2500 |
| 1 | 38.0 | 0 | 71.2833 |
| 2 | 26.0 | 0 | 7.9250 |
| 3 | 35.0 | 0 | 53.1000 |
| 4 | 35.0 | 0 | 8.0500 |
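The difference between single and double brackets can be checked directly. This is a minimal sketch on a made-up two-row frame; only the column names are borrowed from the Titanic data.

```python
import pandas as pd

# Small hypothetical frame for illustration.
sample = pd.DataFrame({'Age': [22.0, 38.0], 'Parch': [0, 0], 'Fare': [7.25, 71.2833]})

as_series = sample['Age']    # a single name returns a Series
as_frame = sample[['Age']]   # a list of names returns a DataFrame, even with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```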
Get a specific row as a Series with .iloc[int]
df.iloc[888] #index location
PassengerId 889
Survived 0
Pclass 3
Name Johnston, Miss. Catherine Helen "Carrie"
Sex female
Age NaN
SibSp 1
Parch 2
Ticket W./C. 6607
Fare 23.45
Cabin NaN
Embarked S
Name: 888, dtype: object
df.iloc[888]['Age']
nan
np.isnan(df.iloc[888]['Age'])
True
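Besides np.isnan, pandas has its own missing-value checks. The sketch below uses a made-up Series named `ages` and shows one way to count missing values in a column.

```python
import numpy as np
import pandas as pd

# Made-up ages with one missing value.
ages = pd.Series([22.0, np.nan, 26.0], name='Age')

missing_mask = ages.isna()               # True where the value is missing
missing_count = int(missing_mask.sum())  # True counts as 1
single_check = pd.isna(ages.iloc[1])     # scalar check on one element

print(missing_count, single_check)
```

Unlike np.isnan, pd.isna also accepts non-numeric values such as None or strings, which is convenient for object columns.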
np.random.seed(1)
ndarray = np.random.randint(10, size=(5,5))
columns = [0,1,2,3,4]
index = ['a','b','c','d','e']
df_1 = pd.DataFrame(data=ndarray, index=index, columns=columns)
df_1
|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| a | 5 | 8 | 9 | 5 | 0 |
| b | 0 | 1 | 7 | 6 | 9 |
| c | 2 | 4 | 5 | 2 | 4 |
| d | 2 | 4 | 7 | 7 | 9 |
| e | 1 | 7 | 0 | 6 | 9 |
df_1[0]
a 5
b 0
c 2
d 2
e 1
Name: 0, dtype: int64
df_1.loc['c']  # if the row label is not an int, use .loc['str']
0 2
1 4
2 5
3 2
4 4
Name: c, dtype: int64
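To make the difference between .loc (labels) and .iloc (positions) concrete, here is a small sketch on a hypothetical 3x3 frame named `grid`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string row labels and integer column labels.
grid = pd.DataFrame(np.arange(9).reshape(3, 3),
                    index=['a', 'b', 'c'], columns=[0, 1, 2])

row_by_label = grid.loc['b']      # label-based lookup
row_by_position = grid.iloc[1]    # position-based lookup, the same row here
label_slice = grid.loc['a':'b']   # label slices include the end label
position_slice = grid.iloc[0:1]   # position slices exclude the end, like Python lists

print(row_by_label.tolist())
print(len(label_slice), len(position_slice))
```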
Delete specific rows and columns with .drop()
Drop index 0 (the row labeled 0)
df.drop(0).head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
Drop the 'Age' column
df.drop('Age', axis=1).head()
|   | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
To delete multiple columns, pass a list to .drop([]). Dropping does not modify the original df.
df.drop(['Age', 'PassengerId'], axis=1).head()
|   | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.head()  # drop does not modify the original df
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
There are two ways to overwrite df. The first is to set inplace=True, which modifies the original DataFrame; the second is to assign the result of .drop() back to df.
df = pd.read_csv('train.csv')
df.drop(['Age', 'Cabin'], axis=1, inplace=True)
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | S |
df = pd.read_csv('train.csv')
df = df.drop(['Age', 'Cabin'], axis=1)  # reassign the result to df
id(df)
140285150057616
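The fact that .drop() returns a new object and leaves the original untouched can be verified on a small example. The frame below is made up; `columns=` is the keyword form of passing `axis=1`.

```python
import pandas as pd

# Made-up frame for illustration.
original = pd.DataFrame({'Age': [22, 38],
                         'Cabin': [None, 'C85'],
                         'Fare': [7.25, 71.2833]})

trimmed = original.drop(columns=['Age', 'Cabin'])  # same effect as drop([...], axis=1)

print(list(original.columns))  # the original still has all three columns
print(list(trimmed.columns))   # only 'Fare' remains in the new frame
```

Reassignment (`df = df.drop(...)`) is usually easier to follow than inplace=True, because each step produces a value that can be inspected or chained.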
Get multiple rows with slicing
df.iloc[5:10]
|   | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | 0 | 0 | 330877 | 8.4583 | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 0 | 0 | 17463 | 51.8625 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 3 | 1 | 349909 | 21.0750 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 0 | 2 | 347742 | 11.1333 | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 1 | 0 | 237736 | 30.0708 | C |
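Slicing with .iloc also accepts a second slice for columns and an optional step. The following sketch uses a hypothetical 5x4 frame named `table`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame holding the values 0..19, row by row.
table = pd.DataFrame(np.arange(20).reshape(5, 4), columns=['a', 'b', 'c', 'd'])

block = table.iloc[1:3, 0:2]    # rows at positions 1 and 2, first two columns
every_other = table.iloc[::2]   # every second row: positions 0, 2, 4

print(block)
print(list(every_other.index))
```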
Filter the DataFrame by specific conditions
df = pd.read_csv('train.csv')
survived = df['Survived'] == 1  # boolean mask: True for survivors
survived.head()
0 False
1 True
2 True
3 True
4 False
Name: Survived, dtype: bool
filter = df['Survived'] == 1  # store the mask in a variable called filter
df = df[filter]
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df = df[df['Survived'] == 1]  # this form is more common
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df[df['Survived'] == 1].describe()  # describe only the survivors
|       | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|-------|---|---|---|---|---|---|---|
| count | 342.000000 | 342.0 | 342.000000 | 290.000000 | 342.000000 | 342.000000 | 342.000000 |
| mean  | 444.368421 | 1.0 | 1.950292 | 28.343690 | 0.473684 | 0.464912 | 48.395408 |
| std   | 252.358840 | 0.0 | 0.863321 | 14.950952 | 0.708688 | 0.771712 | 66.596998 |
| min   | 2.000000 | 1.0 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25%   | 250.750000 | 1.0 | 1.000000 | 19.000000 | 0.000000 | 0.000000 | 12.475000 |
| 50%   | 439.500000 | 1.0 | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 26.000000 |
| 75%   | 651.500000 | 1.0 | 3.000000 | 36.000000 | 1.000000 | 1.000000 | 57.000000 |
| max   | 890.000000 | 1.0 | 3.000000 | 80.000000 | 4.000000 | 5.000000 | 512.329200 |
df = pd.read_csv('train.csv')  # reload, because df was overwritten with the survivors above
df.describe()  # original data
|       | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|-------|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean  | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std   | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min   | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25%   | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50%   | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75%   | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max   | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
df[df['Age'] >= 60].describe()  # only rows with 'Age' >= 60
|       | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|-------|---|---|---|---|---|---|---|
| count | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 |
| mean  | 455.807692 | 0.269231 | 1.538462 | 65.096154 | 0.230769 | 0.307692 | 43.467950 |
| std   | 240.078490 | 0.452344 | 0.811456 | 5.110811 | 0.429669 | 0.837579 | 51.269998 |
| min   | 34.000000 | 0.000000 | 1.000000 | 60.000000 | 0.000000 | 0.000000 | 6.237500 |
| 25%   | 277.250000 | 0.000000 | 1.000000 | 61.250000 | 0.000000 | 0.000000 | 10.500000 |
| 50%   | 489.000000 | 0.000000 | 1.000000 | 63.500000 | 0.000000 | 0.000000 | 28.275000 |
| 75%   | 629.750000 | 0.750000 | 2.000000 | 69.000000 | 0.000000 | 0.000000 | 58.860450 |
| max   | 852.000000 | 1.000000 | 3.000000 | 80.000000 | 1.000000 | 4.000000 | 263.000000 |
df[(df['Age'] >= 60) & (df['Sex'] == 'female')]  # only women aged 60 or older
|     | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|-----|---|---|---|---|---|---|---|---|---|---|---|---|
| 275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 366 | 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.2500 | D37 | C |
| 483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
| 829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
df[(df['Pclass'] == 1) | (df['Age'] < 10)]  # only first class or under 10 years old
df.head()
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
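The combined conditions with & (AND) and | (OR) can be checked on a tiny, made-up frame. The rows below are hypothetical; only the column names follow the Titanic data. Each condition needs its own parentheses because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Four made-up passengers for illustration.
passengers = pd.DataFrame({
    'Pclass': [1, 3, 2, 3],
    'Age': [38.0, 4.0, 14.0, 26.0],
    'Sex': ['female', 'female', 'female', 'male'],
})

# OR: first class, or younger than 10
first_or_child = passengers[(passengers['Pclass'] == 1) | (passengers['Age'] < 10)]

# AND: younger than 20 and female
young_women = passengers[(passengers['Age'] < 20) & (passengers['Sex'] == 'female')]

print(list(first_or_child.index))
print(list(young_women.index))
```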
Prefixing a condition with ~ (tilde) filters with a NOT operation.
data =[{'Name':'John', 'Survived':True},
{'Name':'Emily', 'Survived':False},
{'Name':'Ben', 'Survived':True}]
df = pd.DataFrame(data)
df
|   | Name | Survived |
|---|---|---|
| 0 | John | True |
| 1 | Emily | False |
| 2 | Ben | True |
It is often used when filtering on a column whose values are boolean.
df[df['Survived']==True]
|   | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
Since the Survived column is already boolean, == True is unnecessary. df['Survived'] is already a boolean Series, so you can filter with it directly, as shown below.
df[df['Survived']]
|   | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
To narrow the data down to Survived == False, you can do the following instead of writing df[df['Survived'] == False].
df[~df['Survived']]
|   | Name | Survived |
|---|---|---|
| 1 | Emily | False |
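The ~ operator also negates a whole combined condition, as long as that condition is wrapped in parentheses. A short sketch on the same kind of made-up data (the frame name `flags` is hypothetical):

```python
import pandas as pd

flags = pd.DataFrame({'Name': ['John', 'Emily', 'Ben'],
                      'Survived': [True, False, True]})

# Rows where Survived is False
not_survived = flags[~flags['Survived']]

# Everyone except the row that is both named John and marked as survived
not_john_survivor = flags[~((flags['Name'] == 'John') & flags['Survived'])]

print(not_survived['Name'].tolist())
print(not_john_survivor['Name'].tolist())
```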
Changing the index
Reassign the index with .reset_index()
df = pd.read_csv('train.csv')
df = df[df['Sex']=='male']
df.head()  # the index is no longer contiguous
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
Renumber the index
As with .drop(), the original df is not overwritten. To update df, either pass inplace=True or reassign with df = df.reset_index().
df.reset_index().head()
|   | index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 2 | 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 3 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 4 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
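By default .reset_index() keeps the old index as a new column named 'index', as seen in the output above. If the old index is not needed, the `drop=True` parameter discards it. A minimal sketch on made-up data (the frame `people` is hypothetical):

```python
import pandas as pd

people = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                       'Sex': ['male', 'female', 'male', 'male']})

males = people[people['Sex'] == 'male']    # index is now 0, 2, 3
kept = males.reset_index()                 # old index becomes an 'index' column
discarded = males.reset_index(drop=True)   # old index is thrown away

print(list(kept.columns))
print(list(discarded.columns), list(discarded.index))
```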
Use .set_index() to make a specific column the index
Set the index to 'Name'.
As with .reset_index(), you can overwrite the original df with inplace=True.
df.set_index('Name').head()
| Name | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| Moran, Mr. James | 6 | 0 | 3 | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| McCarthy, Mr. Timothy J | 7 | 0 | 1 | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| Palsson, Master. Gosta Leonard | 8 | 0 | 3 | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
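Once a column has been made the index, rows can be looked up by its values with .loc, and .reset_index() turns the index back into a regular column. A small sketch with made-up, shortened names:

```python
import pandas as pd

# Hypothetical two-row frame; the names are shortened for illustration.
people = pd.DataFrame({'Name': ['Braund', 'Allen'], 'Age': [22.0, 35.0]})

by_name = people.set_index('Name')       # 'Name' becomes the row index
allen_age = by_name.loc['Allen', 'Age']  # look up a row by name, then pick a column
restored = by_name.reset_index()         # 'Name' is a regular column again

print(allen_age)
print(list(restored.columns))
```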