In this article I actually coded, in Jupyter Lab, the basic Pandas operations described on the blog of Kame (@usdatascientist) (https://datawokagaku.com/python_for_ds_summary/).
Summary of basic Pandas operations
Part 10
import pandas as pd
import numpy as np
Series
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
name John
sex male
age 22
dtype: object
array = np.array([10,20,30])
pd.Series(array)
0 10
1 20
2 30
dtype: int64
array = np.array([10,20,30])
labels = ['a','b','c']
pd.Series(array, labels)
a 10
b 20
c 30
dtype: int64
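As a small sketch of how the labels are used once a Series exists, the snippet below builds the same labeled Series with the `index` keyword and reads a value back by label and by position. The variable names `s`, `by_label` and `by_position` are only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as above, with the labels passed explicitly via the index keyword
s = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

by_label = s['b']          # label-based access
by_position = s.iloc[1]    # position-based access
print(by_label, by_position)
print(list(s.index))
```

Passing `labels` positionally, as in `pd.Series(array, labels)`, has the same effect because `index` is the second parameter.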
Part 11
How to create a DataFrame
Create from an ndarray
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
print(john_s['age'])
name John
sex male
age 22
dtype: object
22
ndarray = np.random.randint(5, size=(5,4))
pd.DataFrame(data=ndarray)
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 4 | 1 | 0 | 0 |
| 2 | 3 | 2 | 1 | 0 |
| 3 | 3 | 1 | 1 | 3 |
| 4 | 4 | 0 | 1 | 3 |
columns = ['a','b','c','d']
index = np.arange(0,50,10)
pd.DataFrame(data=ndarray, index=index, columns=columns)
| | a | b | c | d |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 10 | 4 | 1 | 0 | 0 |
| 20 | 3 | 2 | 1 | 0 |
| 30 | 3 | 1 | 1 | 3 |
| 40 | 4 | 0 | 1 | 3 |
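To double-check what `index` and `columns` did, one option is to inspect the resulting frame's attributes. This is only a sketch: the seeded generator `rng` and the names `values` and `frame` are assumptions added here so the example is reproducible, and they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)               # seeded generator (assumption, for reproducibility)
values = rng.integers(0, 5, size=(5, 4))     # 5 rows x 4 columns of ints in [0, 5)
frame = pd.DataFrame(values, index=np.arange(0, 50, 10), columns=['a', 'b', 'c', 'd'])

print(frame.shape)              # (rows, columns)
print(frame.index.tolist())     # row labels generated by np.arange(0, 50, 10)
print(frame.columns.tolist())   # column labels
```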
Create from dictionaries
data1 = {
'name':'John',
'sex':'male',
'age':22
}
data2 = {
'name':'Zack',
'sex':'male',
'age':30
}
data3 ={
'name':'Emily',
'sex':'female',
'age':32
}
pd.DataFrame([data1, data2, data3])
| | name | sex | age |
|---|---|---|---|
| 0 | John | male | 22 |
| 1 | Zack | male | 30 |
| 2 | Emily | female | 32 |
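When a DataFrame is built from a list of dictionaries, the keys become the columns. The sketch below shows what happens when one record lacks a key; the second record without `'age'` is a hypothetical case added for illustration.

```python
import pandas as pd

records = [
    {'name': 'John', 'sex': 'male', 'age': 22},
    {'name': 'Zack', 'sex': 'male'},          # hypothetical record with no 'age' key
]
df_people = pd.DataFrame(records)

# Keys become columns; a missing key is filled with NaN
print(df_people)
print(df_people['age'].isna().tolist())
```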
df = pd.read_csv('train.csv')
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
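`pd.read_csv` also accepts a file-like object, which makes it easy to try out without `train.csv` on disk. The sketch below uses a made-up two-row CSV (not real Titanic records); `csv_text` and `sample` are illustrative names.

```python
import io
import pandas as pd

# Made-up two-row stand-in for train.csv (not real Titanic records)
csv_text = '''PassengerId,Survived,Name,Age
1,0,"Doe, Mr. John",22
2,1,"Roe, Miss. Jane",
'''
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                    # (rows, columns)
print(sample['Name'].tolist())         # quoted commas stay inside one field
print(sample['Age'].isna().tolist())   # the empty Age field is read as NaN
```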
Part 12
Show the first 5 rows with .head()
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Check summary statistics with .describe()
df.describe()
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
type(df.describe())  # the type is DataFrame
pandas.core.frame.DataFrame
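Two details of the `.describe()` output above are worth a quick check: `count` ignores missing values (which would explain why Age shows 714 rather than 891), and non-numeric columns such as Name are left out by default. A minimal sketch on a hypothetical four-row frame `toy`:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0],      # one missing value
    'Sex': ['male', 'female', 'female', 'male'],
})

stats = toy.describe()                      # numeric columns only by default
all_stats = toy.describe(include='all')     # include='all' also summarizes non-numeric columns
print(stats)
print(all_stats.columns.tolist())
```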
Show the list of columns with .columns
df.columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
type(df.columns)  # the type is Index
pandas.core.indexes.base.Index
df.index  # the rows have an index too
RangeIndex(start=0, stop=891, step=1)
Get a Series by putting a specific column name in square brackets []
df['Age'].head()
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
Name: Age, dtype: float64
type(df['Age'])
pandas.core.series.Series
Put a list of column names in the brackets [] to extract several columns at once
df[['Age','Parch','Fare']].head()
| | Age | Parch | Fare |
|---|---|---|---|
| 0 | 22.0 | 0 | 7.2500 |
| 1 | 38.0 | 0 | 71.2833 |
| 2 | 26.0 | 0 | 7.9250 |
| 3 | 35.0 | 0 | 53.1000 |
| 4 | 35.0 | 0 | 8.0500 |
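The difference between single and double brackets can be confirmed by checking the returned types. A minimal sketch on a hypothetical two-row frame `small`:

```python
import pandas as pd

small = pd.DataFrame({'Age': [22.0, 38.0], 'Fare': [7.25, 71.2833]})

one_column = small['Age']        # a column name gives a Series
as_frame = small[['Age']]        # a list of names gives a DataFrame, even with one name
print(type(one_column).__name__)
print(type(as_frame).__name__)
```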
Get a specific row as a Series with .iloc[int]
df.iloc[888] #index location
PassengerId 889
Survived 0
Pclass 3
Name Johnston, Miss. Catherine Helen "Carrie"
Sex female
Age NaN
SibSp 1
Parch 2
Ticket W./C. 6607
Fare 23.45
Cabin NaN
Embarked S
Name: 888, dtype: object
df.iloc[888]['Age']
nan
np.isnan(df.iloc[888]['Age'])
True
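`np.isnan` works here because Age is a float. As a hedged alternative, `pd.isna` handles any value type, including strings. The row below is hypothetical and only illustrates the difference.

```python
import numpy as np
import pandas as pd

row = pd.Series({'Name': 'Doe, Miss. Jane', 'Age': np.nan})  # hypothetical row

age_missing = pd.isna(row['Age'])     # float NaN is detected
name_missing = pd.isna(row['Name'])   # strings are fine too; np.isnan would raise TypeError here
print(age_missing, name_missing)
```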
np.random.seed(1)
ndarray = np.random.randint(10, size=(5,5))
columns = [0,1,2,3,4]
index = ['a','b','c','d','e']
df_1 = pd.DataFrame(data=ndarray, index=index, columns=columns)
df_1
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| a | 5 | 8 | 9 | 5 | 0 |
| b | 0 | 1 | 7 | 6 | 9 |
| c | 2 | 4 | 5 | 2 | 4 |
| d | 2 | 4 | 7 | 7 | 9 |
| e | 1 | 7 | 0 | 6 | 9 |
df_1[0]
a 5
b 0
c 2
d 2
e 1
Name: 0, dtype: int64
df_1.loc['c']  # when the row label is not an int, use .loc['str']
0 2
1 4
2 5
3 2
4 4
Name: c, dtype: int64
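To make the split between the two accessors explicit: `.loc` looks rows up by label, while `.iloc` looks them up by position. A small sketch on a hypothetical frame `grid`:

```python
import numpy as np
import pandas as pd

grid = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=[0, 1])

by_label = grid.loc['c']       # row whose index label is 'c'
by_position = grid.iloc[2]     # third row, regardless of its label
cell = grid.loc['b', 1]        # row label 'b', column label 1
print(by_label.equals(by_position), cell)
```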
Delete specific rows and columns with .drop()
Drop index = 0 (the row labeled 0)
df.drop(0).head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
Drop the 'Age' column
df.drop('Age', axis=1).head()
| | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
To drop several columns at once, pass a list as the argument: .drop([...]). drop does not modify the original df.
df.drop(['Age','PassengerId'], axis=1).head()
| | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.head()  # drop does not modify the original df
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
There are two ways to overwrite df. The first is the inplace=True parameter, which modifies the original DataFrame; the second is to assign the result of .drop() back to df.
df = pd.read_csv('train.csv')
df.drop(['Age', 'Cabin'], axis=1, inplace=True)
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | S |
df = pd.read_csv('train.csv')
df = df.drop(['Age', 'Cabin'], axis=1)
id(df)
140285150057616
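The two ways of overwriting can be compared side by side on a tiny frame. This is a sketch under the assumption of a hypothetical three-column frame `base`; the point is that `.drop()` returns a new DataFrame, while `inplace=True` changes the existing one and returns `None`.

```python
import pandas as pd

base = pd.DataFrame({'Age': [22, 38], 'Cabin': ['C85', None], 'Fare': [7.25, 71.28]})

dropped = base.drop(['Age', 'Cabin'], axis=1)      # returns a new DataFrame
print(base.columns.tolist())                       # original still has all three columns

result = base.drop(columns=['Age'], inplace=True)  # modifies base and returns None
print(result, base.columns.tolist())
```

Because the in-place call returns `None`, writing `df = df.drop(..., inplace=True)` would replace df with `None`, so the two styles should not be mixed.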
Get multiple rows with slicing
df.iloc[5:10]
| | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | 0 | 0 | 330877 | 8.4583 | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 0 | 0 | 17463 | 51.8625 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 3 | 1 | 349909 | 21.0750 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 0 | 2 | 347742 | 11.1333 | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 1 | 0 | 237736 | 30.0708 | C |
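One detail of slicing that is easy to miss: `.iloc` slices exclude the stop position (so `5:10` returns rows 5 to 9), whereas `.loc` slices include the stop label. A minimal sketch on a hypothetical frame `letters` indexed by letters:

```python
import pandas as pd

letters = pd.DataFrame({'value': range(10)}, index=list('abcdefghij'))

by_position = letters.iloc[5:8]     # positions 5, 6, 7; the stop position is excluded
by_label = letters.loc['f':'h']     # labels 'f' through 'h'; the stop label is included
print(by_position.index.tolist())
print(by_label.index.tolist())
```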
Part 13
Filter a DataFrame by specific conditions
df = pd.read_csv('train.csv')
(df['Survived'] == 1).head()  # Boolean Series: True for survivors (df itself is left unchanged)
0 False
1 True
2 True
3 True
4 False
Name: Survived, dtype: bool
filter = df['Survived'] == 1  # store the condition in a variable called filter
df = df[filter]
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df = df[df['Survived'] == 1]  # this form is more common
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df[df['Survived'] == 1].describe()  # describe only the survivors' data
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 342.000000 | 342.0 | 342.000000 | 290.000000 | 342.000000 | 342.000000 | 342.000000 |
| mean | 444.368421 | 1.0 | 1.950292 | 28.343690 | 0.473684 | 0.464912 | 48.395408 |
| std | 252.358840 | 0.0 | 0.863321 | 14.950952 | 0.708688 | 0.771712 | 66.596998 |
| min | 2.000000 | 1.0 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 250.750000 | 1.0 | 1.000000 | 19.000000 | 0.000000 | 0.000000 | 12.475000 |
| 50% | 439.500000 | 1.0 | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 26.000000 |
| 75% | 651.500000 | 1.0 | 3.000000 | 36.000000 | 1.000000 | 1.000000 | 57.000000 |
| max | 890.000000 | 1.0 | 3.000000 | 80.000000 | 4.000000 | 5.000000 | 512.329200 |
df.describe()  # raw (unfiltered) data
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
df[df['Age'] >= 60].describe()  # only rows with 'Age' >= 60
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 |
| mean | 455.807692 | 0.269231 | 1.538462 | 65.096154 | 0.230769 | 0.307692 | 43.467950 |
| std | 240.078490 | 0.452344 | 0.811456 | 5.110811 | 0.429669 | 0.837579 | 51.269998 |
| min | 34.000000 | 0.000000 | 1.000000 | 60.000000 | 0.000000 | 0.000000 | 6.237500 |
| 25% | 277.250000 | 0.000000 | 1.000000 | 61.250000 | 0.000000 | 0.000000 | 10.500000 |
| 50% | 489.000000 | 0.000000 | 1.000000 | 63.500000 | 0.000000 | 0.000000 | 28.275000 |
| 75% | 629.750000 | 0.750000 | 2.000000 | 69.000000 | 0.000000 | 0.000000 | 58.860450 |
| max | 852.000000 | 1.000000 | 3.000000 | 80.000000 | 1.000000 | 4.000000 | 263.000000 |
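A side effect of filtering on a column with missing values: any comparison against NaN evaluates to False, so rows with an unknown Age are left out of both `Age >= 60` and `Age < 60`. A minimal sketch with a hypothetical Series `ages`:

```python
import numpy as np
import pandas as pd

ages = pd.Series([65.0, np.nan, 30.0])

at_least_60 = (ages >= 60).tolist()
under_60 = (ages < 60).tolist()
print(at_least_60)   # the NaN entry is False here...
print(under_60)      # ...and False here as well
```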
df[(df['Age']>=60) & (df['Sex']=='female')]  # only women aged 60 or older
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 366 | 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.2500 | D37 | C |
| 483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
| 829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
df[(df['Pclass']==1) | (df['Age']<10)]  # only first-class passengers or children under 10
df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
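When combining conditions, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` or `<` in Python. The element-wise operators `&` and `|` are also required here; plain `and` / `or` do not work on a Series. A sketch on a hypothetical four-passenger frame `people`:

```python
import pandas as pd

people = pd.DataFrame({
    'Pclass': [1, 3, 3, 2],
    'Age': [40.0, 5.0, 30.0, 8.0],
})  # hypothetical passengers

first_or_child = people[(people['Pclass'] == 1) | (people['Age'] < 10)]    # OR
third_and_child = people[(people['Pclass'] == 3) & (people['Age'] < 10)]   # AND
print(first_or_child.index.tolist())
print(third_and_child.index.tolist())
```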
Prefixing a condition with ~ (tilde) filters by its negation (a NOT operation).
data =[{'Name':'John', 'Survived':True},
{'Name':'Emily', 'Survived':False},
{'Name':'Ben', 'Survived':True}]
df = pd.DataFrame(data)
df
| | Name | Survived |
|---|---|---|
| 0 | John | True |
| 1 | Emily | False |
| 2 | Ben | True |
It is often used when filtering on a column whose values are Boolean.
df[df['Survived']==True]
| | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
Since the Survived column is already Boolean, == True is unnecessary. Because df['Survived'] is already a Boolean Series, you can use it directly as the filter:
df[df['Survived']]
| | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
If you want only the rows where Survived == False, you can write the following instead of df[df['Survived'] == False]:
df[~df['Survived']]
| | Name | Survived |
|---|---|---|
| 1 | Emily | False |
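`~` can also negate a compound condition, as long as the whole condition is wrapped in parentheses first. The sketch below extends the small example with a hypothetical `Age` column; the names `flags` and `flags_with_age` are illustrative.

```python
import pandas as pd

flags = pd.DataFrame({'Name': ['John', 'Emily', 'Ben'], 'Survived': [True, False, True]})
flags_with_age = flags.assign(Age=[22, 32, 40])   # hypothetical ages

# ~ negates a whole condition; wrap compound conditions in parentheses first
not_survived = flags[~flags['Survived']]
young_survivor = flags_with_age['Survived'] & (flags_with_age['Age'] < 30)
not_young_survivor = flags_with_age[~young_survivor]
print(not_survived['Name'].tolist())
print(not_young_survivor['Name'].tolist())
```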
Changing the index
Reassign the index with .reset_index()
df = pd.read_csv('train.csv')
df = df[df['Sex']=='male']
df.head()  # the index now has gaps
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
Renumber the index
As with .drop(), the original df is not overwritten, so to update df either pass inplace=True or reassign with df = df.reset_index().
df.reset_index().head()
| | index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 2 | 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 3 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 4 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
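By default `.reset_index()` keeps the old index as a new column named `index`, as in the output above. If that column is not wanted, `drop=True` discards it. A minimal sketch on a hypothetical three-row frame `males`:

```python
import pandas as pd

males = pd.DataFrame({'Name': ['Braund', 'Allen', 'Moran']}, index=[0, 4, 5])

kept = males.reset_index()              # old index is kept as a column named 'index'
dropped = males.reset_index(drop=True)  # old index is discarded
print(kept.columns.tolist())
print(dropped.columns.tolist(), dropped.index.tolist())
```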
Use .set_index() to turn a specific column into the index
Set the index to 'Name'
As with .reset_index(), you can overwrite the original df with inplace=True.
df.set_index('Name').head()
| Name | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| Moran, Mr. James | 6 | 0 | 3 | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| McCarthy, Mr. Timothy J | 7 | 0 | 1 | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| Palsson, Master. Gosta Leonard | 8 | 0 | 3 | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
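Once a column has become the index, rows can be looked up by that value with `.loc`. The sketch below assumes a hypothetical two-row frame `passengers`, and also shows that `.reset_index()` moves the index back into a regular column.

```python
import pandas as pd

passengers = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'],
    'Age': [22.0, 35.0],
})
by_name = passengers.set_index('Name')

allen_age = by_name.loc['Allen, Mr. William Henry', 'Age']   # label lookup on the new index
restored = by_name.reset_index()                             # index goes back to a column
print(allen_age)
print(restored.columns.tolist())
```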