In this article, I work through the basic Pandas operations described on Kame's (@usdatascientist) blog (https://datawokagaku.com/python_for_ds_summary/), coding them myself in Jupyter Lab.
Summary of basic Pandas operations
Part 10
import pandas as pd
import numpy as np
Series
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
name    John
sex     male
age       22
dtype: object
array = np.array([10,20,30])
pd.Series(array)
0    10
1    20
2    30
dtype: int64
array = np.array([10,20,30])
labels = ['a','b','c']
pd.Series(array, labels)
a    10
b    20
c    30
dtype: int64
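Once a Series has labels, its values can be read either by label or by position. The following is a minimal sketch of that idea, reusing the values and labels from the cell above; the variable names `s`, `by_label`, `by_position` and `subset` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as above: values from an ndarray, labels used as the index
s = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

by_label = s['b']         # label-based lookup
by_position = s.iloc[1]   # position-based lookup (0-based)
subset = s[['a', 'c']]    # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset.tolist())
```

Using `.iloc` for positions keeps the intent explicit, which matters once the labels themselves are integers.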
Part 11
How to create a DataFrame
Create from an ndarray
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
print(john_s['age'])
name    John
sex     male
age       22
dtype: object
22
ndarray = np.random.randint(5, size=(5,4))
pd.DataFrame(data=ndarray)
|  | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 4 | 1 | 0 | 0 |
| 2 | 3 | 2 | 1 | 0 |
| 3 | 3 | 1 | 1 | 3 |
| 4 | 4 | 0 | 1 | 3 |
columns = ['a','b','c','d']
index = np.arange(0,50,10)
pd.DataFrame(data=ndarray, index=index, columns=columns)
|  | a | b | c | d |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 10 | 4 | 1 | 0 | 0 |
| 20 | 3 | 2 | 1 | 0 |
| 30 | 3 | 1 | 1 | 3 |
| 40 | 4 | 0 | 1 | 3 |
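To make the roles of `data`, `index` and `columns` concrete, here is a small sketch that builds the same kind of DataFrame and then inspects its shape and labels. The fixed seed and the name `frame` are assumptions added for repeatability; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the random values are repeatable
ndarray = np.random.randint(5, size=(5, 4))

frame = pd.DataFrame(data=ndarray,
                     index=np.arange(0, 50, 10),   # row labels 0, 10, ..., 40
                     columns=['a', 'b', 'c', 'd'])  # column labels

print(frame.shape)          # (number of rows, number of columns)
print(list(frame.index))    # row labels
print(list(frame.columns))  # column labels
```

The lengths of `index` and `columns` have to match the shape of the array, otherwise pandas raises an error.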
Create from dictionaries
data1 = {
    'name':'John',
    'sex':'male',
    'age':22
}
data2 = {
    'name':'Zack',
    'sex':'male',
    'age':30
}
data3 ={
    'name':'Emily',
    'sex':'female',
    'age':32
}
pd.DataFrame([data1, data2, data3])
|  | name | sex | age |
|---|---|---|---|
| 0 | John | male | 22 |
| 1 | Zack | male | 30 |
| 2 | Emily | female | 32 |
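When a DataFrame is built from a list of dictionaries, each dictionary becomes one row and the keys become the columns. A short sketch of what happens when one record lacks a key follows; the `records` list and the missing `age` are made up for illustration.

```python
import pandas as pd

records = [
    {'name': 'John', 'sex': 'male', 'age': 22},
    {'name': 'Zack', 'sex': 'male'},  # 'age' is left out on purpose
]
people = pd.DataFrame(records)

# A key that is absent from a record shows up as NaN in that row
missing_age = people['age'].isna().tolist()

print(list(people.columns))
print(missing_age)
```

This is worth knowing because a single incomplete record silently turns an integer column into a float column.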
df = pd.read_csv('train.csv')
df.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Part 12
Show the first 5 rows with .head()
df.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Check summary statistics with .describe()
df.describe()
|  | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
type(df.describe())  # the result is itself a DataFrame
pandas.core.frame.DataFrame
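Two details of the `.describe()` output are easy to miss: by default it summarizes only the numeric columns of a mixed DataFrame, and `count` skips missing values, which is why Age has a lower count than the other columns. The sketch below illustrates both on a tiny hand-made DataFrame; `toy` and its values are hypothetical, not taken from train.csv.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],          # non-numeric, left out by default
    'Age': [22.0, 38.0, np.nan, 35.0],     # one missing value
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

stats = toy.describe()

print(list(stats.columns))        # numeric columns only
print(stats.loc['count', 'Age'])  # NaN is not counted
print(stats.loc['mean', 'Age'])   # mean over the non-missing ages
```

Because the result is an ordinary DataFrame, individual statistics can be read with `.loc` as shown.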
Show the list of columns with .columns
df.columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
type(df.columns)  # the type is Index
pandas.core.indexes.base.Index
df.index  # The rows have an index too.
RangeIndex(start=0, stop=891, step=1)
Get a Series by putting a specific column name in brackets [].
df['Age'].head()
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
type(df['Age'])
pandas.core.series.Series
Put a list of column names in brackets [] to extract multiple columns at once
df[['Age','Parch','Fare']].head()
|  | Age | Parch | Fare |
|---|---|---|---|
| 0 | 22.0 | 0 | 7.2500 |
| 1 | 38.0 | 0 | 71.2833 |
| 2 | 26.0 | 0 | 7.9250 |
| 3 | 35.0 | 0 | 53.1000 |
| 4 | 35.0 | 0 | 8.0500 |
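The difference between single and double brackets comes down to the type of the result: a column name gives a Series, a list of names gives a DataFrame, even when the list has one element. Here is a minimal sketch on a two-row DataFrame whose values are copied from the first two rows above; the name `toy` is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({'Age': [22.0, 38.0],
                    'Parch': [0, 0],
                    'Fare': [7.25, 71.2833]})

one_column = toy['Age']             # a single name returns a Series
one_column_frame = toy[['Age']]     # a one-element list returns a DataFrame
two_columns = toy[['Age', 'Fare']]  # columns come back in the order listed

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(list(two_columns.columns))
```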
Get a specific row as a Series with .iloc[int]
df.iloc[888]  # iloc: integer (position-based) location
PassengerId                                         889
Survived                                              0
Pclass                                                3
Name           Johnston, Miss. Catherine Helen "Carrie"
Sex                                              female
Age                                                 NaN
SibSp                                                 1
Parch                                                 2
Ticket                                       W./C. 6607
Fare                                              23.45
Cabin                                               NaN
Embarked                                              S
Name: 888, dtype: object
df.iloc[888]['Age']
nan
np.isnan(df.iloc[888]['Age'])
True
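`np.isnan` works here because Age is a float, but it only accepts numeric input. As a hedged alternative, `pd.isna` can be applied to any value in the row. The sketch below rebuilds a shortened version of row 888 by hand (only three of its fields, copied from the output above) so that it runs without train.csv.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for df.iloc[888]
row = pd.Series({'Name': 'Johnston, Miss. Catherine Helen "Carrie"',
                 'Age': np.nan,
                 'Fare': 23.45})

age_missing = pd.isna(row['Age'])    # True for NaN
name_missing = pd.isna(row['Name'])  # False, and strings are accepted
missing_flags = row.isna().tolist()  # check every field at once

# np.isnan rejects non-numeric input such as a string
try:
    np.isnan(row['Name'])
    numpy_accepts_strings = True
except TypeError:
    numpy_accepts_strings = False

print(age_missing, name_missing, missing_flags, numpy_accepts_strings)
```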
np.random.seed(1)
ndarray = np.random.randint(10, size=(5,5))
columns = [0,1,2,3,4]
index = ['a','b','c','d','e']
df_1 = pd.DataFrame(data=ndarray, index=index, columns=columns)
df_1
|  | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| a | 5 | 8 | 9 | 5 | 0 |
| b | 0 | 1 | 7 | 6 | 9 |
| c | 2 | 4 | 5 | 2 | 4 |
| d | 2 | 4 | 7 | 7 | 9 |
| e | 1 | 7 | 0 | 6 | 9 |
df_1[0] 
a    5
b    0
c    2
d    2
e    1
Name: 0, dtype: int64
df_1.loc['c']  # When the row labels are not ints, select a row with .loc['str'].
0    2
1    4
2    5
3    2
4    4
Name: c, dtype: int64
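To summarize the two accessors: `.loc` selects by label and `.iloc` selects by position. The sketch below rebuilds `df_1` from the values shown in its table (hard-coded rather than regenerated from the random seed) and reads the same row and the same cell both ways.

```python
import pandas as pd

# Values copied from the df_1 table above
df_1 = pd.DataFrame(
    [[5, 8, 9, 5, 0],
     [0, 1, 7, 6, 9],
     [2, 4, 5, 2, 4],
     [2, 4, 7, 7, 9],
     [1, 7, 0, 6, 9]],
    index=['a', 'b', 'c', 'd', 'e'],
    columns=[0, 1, 2, 3, 4],
)

row_by_label = df_1.loc['c']    # row whose label is 'c'
row_by_position = df_1.iloc[2]  # third row, counting from 0

cell_by_label = df_1.loc['c', 2]    # row label 'c', column label 2
cell_by_position = df_1.iloc[2, 2]  # third row, third column

print(row_by_label.tolist())
print(cell_by_label, cell_by_position)
```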
Remove specific rows and columns with .drop()
Drop index = 0 (the row labeled 0)
df.drop(0).head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
Drop the 'Age' column
df.drop('Age', axis=1).head()
|  | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
To drop multiple columns, pass a list to .drop(). Note that .drop() does not modify the original df.
df.drop(['Age','PassengerId'], axis=1).head()
|  | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.head()  # .drop() did not modify the original df
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
There are two ways to overwrite df: pass inplace=True, which modifies the original DataFrame, or reassign the result to df.
df = pd.read_csv('train.csv')
df.drop(['Age', 'Cabin'], axis=1, inplace=True)
df.head()
|  | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | S |
df = pd.read_csv('train.csv')
df = df.drop(['Age', 'Cabin'], axis=1)
id(df)
140285150057616
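The behavior of `.drop()` with and without `inplace=True` can be checked on a small DataFrame. This is a minimal sketch with made-up rows (`toy` is hypothetical); it shows that the plain call returns a new DataFrame and leaves the original alone, while the `inplace=True` call changes the original and returns None.

```python
import pandas as pd

toy = pd.DataFrame({'Name': ['A', 'B'],
                    'Age': [22.0, 38.0],
                    'Cabin': [None, 'C85']})

# Without inplace: a new DataFrame comes back, toy is untouched
dropped = toy.drop(['Age', 'Cabin'], axis=1)
columns_after_plain_drop = list(toy.columns)

# With inplace=True: toy itself changes and the call returns None
result = toy.drop(['Age', 'Cabin'], axis=1, inplace=True)
columns_after_inplace_drop = list(toy.columns)

print(list(dropped.columns))
print(columns_after_plain_drop)
print(result, columns_after_inplace_drop)
```

Because the in-place call returns None, writing `df = df.drop(..., inplace=True)` would replace df with None, so the two styles should not be mixed.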
Get multiple rows with slicing
df.iloc[5:10]
|  | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | 0 | 0 | 330877 | 8.4583 | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 0 | 0 | 17463 | 51.8625 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 3 | 1 | 349909 | 21.0750 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 0 | 2 | 347742 | 11.1333 | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 1 | 0 | 237736 | 30.0708 | C |
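Slicing with `.iloc` follows normal Python rules, so the end position is excluded; slicing with `.loc` uses labels and includes the end label. The sketch below contrasts the two on a hypothetical 12-row DataFrame (`toy`) with the default integer index.

```python
import pandas as pd

toy = pd.DataFrame({'PassengerId': list(range(1, 13))})  # index labels 0..11

by_position = toy.iloc[5:10]  # positions 5, 6, 7, 8, 9 (end excluded)
by_label = toy.loc[5:10]      # labels 5 through 10 (end included)

print(list(by_position.index))
print(list(by_label.index))
```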
Part 13
Filter the DataFrame by specific conditions
df = pd.read_csv('train.csv')
# The comparison returns a boolean Series (True for survivors); it is not assigned back to df
(df['Survived'] == 1).head()
0    False
1     True
2     True
3     True
4    False
Name: Survived, dtype: bool
filter = df['Survived'] == 1  # Store the mask in a variable named filter (note: this shadows Python's built-in filter)
df = df[filter]
df.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df = df[df['Survived'] == 1]  # This one-line form is more common
df.head()
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df[df['Survived'] == 1].describe()  # Summary statistics for the survivors only
|  | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 342.000000 | 342.0 | 342.000000 | 290.000000 | 342.000000 | 342.000000 | 342.000000 |
| mean | 444.368421 | 1.0 | 1.950292 | 28.343690 | 0.473684 | 0.464912 | 48.395408 |
| std | 252.358840 | 0.0 | 0.863321 | 14.950952 | 0.708688 | 0.771712 | 66.596998 |
| min | 2.000000 | 1.0 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 250.750000 | 1.0 | 1.000000 | 19.000000 | 0.000000 | 0.000000 | 12.475000 |
| 50% | 439.500000 | 1.0 | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 26.000000 |
| 75% | 651.500000 | 1.0 | 3.000000 | 36.000000 | 1.000000 | 1.000000 | 57.000000 |
| max | 890.000000 | 1.0 | 3.000000 | 80.000000 | 4.000000 | 5.000000 | 512.329200 |
df.describe()  # raw data, for comparison (all 891 passengers, from the unfiltered df)
|  | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
df[df['Age'] >= 60].describe()  # 'Age' >= 60 only
|  | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 |
| mean | 455.807692 | 0.269231 | 1.538462 | 65.096154 | 0.230769 | 0.307692 | 43.467950 |
| std | 240.078490 | 0.452344 | 0.811456 | 5.110811 | 0.429669 | 0.837579 | 51.269998 |
| min | 34.000000 | 0.000000 | 1.000000 | 60.000000 | 0.000000 | 0.000000 | 6.237500 |
| 25% | 277.250000 | 0.000000 | 1.000000 | 61.250000 | 0.000000 | 0.000000 | 10.500000 |
| 50% | 489.000000 | 0.000000 | 1.000000 | 63.500000 | 0.000000 | 0.000000 | 28.275000 |
| 75% | 629.750000 | 0.750000 | 2.000000 | 69.000000 | 0.000000 | 0.000000 | 58.860450 |
| max | 852.000000 | 1.000000 | 3.000000 | 80.000000 | 1.000000 | 4.000000 | 263.000000 |
df[(df['Age']>=60) & (df['Sex']=='female')]  # Women aged 60 or over only
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
| 366 | 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.2500 | D37 | C |
| 483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
| 829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
df[(df['Pclass']==1) | (df['Age']<10)]  # First-class passengers or children under 10 only
df.head()  # the filtered result above is not assigned, so df itself is displayed unchanged
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
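When conditions are combined, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` or `<` in Python. The sketch below applies an AND and an OR filter to a four-row DataFrame; `toy` and its values are invented for illustration. It also shows that a comparison against NaN evaluates to False, so rows with a missing Age never pass an age condition.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Pclass': [1, 3, 3, 2],
    'Age': [38.0, 4.0, np.nan, 27.0],
    'Sex': ['female', 'female', 'male', 'female'],
})

# OR: first class, or younger than 10
first_or_child = toy[(toy['Pclass'] == 1) | (toy['Age'] < 10)]

# AND: female and at least 18
female_and_adult = toy[(toy['Sex'] == 'female') & (toy['Age'] >= 18)]

print(list(first_or_child.index))
print(list(female_and_adult.index))
```

Row 2 has a missing Age and is in third class, so it fails both filters.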
Prefixing a condition with ~ (tilde) inverts it, so you can filter with a NOT operation.
data =[{'Name':'John', 'Survived':True},
      {'Name':'Emily', 'Survived':False},
      {'Name':'Ben', 'Survived':True}]
df = pd.DataFrame(data)
df
|  | Name | Survived |
|---|---|---|
| 0 | John | True |
| 1 | Emily | False |
| 2 | Ben | True |
The ~ operator is often used when filtering on a column whose values are booleans.
df[df['Survived']==True]
|  | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
Because the Survived column is already boolean, == True is unnecessary: df['Survived'] is itself a boolean Series, so you can filter with it directly, as shown below.
df[df['Survived']]
|  | Name | Survived |
|---|---|---|
| 0 | John | True |
| 2 | Ben | True |
To keep only the rows where Survived is False, you can write the following instead of df[df['Survived'] == False].
df[~df['Survived']]
|  | Name | Survived |
|---|---|---|
| 1 | Emily | False |
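The same `~` operator works on any boolean Series, not only on a boolean column. A common case is inverting the result of `.isna()` to keep the rows that do have a value. The sketch below extends the three-row example above with an Age column; that column and its values are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Name': ['John', 'Emily', 'Ben'],
                    'Survived': [True, False, True],
                    'Age': [22.0, np.nan, 30.0]})  # Age is a made-up column

not_survived = toy[~toy['Survived']]  # rows where Survived is False
has_age = toy[~toy['Age'].isna()]     # rows where Age is NOT missing

print(not_survived['Name'].tolist())
print(has_age['Name'].tolist())
```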
Changing the index
Renumber the index with .reset_index()
df = pd.read_csv('train.csv')
df = df[df['Sex']=='male']
df.head()  # the index now has gaps
|  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
Renumber the index
As with .drop(), the original df is not overwritten, so to update df either pass inplace=True or reassign with df = df.reset_index().
df.reset_index().head()
|  | index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 2 | 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 3 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 4 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
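By default `.reset_index()` keeps the old labels in a new column named index, as the output above shows. If those labels are not needed, passing `drop=True` discards them. The following is a minimal sketch on a hypothetical four-row DataFrame (`toy`), filtered the same way as in the text.

```python
import pandas as pd

toy = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                    'Sex': ['male', 'female', 'male', 'male']})

males = toy[toy['Sex'] == 'male']          # index labels 0, 2, 3 (gap at 1)

kept = males.reset_index()                 # old labels move to an 'index' column
renumbered = males.reset_index(drop=True)  # old labels are discarded

print(list(males.index))
print(list(kept.columns), kept['index'].tolist())
print(list(renumbered.index), list(renumbered.columns))
```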
Use .set_index() to make a specific column the index
Set the index to 'Name'
As with .reset_index(), you can overwrite the original df with inplace=True.
df.set_index('Name').head()
| Name | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| Moran, Mr. James | 6 | 0 | 3 | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| McCarthy, Mr. Timothy J | 7 | 0 | 1 | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| Palsson, Master. Gosta Leonard | 8 | 0 | 3 | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
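One reason to set a column as the index is that rows can then be looked up by that value with `.loc`. The sketch below uses two passengers copied from the output above, reduced to three columns so that it runs without train.csv; the variable names are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'],
                    'PassengerId': [1, 5],
                    'Age': [22.0, 35.0]})

by_name = toy.set_index('Name')  # 'Name' becomes the row labels

# Look up a single value by row label and column label
age = by_name.loc['Allen, Mr. William Henry', 'Age']

restored = by_name.reset_index()  # 'Name' goes back to being a column

print(by_name.index.name, age)
print(list(restored.columns))
```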