[PYTHON] Assorted notes on pd.DataFrame

Introduction

A memo of frequently used operations on pandas.DataFrame (pd.DataFrame).

DataFrame initialization

import pandas as pd

# Empty DataFrame
df = pd.DataFrame(columns=[List of column names])

# Read from a csv file
df = pd.read_csv([File path])
df = pd.read_csv([File path], names=[List of column names])  # File without a header row
df = pd.read_csv([File path], sep=',')  # Specify the delimiter explicitly
df = pd.read_csv([File path], sep=r'\s+')  # Whitespace-separated (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv([File path], comment='#')  # Ignore text after the comment character

Reference: Read csv / tsv file with pandas (read_csv, read_table)
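
As a runnable illustration of the `read_csv` options above, here is a minimal sketch. It assumes in-memory text (via `io.StringIO`) in place of a real file path; the sample data and the column names `a` and `b` are made up for the example.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for files on disk.
comma_text = "# sample data\n1,2\n3,4\n"
space_text = "1 2\n3   4\n"

# names= supplies column labels because the data has no header row;
# comment='#' makes the parser skip the line that starts with '#'.
df = pd.read_csv(io.StringIO(comma_text), names=["a", "b"], comment="#")

# sep=r'\s+' splits on runs of whitespace of any length.
df_ws = pd.read_csv(io.StringIO(space_text), sep=r"\s+", names=["a", "b"])

print(df)
print(df_ws)
```

Both calls should yield the same two-row table, since only the delimiter and the comment line differ between the two inputs.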

Add dictionary to DataFrame

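new_row = [dictionary]
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so `pd.concat` is used here. Unlike `append` on a list, `pd.concat` (like the old `df.append()`) returns a new DataFrame and does not update the original, so the result must be assigned back to `df`.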
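
On pandas 2.0 and later, where `DataFrame.append` is no longer available, adding a dictionary as a row can be sketched with `pd.concat`. The following is a small example under assumed data; the column names and the `new_row` dictionary are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([{"a": 1, "b": 2}])
new_row = {"a": 3, "b": 4}  # hypothetical row to add

# pd.concat returns a new DataFrame; df is unchanged until it is reassigned.
result = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
rows_before = len(df)  # still 1 at this point
df = result

# When many rows are collected, building the DataFrame once from a list
# of dictionaries avoids copying the data on every addition.
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
df_all = pd.DataFrame(rows)
```

Here `ignore_index=True` renumbers the rows 0, 1, ... instead of keeping the index of each piece.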

Extract elements from DataFrame

#     'a' 'b'
# 0 |  1   2
# 1 |  3   4

# Get an element
df.loc[0, 'a']  # -> 1

# Get a row as a dict
dict(df.loc[0, :])  # -> {'a': 1, 'b': 2}

# Get a column as a list
list(df.loc[:, 'a'])  # -> [1, 3]

Reference: Get / change the value of any position with pandas at, iat, loc, iloc
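
The lookups above can be tried end to end with the small 2x2 table shown in the comments. This sketch also uses `at` and `iloc`, which the reference title mentions, and `to_dict()` / `tolist()` as alternatives to wrapping the result in `dict()` / `list()`.

```python
import pandas as pd

# The same 2x2 table as in the comments above.
df = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

element = df.loc[0, "a"]          # label-based lookup of one value
element_at = df.at[0, "a"]        # at: label-based access to a single scalar
element_pos = df.iloc[0, 0]       # iloc: position-based lookup

row = df.loc[0, :].to_dict()      # one row as a dict keyed by column name
column = df.loc[:, "a"].tolist()  # one column as a plain Python list
```

`loc` and `at` take index and column labels, while `iloc` takes integer positions; with the default 0, 1, ... index, the row label and the row position happen to coincide.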

Extract rows that meet the conditions from DataFrame

# Simple conditions
df = df[df['num'] > 0]
df = df[df['str'] == 'Yes']
df = df[df['str'].isin(['Yes', 'No'])]  # Match any of several candidates

# String conditions (if the column contains missing values (NaN), add na=False to the arguments)
df = df[df['str'].str.startswith('Y')]  # Starts with the string
df = df[df['str'].str.contains('e')]  # Contains the string
df = df[df['str'].str.endswith('s')]  # Ends with the string

# Multiple conditions
df = df[(df['num'] > 0) & (df['str'] == 'Yes')]  # Use & instead of and
df = df[(df['num'] > 0) | (df['str'] == 'Yes')]  # Use | instead of or

Reference: query to extract rows of pandas.DataFrame by condition
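
To see how the filters behave when a missing value is present, here is a minimal sketch on made-up data. The column names `num` and `str` follow the snippets above; the values are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing string in row 2.
df = pd.DataFrame({
    "num": [1, -1, 2, 0],
    "str": ["Yes", "No", np.nan, "Yes"],
})

# na=False treats the missing value as "no match", so the mask stays boolean.
starts_y = df[df["str"].str.startswith("Y", na=False)]

# Each condition needs its own parentheses because & and | bind
# more tightly than comparison operators such as > and ==.
both = df[(df["num"] > 0) & (df["str"] == "Yes")]
either = df[(df["num"] > 0) | (df["str"] == "Yes")]

print(list(starts_y.index), list(both.index), list(either.index))
```

The filtered frames keep their original index labels, so resetting the index afterwards may be useful if consecutive numbering is needed.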

Other

# Sort by the specified column
df = df.sort_values('a', ascending=True)

# Reset the index to 0, 1, 2, ... (drop=True discards the old index)
df = df.reset_index(drop=True)

# Save the DataFrame to a csv file (index=False omits the index column)
df.to_csv([File path], index=False)
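
The three operations above are often chained. The following sketch uses made-up data and writes to an in-memory `io.StringIO` buffer, which is only a stand-in for a real file path.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# Sorting keeps the original index labels attached to their rows.
df = df.sort_values("a", ascending=True)
index_after_sort = list(df.index)

# reset_index(drop=True) renumbers the rows and discards the old labels.
df = df.reset_index(drop=True)

# Write to an in-memory buffer (assumption: stands in for a file on disk).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Without `drop=True`, `reset_index` would keep the old index as an extra column named `index`, which would then also be written to the csv file.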
