My pandas (python)

A summary for myself, updated from time to time. It describes the commands I have used and looked into. ** I only need to understand this myself, so the terminology may be wrong in places **

How to read

#Command
    #Description of arguments / options

Module loading

import pandas 

Pandas data objects and frequent operations

Series

#One-dimensional data object
#I picture it as an array
ser = pandas.Series()

DataFrame

#Two-dimensional data object
#I picture it as something like a DB table
df = pandas.DataFrame()
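
As a minimal sketch of the two objects above (the labels, column names, and values are made up for illustration), a Series can be built from a list and a DataFrame from a dictionary of columns:

```python
import pandas

# A Series is a one-dimensional labeled array.
ser = pandas.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table; each column is itself a Series.
df = pandas.DataFrame({"name": ["x", "y", "z"], "score": [1, 2, 3]})

print(ser["b"])           # value stored under the label "b"
print(df.shape)           # (number of rows, number of columns)
print(type(df["score"]))  # selecting one column gives a Series
```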
Manipulating the DataFrame structure

#Sort by x, y, ... in that order
df.sort_values([x,y,...])

#Drop the row whose index label is given as the argument.
df.drop(x)
    axis = 1 #Drop a column instead of a row

#Combine data frames
pandas.merge(x,y,on=z) #join tables x and y with column z as the key
    suffixes=() #Suffixes added when both tables have a column with the same name. Separated by a comma: the first is the suffix for the left df, the second is for the right df.
#Swap rows and columns
df.transpose()

#Concatenate multiple data frames
pandas.concat([x,y,z,...])
    #Pass the dfs you want to combine as a list
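
The structure operations above can be sketched as follows. The frames `left` and `right` and their column names are hypothetical, chosen only so that both tables share a `value` column and `suffixes` has something to do:

```python
import pandas

left = pandas.DataFrame({"key": [2, 1, 3], "value": ["b", "a", "c"]})
right = pandas.DataFrame({"key": [1, 2, 4], "value": ["A", "B", "D"]})

# Sort by several columns: pass them as a list.
sorted_left = left.sort_values(["key", "value"])

# drop removes a row by index label; axis=1 removes a column instead.
without_row = left.drop(0)
without_col = left.drop("value", axis=1)

# merge joins on the key column (inner join by default);
# the clashing "value" columns get the given suffixes.
merged = pandas.merge(left, right, on="key", suffixes=("_l", "_r"))

# concat stacks frames vertically; ignore_index renumbers the rows.
stacked = pandas.concat([left, right], ignore_index=True)

# transpose swaps rows and columns.
flipped = left.transpose()

print(merged)
```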

DataFrame extraction / acquisition operation

#Maximum value
df.max()

#Minimum value
df.min()

#Show information (data type, number of non-null values) for each column in the data frame
df.info()
    #No special arguments are required

#Extract by row number / column number
df.iloc[row,column] #Argument: pass : to select all

#Extract by row name / column name
df.loc[row,column] #Argument: pass : to select all

#WHERE IN in SQL
df.isin()
    #Arguments are lists, etc.
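
A small sketch of the three selection styles above, using made-up city data (the index labels `r1` to `r3` and the column names are assumptions for illustration):

```python
import pandas

df = pandas.DataFrame(
    {"city": ["Tokyo", "Osaka", "Nagoya"], "pop": [14, 9, 2]},
    index=["r1", "r2", "r3"],
)

# iloc selects by position: row 0, column 1.
first_cell = df.iloc[0, 1]

# loc selects by label; ":" means "all rows".
all_rows_city = df.loc[:, "city"]

# isin works like SQL's WHERE ... IN (...); it returns a boolean mask
# that can be used to filter the rows.
picked = df[df["city"].isin(["Tokyo", "Nagoya"])]

print(picked)
```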

#Return the first n rows, where n is the argument (default is 5)
df.head()

#Get median value
df.median()

#Replace NaN
df.fillna()

#Get summary statistics
df.describe()
#Returns the following statistics as a DataFrame. By default only numeric columns are summarized.
 #count:Number of non-NaN elements
 #mean:Arithmetic mean
 #std:Standard deviation
 #min:Minimum value
 #25%:1st quartile
 #50%:Median
 #75%:3rd quartile
 #max:Maximum value
#For non-numeric columns (for example with include='all') the following are returned as well
 #unique:Number of distinct values
 #top:Mode
 #freq:Frequency of the mode (number of occurrences)
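
To see which statistics appear for which kind of column, here is a hedged sketch with one numeric and one string column (both invented for the example):

```python
import pandas

df = pandas.DataFrame({"n": [1, 2, 3, 4], "s": ["a", "a", "b", "c"]})

# By default only numeric columns are summarized.
num_stats = df.describe()

# include="all" also summarizes non-numeric columns,
# adding the unique / top / freq rows.
all_stats = df.describe(include="all")

print(num_stats)
print(all_stats)
```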




Groupby

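#groupby
group = df.groupby()
    #Pass the column name(s) to group by as the argument
    as_index=False #If False, the grouping keys stay as ordinary columns instead of becoming the index
    #how= (left, right, outer) is not a groupby argument; it is the join type for merge

#Number of rows in each group
group.size()

#Aggregate specific items in various ways
df.agg({'Items to be aggregated':['Aggregation method list']})
    #group.agg() accepts the same kind of dictionary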
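
A minimal groupby sketch, assuming a hypothetical table of teams and points:

```python
import pandas

df = pandas.DataFrame({"team": ["a", "a", "b"], "pts": [1, 3, 5]})

group = df.groupby("team")

# Number of rows in each group.
sizes = group.size()

# Several aggregations for one column; the result has two-level column names.
summary = group.agg({"pts": ["sum", "mean"]})

# as_index=False keeps "team" as an ordinary column instead of the index.
flat = df.groupby("team", as_index=False)["pts"].sum()

print(summary)
print(flat)
```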
DataFrame read operation
#Read csv. Use this to read comma-delimited data
pandas.read_csv()
    encoding= #Specify the character encoding
    header= #Row number to use as the column names
    names= #Set the column names
    dtype= #Specify data types with a dictionary
    sep= #Specify the delimiter
    engine= #Parser engine to use (e.g. 'c' or 'python')
    usecols= #Specify the columns to read as a list.

#Read table. Use this to read tab-delimited data
pandas.read_table()
    encoding= #Specify the character encoding
    header= #Row number to use as the column names
    names= #Set the column names
#Read from a DB
pandas.read_sql()
    #The first argument is SQL
    #The second argument is the connection object
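
The read functions can be tried without any files by feeding them in-memory text and an in-memory SQLite database. This is only a sketch: the table `t`, the column names, and the sample data are all assumptions made for the example.

```python
import io
import sqlite3

import pandas

# read_csv: comma-delimited text; read only two columns, keep "id" as a string.
csv_text = "id,name,score\n1,x,10\n2,y,20\n"
df_csv = pandas.read_csv(
    io.StringIO(csv_text), usecols=["id", "score"], dtype={"id": str}
)

# read_table: tab-delimited text with no header row, so names are given.
tsv_text = "1\tx\n2\ty\n"
df_tsv = pandas.read_table(io.StringIO(tsv_text), header=None, names=["id", "name"])

# read_sql: first argument is the SQL, second is the connection object.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, name TEXT)")
con.execute("INSERT INTO t VALUES (1, 'x')")
df_sql = pandas.read_sql("SELECT * FROM t", con)
con.close()

print(df_csv)
print(df_tsv)
print(df_sql)
```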
DataFrame export operation
#Write to csv
df.to_csv()
    encoding= #Character encoding
    index= #Whether to write the index as well. Default is True
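
When no path is given, `to_csv` returns the CSV text as a string, which makes the effect of `index=` easy to see. A small sketch with a made-up one-column frame:

```python
import pandas

df = pandas.DataFrame({"a": [1, 2]})

# Default: the index is written as the first (unnamed) column.
with_index = df.to_csv()

# index=False: only the data columns are written.
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```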

DataFrame row name / column name operation
#Overwrite column name
df.columns = [list]
df.rename(columns={Current column name:New column name})

#Overwrite index
df.index = [list]

#Column name / index name change
df.rename({Current name: New name})
    axis=1 #Change column names. If not specified, row names (index labels) are changed.
    

#Reset the index (renumber from 0)
df.reset_index()
    drop=True #Discard the existing index instead of keeping it as a column
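
The renaming and index-reset variants above can be compared side by side. The frame and the new names below are hypothetical:

```python
import pandas

df = pandas.DataFrame({"a": [1, 2]}, index=["x", "y"])

# Rename columns explicitly with columns=.
renamed_cols = df.rename(columns={"a": "alpha"})

# Without axis, the mapping is applied to the row labels.
renamed_rows = df.rename({"x": "first"})

# axis=1 applies the mapping to the column names.
renamed_axis1 = df.rename({"a": "alpha"}, axis=1)

# reset_index keeps the old index as a column named "index" ...
reset_keep = df.reset_index()

# ... unless drop=True, which discards it.
reset_drop = df.reset_index(drop=True)

print(reset_keep)
print(reset_drop)
```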


DataFrame write operation
#Insert a column (overwrites it if the column already exists)
df[Column name] = x


#Replace values
df.replace({Current value:New value}) #The argument is a dictionary {value to replace:value after replacement}

#Append the argument as new rows. The rows to add can be a list, Series, or numpy.array
#DataFrame.append was removed in pandas 2.0; use pandas.concat there instead
df.append()

#Add column
df.assign()

#Apply function to each column / row
df.apply()
    axis=1 #Apply to each row
    axis=0 #Apply to each column (default)
    #Pass a function as the argument. A lambda is fine.
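
A hedged sketch of the write operations in this section, with invented column names. Because `DataFrame.append` was removed in pandas 2.0, the row-adding step here uses `pandas.concat` as a substitute:

```python
import pandas

df = pandas.DataFrame({"a": [1, 2], "b": [10, 20]})

# Assigning to a new column name inserts the column in place.
df["c"] = 0

# assign returns a new frame with the extra column; df itself is unchanged.
with_total = df.assign(total=lambda d: d["a"] + d["b"])

# apply with axis=0 calls the function once per column ...
col_sums = df.apply(lambda col: col.sum(), axis=0)

# ... and with axis=1 once per row.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# replace takes {value to replace: replacement}.
replaced = df.replace({10: 100})

# Adding a row: concat with a one-row frame (stand-in for the removed append).
new_row = pandas.DataFrame([{"a": 3, "b": 30, "c": 0}])
appended = pandas.concat([df, new_row], ignore_index=True)

print(with_total)
print(appended)
```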
Looping over a DataFrame with a for statement
#Extract the DataFrame row by row and loop over it with for.
for index,row in df.iterrows():
  #Each iteration yields the index label and the row as a Series
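
A runnable version of the loop, assuming a small made-up frame. Each `row` is a Series, so its values are read by column name:

```python
import pandas

df = pandas.DataFrame({"name": ["x", "y"], "score": [1, 2]})

seen = []
for index, row in df.iterrows():
    # index is the row label; row is a Series keyed by column name.
    seen.append((index, row["name"], row["score"]))

print(seen)
```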
Lambda expressions in DataFrame

This article is very easy to understand: "Recursion Substitution Eradication Committee for Data Processing in Python / pandas"

Graph from pandas
#bar graph
df.plot.bar()
Dealing with NaN
#Detect NaN (True where the value is NaN)
df.isnull()


#Remove NaN (by default, rows that contain NaN are dropped)
df.dropna()
    axis=1 #Drop columns that contain NaN instead of rows.

#Replace NaN
df.fillna()
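
The NaN helpers can be compared on a tiny frame with a single missing value (the data is invented for the example):

```python
import pandas

# None in a numeric column is stored as NaN.
df = pandas.DataFrame({"a": [1.0, None], "b": [3.0, 4.0]})

# Boolean frame: True where the value is NaN.
mask = df.isnull()

# Default: drop rows that contain NaN.
rows_dropped = df.dropna()

# axis=1: drop columns that contain NaN.
cols_dropped = df.dropna(axis=1)

# Replace NaN with a fixed value.
filled = df.fillna(0)

print(mask)
print(filled)
```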
 
Handling duplicates

#Find duplicate rows
#The return value is a boolean Series with the same index as df: True for a duplicated row, otherwise False
df.duplicated()
    keep = False #Mark every duplicated row as True. With the default (keep='first'), the first occurrence is not flagged as a duplicate.
#Remove completely duplicate rows
df.drop_duplicates()
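
The effect of `keep` is easiest to see on a frame where one row appears twice. A minimal sketch with made-up data:

```python
import pandas

df = pandas.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# Default keep="first": only the second occurrence is flagged.
default_flags = df.duplicated()

# keep=False: every row that has a duplicate is flagged.
all_flags = df.duplicated(keep=False)

# drop_duplicates keeps the first occurrence of each row.
deduped = df.drop_duplicates()

print(default_flags.tolist())
print(all_flags.tolist())
print(deduped)
```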



Drawing
#Pair plot
grr = pandas.plotting.scatter_matrix(df)
    #df is the data to plot
    c= #Values used to color the points
    figsize=(x,y) #Figure size
    marker= #Marker shape
    hist_kwds={} #Histogram settings
    s= #Marker size
    alpha= #Transparency
