[PYTHON] Pandas basics

A summary of pandas basics

About Series and DataFrame

Series

What is a Series? A one-dimensional sequence of values, each with an index label

series_spreadsheet.png

When a dict is passed to Series, its keys become the index.

import pandas as pd

data = {
    "Name": "John",
    "Sex": "male",
    "Age": 22
}
pd.Series(data)
>
Name    John
Sex     male
Age       22
dtype: object

Create a Series from a NumPy array

import numpy as np

array = np.array([22, 31, 42, 23])
age_series = pd.Series(array)
age_series

Specify an index for the array and access values by index label

array = np.array(['John','male',22])
john_series = pd.Series(array, index=['Name','Sex','Age'])
john_series["Name"]
>John

john_series
>
Name    John
Sex     male
Age       22
dtype: object
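
One detail worth checking in the example above: a NumPy array holds a single element type, so mixing strings and a number turns the number into a string. The sketch below compares that with a plain Python list, which keeps each value's type. The names from_array and from_list are illustrative only.

```python
import numpy as np
import pandas as pd

labels = ["Name", "Sex", "Age"]

# NumPy coerces mixed values to one type, so 22 becomes the string '22'
from_array = pd.Series(np.array(["John", "male", 22]), index=labels)

# A plain list keeps the original Python objects, so 22 stays an int
from_list = pd.Series(["John", "male", 22], index=labels)

print(isinstance(from_array["Age"], str))  # True
print(isinstance(from_list["Age"], int))   # True
```

If the age is needed as a number later, building the Series from a list or dict avoids a conversion step.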

Get the original NumPy array

age_series.values
>array([22, 31, 42, 23])
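
As a small supplement, the pandas documentation recommends to_numpy() over the values attribute for getting the underlying array. A minimal sketch, rebuilding age_series so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

age_series = pd.Series(np.array([22, 31, 42, 23]))

# to_numpy() returns the data as a NumPy array
values = age_series.to_numpy()

print(isinstance(values, np.ndarray))  # True
print(values.tolist())                 # [22, 31, 42, 23]
```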

DataFrame

Conceptually, a DataFrame is a table (matrix) built from Series: each column is a Series, and each row can also be handled as a Series.

unnamed.png

The figure above shows only the column Series, but a DataFrame also handles each row as a Series.

Create a DataFrame from a NumPy array

ndarray = np.arange(10).reshape(2,5)
ndarray
>
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

pd.DataFrame(ndarray, index=["index1", "index2"], columns=["a", "b", "c", "d", "e"])
>
|        | a | b | c | d | e |
| index1 | 0 | 1 | 2 | 3 | 4 |
| index2 | 5 | 6 | 7 | 8 | 9 |
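
To make the "column Series and row Series" idea concrete, here is a minimal sketch that pulls one column and one row out of the same table. The variable names frame, col and row are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(10).reshape(2, 5),
    index=["index1", "index2"],
    columns=["a", "b", "c", "d", "e"],
)

# A column comes back as a Series indexed by the row labels
col = frame["a"]

# A row comes back as a Series indexed by the column labels
row = frame.loc["index1"]

print(col.tolist())      # [0, 5]
print(row.tolist())      # [0, 1, 2, 3, 4]
print(list(row.index))   # ['a', 'b', 'c', 'd', 'e']
```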

Basic flow
1. Read the data with read_csv
2. Check the basic information about the data

df = pd.read_csv("dataset/tmdb_5000_movies.csv")
# Check the number of rows with len()
len(df)

When you want to display the whole table without truncation

# Remove the limit on displayed columns
pd.set_option('display.max_columns', None)

# Remove the limit on displayed rows (* Note that output can become heavy for large data)
pd.set_option('display.max_rows', None)
df.describe()
type(df.describe())  # the result of describe() is itself a DataFrame
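
The read-then-inspect flow can be tried without the movie dataset. The sketch below reads a tiny made-up CSV from memory (the columns and values are hypothetical, not taken from tmdb_5000_movies.csv) and checks the row count and the summary table.

```python
import io
import pandas as pd

# Hypothetical stand-in data; read_csv accepts a file-like object as well as a path
csv_text = "title,budget,revenue\nA,100,250\nB,0,40\nC,300,0\n"
sample = pd.read_csv(io.StringIO(csv_text))

print(len(sample))    # 3 rows
print(sample.shape)   # (3, 3): rows, columns

# describe() summarizes the numeric columns and returns a DataFrame
summary = sample.describe()
print(isinstance(summary, pd.DataFrame))  # True
print(summary.loc["count", "budget"])     # 3.0
```

Because the summary is an ordinary DataFrame, individual statistics can be looked up with loc as shown.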

DataFrame operations

Returned as a Series

df["column_name"]   # ○ Recommended
df.column_name      # ▲ Not recommended
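
One reason the bracket form is the safer habit: attribute access breaks when a column name clashes with a DataFrame method or contains a space. A minimal sketch with made-up column names:

```python
import pandas as pd

frame = pd.DataFrame({"count": [1, 2], "original title": ["A", "B"]})

# Bracket access always refers to the column
print(frame["count"].tolist())           # [1, 2]
print(frame["original title"].tolist())  # ['A', 'B']

# Attribute access finds the DataFrame.count method, not the column
print(callable(frame.count))             # True
```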

Returned as a DataFrame

df[["revenue"]]

# Multiple columns can be selected at once
df[["revenue","original_title","budget"]]
# Retrieve specific rows by position
df.iloc[10:13]

# Retrieve the specified column from specific rows
df.iloc[10:13]["original_title"]
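
The difference between single and double brackets, and between position-based iloc and label-based loc, is easy to check on a small table. The sketch below uses a hypothetical five-row frame called movies.

```python
import pandas as pd

movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D", "E"],
    "revenue": [10, 20, 30, 40, 50],
})

# Single brackets give a Series, double brackets give a DataFrame
print(isinstance(movies["revenue"], pd.Series))       # True
print(isinstance(movies[["revenue"]], pd.DataFrame))  # True

# iloc slices by position and excludes the end position
by_position = movies.iloc[1:3]
print(by_position["original_title"].tolist())  # ['B', 'C']

# loc slices by label and includes the end label; rows and column in one step
by_label = movies.loc[1:3, "original_title"]
print(by_label.tolist())                       # ['B', 'C', 'D']
```

Selecting rows and a column in a single loc call is an alternative to chaining iloc and a column lookup.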

Delete rows / columns

drop() returns a new DataFrame; the original DataFrame remains unchanged.

Pass inplace=True to change the original DataFrame instead.


# Delete the row with the given index label, axis=0 (* the default); inplace is optional
df.drop(0, axis=0, inplace=True)

# Delete the specified column, axis=1; inplace is optional
df.drop('id', axis=1, inplace=True)

df = df.drop(5)  # Reassigning the result to the same variable also updates the data, and is more common than inplace
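
A minimal sketch of the behaviour described above, on a made-up three-row frame: drop() leaves the original untouched until the result is assigned back.

```python
import pandas as pd

frame = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})

# Drop the row labelled 1 (axis=0 is the default)
without_row = frame.drop(1)

# Drop the 'id' column; columns="id" is equivalent to ("id", axis=1)
without_col = frame.drop("id", axis=1)

# The original is unchanged so far
print(frame.shape)                   # (3, 2)
print(without_row.index.tolist())    # [0, 2]
print(without_col.columns.tolist())  # ['title']

# Reassigning to the same variable updates it
frame = frame.drop(columns="id")
print(frame.columns.tolist())        # ['title']
```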

dropna(): delete rows that contain missing values

np.isnan(): check whether values are NaN (missing)

fillna(): fill in missing values
> df["runtime"].fillna(df["runtime"].mean())
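
The three operations can be compared side by side on a small frame with one missing runtime. The data here is made up, and isna() is used as the pandas counterpart of np.isnan().

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "title": ["A", "B", "C"],
    "runtime": [90.0, np.nan, 110.0],
})

# Detect missing values
print(frame["runtime"].isna().tolist())  # [False, True, False]

# Drop every row that has a missing value
dropped = frame.dropna()
print(dropped["title"].tolist())         # ['A', 'C']

# Fill the gap with the column mean (mean() skips NaN, so it is 100.0 here)
filled = frame["runtime"].fillna(frame["runtime"].mean())
print(filled.tolist())                   # [90.0, 100.0, 110.0]
```

Both dropna() and fillna() return new objects, so the result needs to be assigned if it should be kept.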

Filter

How to filter
# Example: select only Japanese movies
j_movie = df[df['original_language'] == 'ja']  # This form is the one most often used


Combine multiple conditions with ( ) & ( ) or ( ) | ( )
# Example: select only Japanese movies with a rating of 8 or higher
j_movie = df[(df['original_language'] == 'ja') & (df["vote_average"] >= 8)]

df[(df['budget'] == 0) | (df['revenue'] == 0)]
→ Filter: "budget or revenue is 0"


df[~((df['budget'] == 0) | (df['revenue'] == 0))]
→ Filter: "neither budget nor revenue is 0" (NOT operation ~)
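
The filters above can be verified on a small made-up table that reuses the same column names. Each condition needs its own parentheses because & and | bind more tightly than ==.

```python
import pandas as pd

movies = pd.DataFrame({
    "original_language": ["ja", "en", "ja", "fr"],
    "vote_average": [8.2, 7.5, 6.9, 8.0],
    "budget": [0, 100, 50, 0],
    "revenue": [10, 0, 70, 0],
})

# AND: Japanese movies rated 8 or higher
ja_high = movies[(movies["original_language"] == "ja") & (movies["vote_average"] >= 8)]
print(ja_high.index.tolist())    # [0]

# OR: budget or revenue is 0
any_zero = movies[(movies["budget"] == 0) | (movies["revenue"] == 0)]
print(any_zero.index.tolist())   # [0, 1, 3]

# NOT: neither budget nor revenue is 0
none_zero = movies[~((movies["budget"] == 0) | (movies["revenue"] == 0))]
print(none_zero.index.tolist())  # [2]
```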

How to use merge()

Options for the how argument

df1 = pd.DataFrame({'key':["k0","k1","k2"],
                  'A':["a0","a1","a2"],
                  'B':["b0","b1","b2"]})

df2 = pd.DataFrame({'key':["k0","k1","k2"],
                  'C':["c0","c1","c2"],
                  'D':["d0","d1","d2"]})

join-type.jpg

20150125230158.png
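
Since df1 and df2 above share exactly the same keys, every how option returns the same rows for them. The sketch below uses two hypothetical frames, left and right, whose keys only partly overlap, so the difference between the options becomes visible.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k0", "k1", "k2"], "A": ["a0", "a1", "a2"]})
right = pd.DataFrame({"key": ["k1", "k2", "k3"], "C": ["c1", "c2", "c3"]})

# Collect the keys that survive each join type
results = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    results[how] = merged["key"].tolist()
    print(how, results[how])

# inner keeps keys found in both frames: k1, k2
# left keeps every key of the left frame: k0, k1, k2
# right keeps every key of the right frame: k1, k2, k3
# outer keeps keys from either frame: k0, k1, k2, k3
```

Rows without a partner in the other frame get NaN in the columns that come from that frame.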
