[Python] A collection of NumPy and pandas tips often used in the field

import numpy as np
import pandas as pd
import sys
# Check the interpreter's default string encoding
print(sys.getdefaultencoding())

Data acquisition

# numpy
x = np.array([[1,2,3],[4,5,6]],dtype=np.float64)
# Read a comma-separated text file; lines starting with '#' are skipped
y = np.loadtxt('text1',delimiter=',',skiprows=0,comments='#')
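A minimal, self-contained sketch of the `np.loadtxt` call above. The file contents are hypothetical (written to a temporary directory in place of the real `text1`); the point is that `comments='#'` drops the header line and `delimiter=','` splits each row, producing a float64 array like the one built by hand with `np.array`.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: a '#' comment line followed by two comma-separated rows
content = "# a b c\n1,2,3\n4,5,6\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "text1")  # stands in for the 'text1' file
    with open(path, "w") as f:
        f.write(content)
    # The '#' line is ignored; values are split on ',' and parsed as float64 by default
    y = np.loadtxt(path, delimiter=",", skiprows=0, comments="#")

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(y.shape, y.dtype)      # (2, 3) float64
print(np.array_equal(x, y))  # True
```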

Slicing

pandas.DataFrame
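df=pd.read_csv('stock.csv',encoding='Shift_jis',names=('index','dekidaka','owarine')) # dekidaka: trading volume, owarine: closing price
df.head() # First 5 rows
pd.DataFrame({'a':[1,2,3],'b':[4,5,6]}) # Build a DataFrame from a dict
df.loc[:,['index','owarine']] # All rows, columns selected by label
df.loc[100:115,['index','dekidaka']] # Rows labelled 100 to 115 (loc includes the end label)
df.iloc[1:22,1:3] # Rows 1-21, columns 1-2 (dekidaka, owarine); iloc excludes the end position
df.iloc[:,[0,2]] # Columns 0 and 2 (index, owarine)
df.iloc[::2]  # Even positions (0, 2, 4, ...)
df.iloc[1::2]  # Odd positions (1, 3, 5, ...)
df['index'] < '1900' # Boolean Series (True/False per row); strings compare lexicographically
df[(df['index'] == '1900')] # Rows whose 'index' equals '1900'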
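The difference between `loc` and `iloc` slicing is easy to get wrong, so here is a small sketch on a hypothetical four-row stand-in for `stock.csv` (same column names, made-up values). It shows that `loc` includes the end label while `iloc` excludes the end position, and that a boolean Series works as a row mask.

```python
import pandas as pd

# Hypothetical stand-in for stock.csv with the same column names
df = pd.DataFrame({
    "index": ["1898", "1899", "1900", "1901"],
    "dekidaka": [100, 200, 300, 400],
    "owarine": [10.0, 11.0, 12.0, 13.0],
})

# loc is label-based and includes the end label: rows 1, 2, 3
by_label = df.loc[1:3, ["index", "dekidaka"]]
# iloc is position-based and excludes the end: rows 1, 2 and columns 1, 2
by_position = df.iloc[1:3, 1:3]

even_rows = df.iloc[::2]   # positions 0, 2
odd_rows = df.iloc[1::2]   # positions 1, 3

# A boolean Series used as a mask keeps only the rows where it is True
mask = df["index"] == "1900"
row_1900 = df[mask]

print(len(by_label), list(by_position.columns))     # 3 ['dekidaka', 'owarine']
print(list(even_rows.index), list(odd_rows.index))  # [0, 2] [1, 3]
print(row_1900["dekidaka"].tolist())                # [300]
```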

Data management

#merge
samp1 = pd.read_csv('sample1.csv',encoding='Shift_jis')
samp2 = pd.read_csv('sample2.csv',encoding='Shift_jis')
samp3 = pd.read_csv('sample3.csv',encoding='Shift_jis')
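# concat: stack vertically (append rows)
conc=pd.concat([samp1,samp2],ignore_index=True)
# merge: join horizontally on a key column (a left join keeps every row of conc)
merg=pd.merge(conc,samp3[["label1","label2"]],on="label1",how="left") # label2 exists on both sides, so it becomes label2_x and label2_y
# Data extraction
merg["label2_y"] # Only label2_y (values 1000 ~ 1003 in this data)
merg[["label2_x","label2_y"]].iloc[:,0:2] # Both columns; iloc[:,0:2] keeps the first two of the selection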
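Where the `label2_x` / `label2_y` names come from can be seen with small hypothetical frames standing in for the three CSV files (the column names match the snippet above; the values are invented). When both sides of a merge share a non-key column, pandas appends its default suffixes `_x` and `_y`, and a left join fills unmatched keys with NaN.

```python
import pandas as pd

# Hypothetical frames standing in for sample1.csv, sample2.csv and sample3.csv
samp1 = pd.DataFrame({"label1": [1, 2], "label2": [10, 20]})
samp2 = pd.DataFrame({"label1": [3, 4], "label2": [30, 40]})
samp3 = pd.DataFrame({"label1": [1, 2, 3], "label2": [1000, 1001, 1002]})

# Stack rows; ignore_index=True renumbers the result 0..n-1
conc = pd.concat([samp1, samp2], ignore_index=True)

# Left join on label1: every row of conc is kept; label1 == 4 has no match, so NaN
merg = pd.merge(conc, samp3[["label1", "label2"]], on="label1", how="left")

# Both inputs have a 'label2' column, so the default suffixes _x and _y are added
print(list(merg.columns))             # ['label1', 'label2_x', 'label2_y']
print(merg["label2_y"].isna().sum())  # 1
```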
####################
# Summary statistics (positions 1-4 of label1)
merg["label1"].iloc[1:5].describe()
# Element-wise addition of two columns
merg["label1"] + merg["label2_y"]
# Column total
merg["label1"].sum()
# Missing-value mask (True where a value is NaN)
merg.isnull()
# Number of missing values per column
merg.isnull().sum()
# Column-wise maximum plus minimum
print(merg.max() + merg.min())
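A runnable sketch of these summary operations on a hypothetical all-numeric frame with one missing value (column names borrowed from above, values invented). It illustrates two defaults worth knowing: element-wise addition propagates NaN, while `sum`, `max` and `min` skip NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with one missing value in label2_y
merg = pd.DataFrame({
    "label1": [1.0, 2.0, 3.0, 4.0],
    "label2_y": [1000.0, 1001.0, np.nan, 1003.0],
})

stats = merg["label1"].describe()             # count, mean, std, min, 25%, 50%, 75%, max
total = merg["label1"].sum()                  # column total
row_sum = merg["label1"] + merg["label2_y"]   # element-wise; NaN propagates
missing_per_column = merg.isnull().sum()      # NaN count for each column
span = merg.max() + merg.min()                # max/min skip NaN by default

print(stats["count"], total)                            # 4.0 10.0
print(row_sum.isna().sum(), missing_per_column["label2_y"])  # 1 1
print(span["label2_y"])                                 # 2003.0
```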
# Data type of each column (a DataFrame has .dtypes; .dtype exists only on a Series)
merg.dtypes
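# Type conversion: object (e.g. numeric strings) ⇒ numeric
merg["label1"]=pd.to_numeric(merg["label1"])
# datetime ⇒ "YYYYMM" string; .dt works only on datetime64 columns, so convert first (assumes label1 holds date-like values)
pd.to_datetime(merg["label1"]).dt.strftime("%Y%m")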
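A short sketch of type checking and conversion, assuming numbers and dates arrived as strings (a common result of `read_csv`). The `date` column and all values are hypothetical. `errors="coerce"` is an optional `pd.to_numeric` argument that turns unparsable entries into NaN instead of raising, and `.dt.strftime` is only available once the column is datetime64.

```python
import pandas as pd

# Hypothetical columns: numbers and dates stored as strings
merg = pd.DataFrame({
    "label1": ["1", "2", "x"],
    "date": ["2020-01-15", "2020-02-03", "2020-03-30"],
})

print(merg.dtypes)  # both columns are object

# errors="coerce" turns unparsable values into NaN instead of raising
merg["label1"] = pd.to_numeric(merg["label1"], errors="coerce")

# .dt is only available on datetime64 columns, so convert first
merg["date"] = pd.to_datetime(merg["date"])
yyyymm = merg["date"].dt.strftime("%Y%m")

print(merg["label1"].tolist())  # [1.0, 2.0, nan]
print(yyyymm.tolist())          # ['202001', '202002', '202003']
```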
# Grouping: total of label2_y for each label1 value
merg.groupby("label1")["label2_y"].sum()
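A minimal groupby sketch on hypothetical data. Selecting the value column before calling `.sum()` aggregates only that column instead of every column in the frame; group keys come out sorted by default.

```python
import pandas as pd

# Hypothetical data: label1 is the grouping key, label2_y the values to total
merg = pd.DataFrame({
    "label1": ["A", "B", "A", "B", "C"],
    "label2_y": [1, 2, 3, 4, 5],
})

# Select the column first, then aggregate, so only label2_y is summed
totals = merg.groupby("label1")["label2_y"].sum()

print(list(totals.index), totals.tolist())  # ['A', 'B', 'C'] [4, 6, 5]
```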

Data correction

# Number of unique values
print(len(pd.unique(merg.label3))) # 18: even a single leading space makes a value count as distinct
# Convert lowercase letters to uppercase
merg["label3"]=merg["label3"].str.upper()
print(len(pd.unique(merg.label3))) # 17
# Remove spaces
merg["label3"]=merg["label3"].str.replace(" ","")
print(len(pd.unique(merg.label3))) # 16
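The same clean-up can be reproduced on a few hypothetical values where case and stray spaces hide duplicates. `Series.nunique()` gives the same count as `len(pd.unique(...))` when there are no missing values. Passing `regex=False` makes `str.replace` treat `" "` as a literal string.

```python
import pandas as pd

# Hypothetical values: case and stray spaces make the same code look different
merg = pd.DataFrame({"label3": ["abc", " abc", "ABC", "A BC", "xyz"]})

print(merg["label3"].nunique())  # 5

merg["label3"] = merg["label3"].str.upper()
print(merg["label3"].nunique())  # 4: 'abc' and 'ABC' now match

# regex=False treats " " as a literal string rather than a pattern
merg["label3"] = merg["label3"].str.replace(" ", "", regex=False)
print(merg["label3"].nunique())  # 2: only 'ABC' and 'XYZ' remain
```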
# Sort by label1 in ascending order (returns a new DataFrame)
merg.sort_values(by=["label1"],ascending=True)
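A small sketch of `sort_values` on hypothetical data. It shows that the call returns a new DataFrame (so the result must be assigned to keep the order) and that `ascending` can take one direction per key when sorting by several columns.

```python
import pandas as pd

merg = pd.DataFrame({"label1": [3, 1, 2, 1], "label3": ["C", "B", "A", "A"]})

# sort_values returns a new DataFrame; assign the result to keep the order
asc = merg.sort_values(by=["label1"], ascending=True)
# Several keys, each with its own direction: label1 ascending, then label3 descending
multi = merg.sort_values(by=["label1", "label3"], ascending=[True, False])

print(asc["label1"].tolist())    # [1, 1, 2, 3]
print(multi["label3"].tolist())  # ['B', 'A', 'A', 'C']
```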
