[PYTHON] I tried to summarize the code often used in Pandas

Since I started Kaggle and got more exposed to data science, I have been processing data with pandas, since Python is my language. In this post I've collected the code I personally use most often. It is mostly a memo for myself, but I thought it might be useful to someone, so I decided to post it on Qiita. If you have advice or comments, such as better ways to write something, please let me know in the comments. I also plan to add general-purpose code to this post from time to time.

Creating a DataFrame

There is nothing special here: the same data is created in two ways, and both give the same output. Use whichever is more convenient for your situation.

method 1

import pandas as pd

index = ['a','b','c']
columns = ['A','B','C']
inputs = [[1,2,1],[3,4,3],[5,6,5]]  # one inner list per column
df = pd.DataFrame(columns=columns, index=index)  # empty frame with labels only
for col, values in zip(columns, inputs):
    df[col] = values  # fill one column at a time
df
   A  B  C
a  1  3  5
b  2  4  6
c  1  3  5

method 2

index = ['a','b','c']
df = pd.DataFrame({
    'A':[1,2,1],
    'B':[3,4,3],
    'C':[5,6,5]},
    index=index)
df
   A  B  C
a  1  3  5
b  2  4  6
c  1  3  5
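Since both methods are said to give the same output, here is a small sketch that checks this with DataFrame.equals. It also adds a third, shorter variant (my own addition, not one of the two methods above) that builds the method 2 dict directly from the method 1 lists with zip.

```python
import pandas as pd

index = ['a', 'b', 'c']
columns = ['A', 'B', 'C']
inputs = [[1, 2, 1], [3, 4, 3], [5, 6, 5]]  # one inner list per column

# Method 1: empty frame with labels, then fill column by column
df1 = pd.DataFrame(columns=columns, index=index)
for col, values in zip(columns, inputs):
    df1[col] = values

# Method 2: dict of column name -> column values
df2 = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]}, index=index)

# Variant: build the method 2 dict from the method 1 lists
df3 = pd.DataFrame(dict(zip(columns, inputs)), index=index)

# equals compares labels, values and dtypes
print(df1.equals(df2), df2.equals(df3))
```

The zip variant is handy when column names and column data already live in two separate lists.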

Here I used the letters a, b and c as the index; if you do not specify an index, pandas assigns a default integer index starting from 0.
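To see the default index, here is a minimal sketch that builds the method 2 data without index= and then switches between the default integer index and letter labels. The use of set_axis and reset_index is my own choice for illustration.

```python
import pandas as pd

# Same data as method 2, but without the index= argument
df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# Attach letter labels afterwards (returns a new DataFrame)
df_abc = df.set_axis(['a', 'b', 'c'], axis=0)

# Drop the labels again to get back the 0-based default index
df_back = df_abc.reset_index(drop=True)
```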

Feature Encoding

Some notes on feature conversion.

One-Hot Encoding

When working with data, you often want to convert a categorical column into one-hot vectors. You can use scikit-learn's OneHotEncoder, but if you already manage your data in a pandas DataFrame, pd.get_dummies is more convenient.

pd.get_dummies(df['A'], dtype=int)  # dtype=int gives 0/1; pandas 2.0+ returns True/False by default
   1  2
a  1  0
b  0  1
c  1  0
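In practice you usually want the dummy columns attached to the rest of the data. The sketch below (my own extension, assuming the same df as above) uses prefix so the new columns are named A_1 and A_2, joins them with concat, and checks that this matches calling get_dummies on the whole DataFrame with columns=.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]},
                  index=['a', 'b', 'c'])

# One-hot encode column A; prefix keeps the new column names readable
dummies = pd.get_dummies(df['A'], prefix='A', dtype=int)

# Replace the original column A with its dummy columns
encoded = pd.concat([df.drop(columns='A'), dummies], axis=1)
print(encoded)

# Shortcut: get_dummies on the DataFrame encodes only the listed columns
shortcut = pd.get_dummies(df, columns=['A'], dtype=int)
print(encoded.equals(shortcut))
```

The columns= form is the shorter of the two; the concat form is useful when you want to control where the dummy columns go.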

Frequency Encoding

This one is purely personal code, but I thought I might use it again, so I'm noting it here. It replaces each value with the number of times that value occurs in the column.

df.groupby('B')[['B']].transform('count')
   B
a  2
b  1
c  2
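A common alternative way to write the same frequency encoding, not the one used above, is to count values once with value_counts and look each value up with map. This is a sketch assuming the same df; normalize=True gives relative frequencies instead of raw counts.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 4, 3], 'C': [5, 6, 5]},
                  index=['a', 'b', 'c'])

# value_counts builds a value -> count table; map looks each value up in it
df['B_freq'] = df['B'].map(df['B'].value_counts())                  # 2, 1, 2

# Relative frequency instead of raw counts
df['B_ratio'] = df['B'].map(df['B'].value_counts(normalize=True))   # 2/3, 1/3, 2/3

print(df[['B', 'B_freq', 'B_ratio']])
```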

This means that in column B, the value 3 appears twice and the value 4 appears once.

That's all I have collected so far; I will add more code as I go.
