[Introduction to Python] How to delete rows that meet multiple conditions in Pandas.DataFrame

Hello.

Recently, I'm also touching ** Python **.

Now, in order to improve the efficiency of internal operations, we are trying to automate the payment application process that accounting manually performs using Excel and Csv using ** Python **.

I'm using ** openpyxl ** for Excel operation and ** pandas ** for data processing, but I'm still struggling with ** pandas **, and if it's ** C # **, the code will be killed instantly. I'm also struggling with the process of assembling.

I will write down such a struggle instead of a memo.

Click here for this memo.

How to delete rows that meet multiple conditions in Pandas.DataFrame

There may be a tsukkomi saying, "It's that easy!", But I've tried various things, so I'll make a note of it.

When I looked up how to delete a row in a DataFrame, I found a function called ** drop **. [Official] pandas.DataFrame.drop

The usage is as follows. Specify the Index number of the DataFrame and delete it.

import pandas as pd

#Read an Excel file
data_frame = pd.read_excel('{Excel file path}', sheet_name='{Sheet name}', header=0)

#10th line(Index=9)To delete
data_frame.drop(9)
#6th line(Index=5)And line 8(Index=7)To delete
data_frame.drop([5,7])
#2nd line(Index=1)5th line from(Index=4)To delete
data_frame.drop(range(1,4))

This time, as the title suggests, "** Delete data whose business partner starts with XX and description starts with △△ **" and delete lines that satisfy multiple conditions.

For that purpose, it is necessary to obtain the Index number that satisfies the conditions.

When I looked up how to get the Index number of a DataFrame, I found a function called ** index **. [Official] pandas.DataFrame.index

The usage is as follows.

#Get Index that matches the condition
drop_index = data_frame.index[data_frame['Suppliers'] == '〇〇']
#Delete Index that matches the condition
data_frame = data_frame.drop(drop_index)

Yeah, I was able to properly delete the data of "** Customers are 〇〇 **". Next, let's increase the conditions.

#Get Index that matches multiple conditions
drop_index = data_frame.index[(data_frame['Suppliers'] == '〇〇') & (data_frame['Description'] == '△△*****')]
#Delete Index that matches the condition
data_frame = data_frame.drop(drop_index)

Alright, the data of the two conditions can be deleted!  : Hmm? I forgot something ... The second condition was a prefix match ... I don't know ...

For the time being, try the prefix match ** starts with **.

#Try prefix match with starts with DataFrame ...
drop_index = data_frame.index[(data_frame['Suppliers'] == '〇〇') & (data_frame['Description'].startswith('△△'))]

【result】
Error:'Series' object has no attribute 'startswith'

"** Series ** doesn't have ** starts with **!" Was angry.

Pandas has an accessor (access method) that can process strings called ** str **, and also has a ** startswith ** function that can be extracted by prefix match. [Official] pandas.Series.str.startswith

Try it with ** str.startswith **.

#Added prefix match condition with startswith of str accessor of DataFrame
drop_index = data_frame.index[(data_frame['Suppliers'] == '〇〇') & (data_frame['Description'].str.startswith('△△'))]
#Delete Index that matches the condition
data_frame = data_frame.drop(drop_index)

well done! I was able to delete the data of two conditions properly!

I would like to say, "That's it!" By the time we got to the above method, we actually implemented it in a different way.

#Loop up to the maximum number of rows in the DataFrame
for i in range(len(data_frame)):
     #Chuck with specified column and row values
     if data_frame['Suppliers'][i] == '〇〇' and data_frame['Description'][i].startswith('△△'):
          #Delete the Index number that matches the condition
          data_frame = data_frame.drop(i)

It's a bit analog, but it loops through the number of rows of data and processes each one. This method can also delete the data of two conditions properly.

There are many other methods, but this time I introduced the method I actually tried. That's it.

Next time, I'll post where I struggled with ** openpyxl ** and ** pandas **.

Thank you for reading this far.

Recommended Posts

[Introduction to Python] How to delete rows that meet multiple conditions in Pandas.DataFrame
[Introduction to Python] How to use class in Python?
How to delete multiple specified positions (indexes) in a Python list
Delete vertices that meet the conditions in networkx
How to write string concatenation in multiple lines in Python
How to retrieve multiple arrays using slice in python.
Extract only elements that meet specific conditions in Python
How to develop in Python
How to test that Exception is raised in python unittest
How to slice a block multiple array from a multiple array in Python
[Introduction to Udemy Python 3 + Application] 36. How to use In and Not
How to define multiple variables in a python for statement
[Python] How to do PCA in Python
How to collect images in Python
[Introduction to Python] How to parse JSON
[Python] How to delete rows and columns in a table (list of drop method options)
How to use Mysql in python
How to wrap C in Python
How to use ChemSpider in Python
How to use PubChem in Python
Delete multiple elements in python list
How to handle Japanese in Python
How to judge that the cross key is input in Python3
[python] Move files that meet the conditions
How to access environment variables in Python
How to dynamically define variables in Python
How to do R chartr () in Python
How to delete expired sessions in Django
[Itertools.permutations] How to put permutations in Python
Send email to multiple recipients in Python (Python 3)
How to get a stacktrace in python
How to display multiplication table in python
How to extract polygon area in Python
How to check opencv version in python
How to switch python versions in cloud9
How to adjust image contrast in Python
How to use __slots__ in Python class
How to dynamically zero pad in Python
How to use regular expressions in Python
How to display Hello world in python
How to use is and == in Python
How to write Ruby to_s in Python
Introduction to Vectors: Linear Algebra in Python <1>
Introduction to Effectiveness Verification Chapter 1 in Python
How to write a string when there are multiple lines in python
How to use the C library in Python
How to receive command line arguments in Python
[REAPER] How to play with Reascript in Python
How to clear tuples in a list (Python)
[Introduction to Udemy Python3 + Application] 23. How to use tuples
How to generate permutations in Python and C ++
How to embed a variable in a python string
How to take multiple arguments when doing parallel processing using multiprocessing in python
How to implement Discord Slash Command in Python
Introduction to effectiveness verification Chapter 3 written in Python
How to plot multiple fits images side by side in galactic coordinates using python
tse --Introduction to Text Stream Editor in Python
How to use Python Image Library in python3 series
How to create a JSON file in Python
[Python] How to draw multiple graphs with Matplotlib
I wrote "Introduction to Effect Verification" in Python