[PYTHON] Grouping csv and getting minimum value (pandas)

Content of this article

--Process a csv file group by group

Preface

During my graduation research, I had to process a reasonably large csv file. Specifically, I needed to group the rows by the value of the id column and get the row with the smallest value in the distance column from each group. The method I used at the time is described below.

Main subject

The code is as follows.

import pandas as pd

data = pd.read_csv("Path to the original data file")
df = pd.DataFrame(columns=data.columns)
### This will be the final output. Its column names are the same as those of data, and it is empty at this stage.

dic = {}

for name,group in data.groupby('id'):
   dic[name] = group

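### The keys of dic are the distinct id values.
ids = dic.keys()

for i in ids:
   k = dic[i]
   l = k['distance'].idxmin()
   ### idxmin returns an index label, so select the row with loc rather than iloc.
   m = data.loc[[l]]
   ### pd.concat is used because DataFrame.append was removed in pandas 2.0.
   df = pd.concat([df, m])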

df.to_csv("Path to the output file")

There is probably a more concise way to write this. Still, the techniques used here are useful in quite a variety of situations, such as creating an empty DataFrame that inherits the column names of the original data and getting the index of the row that minimizes the value of a certain column.
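As one possible more concise version, the per-group loop can probably be replaced by calling idxmin on the grouped column and selecting the resulting labels with loc. This is only a sketch and not the method I used at the time. The sample values below are made up for illustration and stand in for the csv file.

```python
import pandas as pd

# Hypothetical sample data standing in for the csv file.
data = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "distance": [5.0, 2.5, 7.0, 3.0, 9.0, 1.0],
})

# For each id, idxmin() returns the index label of the row
# with the smallest distance.
min_labels = data.groupby("id")["distance"].idxmin()

# Select those rows by label; the result keeps every column of data.
result = data.loc[min_labels]
print(result)
```

With a real file, data would come from pd.read_csv and result could be written out with to_csv, just as in the code above.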
