[PYTHON] Manipulating strings with pandas group by

Overview

Examples of computing the mean, minimum, and maximum with pandas are easy to find. I often build groups and then process the strings inside them, so I wrote down what I do as a memo. ~~This is probably the umpteenth rehash of the topic...~~

Requirements

python 3.7.2
pandas
numpy

I ran everything in Jupyter Notebook to check the behavior.

Processing content

The data is the adverse event table (reac.csv) from JADER, the Japanese Adverse Drug Event Report database.


import pandas as pd
import numpy as np
reacs=pd.read_csv('reac.csv',dtype='str',encoding='shift-jisx0213')
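Reading every column with `dtype='str'` keeps identifiers exactly as they are written in the file. The following is a minimal sketch of that effect, assuming a tiny in-memory CSV; the column names `case_id`, `seq`, and `event` and all values are made up for illustration and are not the real JADER layout.

```python
import io
import pandas as pd

# Hypothetical stand-in for reac.csv; the real file has different columns and values.
csv_text = "case_id,seq,event\n00012,01,rash\n00012,02,fever\n00345,01,nausea\n"

# Read everything as strings, as the article does.
as_str = pd.read_csv(io.StringIO(csv_text), dtype="str")

# Let pandas infer the types for comparison.
inferred = pd.read_csv(io.StringIO(csv_text))

print(as_str["case_id"].tolist())    # leading zeros are kept
print(inferred["case_id"].tolist())  # parsed as integers, leading zeros are lost
```

Keeping the columns as strings also means they can be concatenated with `+` later without converting them first.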

First, group by **identification number** so that all rows belonging to one case form a single group.

groupCaseNo=reacs.groupby('Identification number')
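To see what the grouping produces without the real file, here is a small sketch on a hand-made DataFrame. The column names follow this article, but the rows are invented sample values, not JADER data.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the article.
reacs = pd.DataFrame({
    "Identification number": ["A-001", "A-001", "B-002"],
    "Harmful event serial number": ["1", "2", "1"],
    "Adverse event": ["rash", "fever", "nausea"],
})

groupCaseNo = reacs.groupby("Identification number")

n_cases = groupCaseNo.ngroups        # number of distinct cases
rows_per_case = groupCaseNo.size()   # number of rows in each case

print(n_cases)
print(rows_per_case)
```

`ngroups` and `size()` are a quick way to confirm that the grouping key splits the data the way you expect before doing any string processing.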

Because the data is grouped by identification number, the group keys are available through `groups`, as shown below.

groupCaseNo.groups.keys()
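As a rough illustration of what `groups` holds, the sketch below uses invented sample rows (only the column names come from this article). Each key maps to the index labels of the rows in that group.

```python
import pandas as pd

# Hypothetical sample rows for illustration only.
reacs = pd.DataFrame({
    "Identification number": ["A-001", "A-001", "B-002"],
    "Adverse event": ["rash", "fever", "nausea"],
})
groupCaseNo = reacs.groupby("Identification number")

# groups behaves like a dict: group key -> index labels of the member rows.
keys = sorted(groupCaseNo.groups.keys())
rows_of_first_case = list(groupCaseNo.groups["A-001"])

print(keys)
print(rows_of_first_case)
```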

You can then process each key as follows. Passing a group key to `get_group` returns the rows of that group.

for case in groupCaseNo.groups.keys():
    print(groupCaseNo.get_group(case))
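As an alternative to calling `get_group` for every key, a GroupBy object can also be iterated directly, which yields the key and the rows of each group together. This is a sketch on invented sample rows, not the article's actual data.

```python
import pandas as pd

# Hypothetical sample rows for illustration only.
reacs = pd.DataFrame({
    "Identification number": ["A-001", "A-001", "B-002"],
    "Adverse event": ["rash", "fever", "nausea"],
})
groupCaseNo = reacs.groupby("Identification number")

# Iterating yields (key, sub-DataFrame) pairs, one per group.
events_by_case = {}
for case, rows in groupCaseNo:
    events_by_case[case] = rows["Adverse event"].tolist()

print(events_by_case)
```

Either form works; direct iteration simply avoids a separate lookup per key.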

You can combine strings with a function by using `apply`, as shown below. An anonymous function written with lambda also works, but for anything complicated I think it is better to define a separate function.

def getRecordAe(data):
    # The column names contain spaces, so use bracket access rather than attribute access
    return data['Harmful event serial number'] + ':' + data['Adverse event']

groupCaseNo.apply(getRecordAe)
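Going one step beyond the `apply` call above, a common follow-up is to collapse each case into a single string. The sketch below is one possible way to do that, assuming invented sample rows and a hypothetical helper column named `record`; it builds the "serial:event" text first and then joins it per group with `agg`.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the article.
reacs = pd.DataFrame({
    "Identification number": ["A-001", "A-001", "B-002"],
    "Harmful event serial number": ["1", "2", "1"],
    "Adverse event": ["rash", "fever", "nausea"],
})

# Row-wise concatenation does not need a group at all.
# "record" is a helper column introduced here for illustration.
reacs["record"] = reacs["Harmful event serial number"] + ":" + reacs["Adverse event"]

# Join the per-row strings into one string per case.
per_case = reacs.groupby("Identification number")["record"].agg(", ".join)

print(per_case)
```

Selecting a single column before aggregating keeps the function simple: it receives only the strings of one group and returns one joined string.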