Data cleansing 1 Convenient Python notation such as lambda and map

Aidemy https://aidemy.net 2020/9/21

First, a quick greeting

Nice to meet you, this is Ngahope! Although I am a liberal arts student, I became interested in the possibilities of AI and started attending the AI learning school "Aidemy" this week, where I am learning a lot about AI. I wanted to share what I learn, so I decided to write it up on Qiita.

I hope you enjoy reading it!

What to learn this time
・ One-line notation
・ Split display of list
・ Processing performed on the dictionary

1. One-line notation

lambda (1)

-A function that you would normally define with def and end with return can be written in one line by using lambda.
・ lambda argument: return value

# Define a function a that returns x*4
a=lambda x: x*4
print(a(4))
# 16
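
To make the "one line" claim concrete, here is a small sketch that puts the usual def form next to the lambda form. The names times_four and times_four_lambda are just ones I made up for this comparison.

```python
# Ordinary definition with def and return
def times_four(x):
    return x * 4

# The same function written in one line with lambda
times_four_lambda = lambda x: x * 4

print(times_four(4))         # 16
print(times_four_lambda(4))  # 16
```

Both behave the same when called; the lambda form is mostly handy when you pass a short function to something like map or sorted.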

lambda (2) When there are two or more arguments

-Just separate the arguments with ","

a=lambda x,y: x+y
print(a(3,6))
# 9

lambda (3) Returning a value that depends on an if condition

・ Write if and else side by side on the same line.
・ lambda argument: value when the condition is True if condition else value otherwise

# Return x*4 when x is greater than 4, otherwise return 4
a=lambda x: x*4 if x>4 else 4
print(a(6))
# 24
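
As a rough sketch of how the "value if condition else other value" form maps onto a normal if statement, the example below writes the same logic both ways. The function names and the "big"/"small" labels are hypothetical and only for illustration.

```python
# Conditional expression inside a lambda
size_label = lambda x: "big" if x > 4 else "small"

# The same logic written as an ordinary function
def size_label_def(x):
    if x > 4:
        return "big"
    else:
        return "small"

print(size_label(6))      # big
print(size_label(4))      # small
print(size_label_def(6))  # big
```

Note that the part after else must be an expression (a value), not an assignment.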

2. Split display of list

Split display (1)

-Use the string's split method when you want to split on a single delimiter.
・ string.split("delimiter")

text="my name is ngayope"
text.split(" ") # split on " " (a space)
# ['my','name','is','ngayope']
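
One thing that may matter when cleaning real data: split(" ") and split() with no argument treat repeated spaces differently. The sample string below is my own variation of the one above, with extra spaces added on purpose.

```python
text = "my  name is   ngayope"  # note the repeated spaces

# Splitting on exactly one space leaves empty strings between repeated spaces
by_space = text.split(" ")
print(by_space)   # ['my', '', 'name', 'is', '', '', 'ngayope']

# With no argument, any run of whitespace counts as one separator
by_whitespace = text.split()
print(by_whitespace)  # ['my', 'name', 'is', 'ngayope']
```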

Split display (2)

-Use the re.split function when you want to split on several different delimiters (the re module must be imported first).
・ re.split("[delimiters]", string)

import re
text="Bulbasaur,Charmander.Squirtle"
re.split("[,.]",text)
# ['Bulbasaur','Charmander','Squirtle']
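
If the input is a little messier, the pattern can be extended. The sketch below assumes a made-up string with a doubled delimiter and spaces after the delimiters; "+" means "one or more" and "\s*" means "any following whitespace" in regular expressions.

```python
import re

text = "Bulbasaur, Charmander.. Squirtle"

# "[,.]" splits at every single delimiter, so spaces and empty strings remain
loose = re.split("[,.]", text)
print(loose)  # ['Bulbasaur', ' Charmander', '', ' Squirtle']

# "[,.]+\s*" treats a run of delimiters plus trailing spaces as one separator
tidy = re.split(r"[,.]+\s*", text)
print(tidy)   # ['Bulbasaur', 'Charmander', 'Squirtle']
```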

When you want to use a function for each element of the list

-To apply a function to each element of a list, use the map function. map does not return a list but an object called an "iterator".
・ list(map(function, list))
* If you do not wrap it in list(), you get a map object instead of a list of results!

import re
time_list =["2006/11/26_2:40","2009/1/16_23:35","2014/5/4_14:26","2017/8/9_7:5","2017/4/1_22:15"]

# Function to extract the "hour" from an element of time_list
hour_pick = lambda x: int(re.split("[/_:]",x)[3])
# ↑ re.split splits the string, [3] picks out the "hour" part, and int() converts it to a number. lambda lets us write all of this in one line.

# Apply the function to every element of time_list and return the results.
list(map(hour_pick,time_list))
# [2,23,14,7,22]
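
To see why list() is needed, here is a small sketch of what map returns on its own. The word list is a made-up example.

```python
words = ["my", "name", "is"]

# map returns a lazy iterator, not a list
lengths = map(len, words)
print(type(lengths).__name__)  # map

# list() consumes the iterator and collects the results
first_pass = list(lengths)
print(first_pass)   # [2, 4, 2]

# An iterator can only be consumed once, so a second pass is empty
second_pass = list(lengths)
print(second_pass)  # []
```

Because the iterator is used up after one pass, it is usually safest to convert it to a list right away if you need the results more than once.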

Extract only the elements of the list that satisfy a condition (a function that returns True or False)

-These can be extracted with the filter function, used in the same way as map.
・ list(filter(function returning True/False, list))

# Function that returns True only for elements of time_list above whose "month" is July or later
judge=lambda a: int(re.split("[/_:]",a)[1]) > 6
list(filter(judge,time_list))
# ["2006/11/26_2:40","2017/8/9_7:5"]

Sort by specifying the sorting criteria with a function

-Use the sorted function, which returns a new sorted list (the list's own sort method sorts the list in place instead).
・ sorted(list, key=function giving the sort criterion, reverse=True (descending order))

list1=[[2,3],[4,1],[5,5],[9,0],[0,7],[1,6]]
# Sort based on the second element (descending order)
sorted(list1,key=lambda x: x[1],reverse=True)
# [[0,7],[1,6],[5,5],[2,3],[4,1],[9,0]]
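
The difference between sorted and the list method sort is easy to mix up, so here is a minimal sketch of both. The list pairs is a shortened, made-up version of the data above.

```python
pairs = [[2, 3], [4, 1], [5, 5]]

# sorted() returns a new list and leaves the original untouched
ascending = sorted(pairs, key=lambda p: p[1])
print(ascending)  # [[4, 1], [2, 3], [5, 5]]
print(pairs)      # [[2, 3], [4, 1], [5, 5]]

# list.sort() sorts in place and returns None
returned = pairs.sort(key=lambda p: p[1], reverse=True)
print(returned)   # None
print(pairs)      # [[5, 5], [2, 3], [4, 1]]
```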

Write for or if inside a list (list comprehensions)

Writing for inside a list

-Writing it as follows applies a function to every element, just like the map function.
・ [function(variable) for variable in list]

cm=[100,50,500,3,380]
# Function that converts a length in cm to the form [m, cm]
m_cm=lambda x:[x//100,x%100]
print([m_cm(x) for x in cm])
#[[1,0],[0,50],[5,0],[0,3],[3,80]]

Writing for with a condition (if) inside a list

-Writing it as follows extracts only the elements that satisfy the condition, just like the filter function.
・ [variable for variable in list if condition]

# Extract only the values of 1 m or more from the list cm above
[x for x in cm if x>=100]
# [100,500,380]
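
As a quick check that the two styles really match, the sketch below builds the same result with filter and with a list comprehension, and then combines the "map" part and the "filter" part in one comprehension. The variable names are my own.

```python
cm = [100, 50, 500, 3, 380]

# filter + lambda
via_filter = list(filter(lambda x: x >= 100, cm))

# The equivalent list comprehension
via_comprehension = [x for x in cm if x >= 100]

print(via_filter == via_comprehension)  # True

# Transform and filter at the same time: whole meters of the values of 1 m or more
whole_meters = [x // 100 for x in cm if x >= 100]
print(whole_meters)  # [1, 5, 3]
```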

Loop through multiple lists at the same time

-Even separate lists can be processed in a single loop by using the zip function.
・ for variable1, variable2 in zip(list1, list2): processing
・ [processing for variable1, variable2 in zip(list1, list2)]

a = [1,-2,3,-4,5]
b = [9,8,-7,-6,-5]
# For each pair of elements from the two lists, compute x*4 + y*2
[x*4+y*2 for x,y in zip(a, b)]
# [22,8,-2,-28,10]
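
One point worth knowing is what happens when the lists have different lengths. The sketch below uses shortened, made-up lists; zip stops at the shorter one, while itertools.zip_longest from the standard library pads the missing values (the fill value 0 here is just my choice).

```python
from itertools import zip_longest

a = [1, -2, 3]
b = [9, 8]  # one element shorter than a

# zip stops as soon as the shorter list runs out
short = [x * 4 + y * 2 for x, y in zip(a, b)]
print(short)   # [22, 8]

# zip_longest keeps going and fills the gaps with fillvalue
padded = [x * 4 + y * 2 for x, y in zip_longest(a, b, fillvalue=0)]
print(padded)  # [22, 8, 12]
```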

Do more loops inside the loop

・ Use this when you want to loop over one list and, for each of its elements, loop over the other list as well.
-With ordinary for statements, write one for statement per list, nest them, and describe the processing for the two variables inside.
-Inside a list, write it as follows.
・ [processing for variable1 in list1 for variable2 in list2]

a=[1,2]
b=[5,6]
print([x+y for x in a for y in b])
# [6,7,7,8]
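
To show the order in which the double loop runs, here is the same calculation sketched with ordinary nested for statements. The list name sums is my own.

```python
a = [1, 2]
b = [5, 6]

# The first for in the comprehension is the outer loop, the second is the inner loop
sums = []
for x in a:
    for y in b:
        sums.append(x + y)
print(sums)  # [6, 7, 7, 8]

# The one-line version produces the same list
one_line = [x + y for x in a for y in b]
print(one_line)  # [6, 7, 7, 8]
```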

3. Processing to be performed on the dictionary

Count the number of elements and output as a dictionary

-Normally, each key of a dictionary has to be initialized before you can update its value, but with the defaultdict class a missing key is initialized automatically, so you can simply add to it.
・ defaultdict(type of the values (int, list, etc.))

from collections import defaultdict

d=defaultdict(int)
lst=['Bulbasaur','Charmander','Squirtle','Charmander']
# Use each element of the list as a key and add 1 to its count every time it appears
for key in lst:
    d[key] += 1
print(d)
# defaultdict(<class 'int'>, {'Bulbasaur': 1, 'Charmander': 2, 'Squirtle': 1})
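
For comparison, here is a sketch of what "initializing one by one" looks like with a plain dictionary. It shows the KeyError you get without initialization, and one common workaround using dict.get with a default of 0.

```python
lst = ['Bulbasaur', 'Charmander', 'Squirtle', 'Charmander']

# With a plain dict, updating a key that does not exist yet raises KeyError
plain = {}
try:
    plain['Bulbasaur'] += 1
except KeyError:
    print("KeyError: the key has to be initialized first")

# dict.get(key, 0) returns 0 for a missing key, so the count can start from there
counts = {}
for key in lst:
    counts[key] = counts.get(key, 0) + 1
print(counts)  # {'Bulbasaur': 1, 'Charmander': 2, 'Squirtle': 1}
```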

Add elements to list-type values in a dictionary

-With defaultdict(list), you can append to a key's list without creating the list first.
・ d[key].append(element)

# List of (key, value) tuples
a=[('Charmander',5),('Charmeleon',16),('Charizard',36),('Charizard',100)]
d=defaultdict(list)
# Unpack each tuple: index 0 goes to x, index 1 goes to y
for x,y in a:
    d[x].append(y)
print(d)
# defaultdict(<class 'list'>, {'Charmander': [5], 'Charmeleon': [16], 'Charizard': [36, 100]})

Counting more easily

-With the Counter class, counting is even easier than with defaultdict.
・ Counter(data to count)
-Calling .most_common(number of elements) returns that many (element, count) pairs, sorted by count in descending order.

from collections import Counter
lst=['Bulbasaur','Charmander','Squirtle','Charmander']
print(Counter(lst).most_common(1))
# [('Charmander', 2)]
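
A few more things Counter can do, as a small sketch. 'Pikachu' is a made-up key that is not in the data, used only to show what happens with a missing element.

```python
from collections import Counter

lst = ['Bulbasaur', 'Charmander', 'Squirtle', 'Charmander']
counter = Counter(lst)

# most_common(n) returns a list of (element, count) tuples
top = counter.most_common(1)
print(top)  # [('Charmander', 2)]

# A Counter can be read like a dictionary
print(counter['Charmander'])  # 2

# An element that never appeared counts as 0 instead of raising KeyError
print(counter['Pikachu'])     # 0
```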

Summary

・ With lambda, a function can be written in one line.
・ split and re.split split a string into a list.
・ Writing a for statement and a function inside the list brackets [] makes it easy to apply the function to each element of the list.
・ defaultdict and Counter make it easy to count elements and collect the results in a dictionary.

That's all for this time. Thank you for reading this far.
