Data cleansing 1: Convenient Python notation such as lambda and map

First, a greeting

Nice to meet you, this is Ngahope! Although I am a liberal arts student, I became interested in the possibilities of AI and started attending the AI school "Aidemy" this week. I am learning many things about AI there, and I decided to write them up on Qiita to share them with you.

What to learn this time
- One-line notation
- Split display of list
- Processing performed on the dictionary

1. One-line notation

lambda (1)

- A function that is defined with def and returns its value with return can be written in one line by using lambda.

__lambda arguments: return value__

# Define a function a that returns x*4
a = lambda x: x * 4
print(a(4))
# 16
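
To make the correspondence with def concrete, here is a minimal sketch that writes the same function both ways. The names times_four_def and times_four_lambda are hypothetical, chosen only for this comparison.

```python
# Hypothetical names, used only for this comparison.

# Ordinary definition with def and return
def times_four_def(x):
    return x * 4

# The same function written in one line with lambda
times_four_lambda = lambda x: x * 4

print(times_four_def(4))     # 16
print(times_four_lambda(4))  # 16
```

Both behave the same when called. The lambda form is mainly convenient for short functions that are handed to something else, such as map or sorted later in this article.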

lambda (2) When there are two or more arguments

- Just separate the arguments with commas.

a = lambda x, y: x + y
print(a(4, 5))
# 9

lambda (3) Writing a return value that includes if

- Write if and else side by side on one line.
__lambda arguments: processing (when True) if condition else other processing__

a = lambda x: x * 4 if x > 4 else x + 4
print(a(6))
# 24
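
As a rough illustration of how the one-line if/else maps onto an ordinary function, the sketch below shows both branches being taken. The names size_label and size_label_def and the threshold of 4 are assumptions made up for this example.

```python
# size_label and size_label_def are hypothetical names for illustration.

# One-line form: value_if_true if condition else value_if_false
size_label = lambda x: "big" if x > 4 else "small"

# The same logic written with def and an if/else statement
def size_label_def(x):
    if x > 4:
        return "big"
    else:
        return "small"

print(size_label(6))      # big
print(size_label(2))      # small
print(size_label_def(2))  # small
```

Note that the part after else must be an expression that produces a value. An assignment such as x = 4 is not allowed there.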

2. Split display of list

Split display (1)

- Use the split method when you want to split a string on a single kind of symbol.
__string.split("symbol")__

text = "my name is ngayope"
print(text.split(" "))  # split on " " (a space)
# ['my', 'name', 'is', 'ngayope']

Split display (2)

- Use the re.split function when you want to split on several different symbols (re must be imported first).
__re.split("[symbols]", string)__

import re
text = "Bulbasaur,Charmander.Squirtle"  # example input
print(re.split("[,.]", text))
# ['Bulbasaur', 'Charmander', 'Squirtle']
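
The difference between the two approaches is easiest to see on one string. The following sketch uses a hypothetical string named line that mixes two separators, and compares str.split with re.split.

```python
import re

# line is a hypothetical example string with two kinds of separators.
line = "apple,banana.cherry"

# str.split only knows one separator, so the "." is left alone
print(line.split(","))         # ['apple', 'banana.cherry']

# re.split with a character class splits on "," and "." alike
print(re.split("[,.]", line))  # ['apple', 'banana', 'cherry']
```

Inside the square brackets each character is treated as one possible separator, so "." is taken literally there.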

When you want to use a function for each element of the list

- To apply a function to each element of a list, use the map function. map returns an iterator (a map object), not a list.
__list(map(function, list))__
* If you do not wrap it in list(), you get a map object and the results are not shown as a list!

import re
time_list = ["2006/11/26_2:40", "2009/1/16_23:35", "2014/5/4_14:26", "2017/8/9_7:5", "2017/4/1_22:15"]

# Function that extracts the "hour" from an element of time_list
hour_pick = lambda x: int(re.split("[/_:]", x)[3])
# ↑ re.split first splits the string on "/", "_" and ":". [3] then picks out the "hour" part, and int() converts it to a number. lambda lets all of this fit in one line.

# Apply the function to each element of time_list and print the result.
print(list(map(hour_pick, time_list)))
# [2, 23, 14, 7, 22]
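
The note about list() can be checked directly. The sketch below uses a small hypothetical list named nums to show that map returns a lazy map object, and that the results only appear as a list once it is wrapped in list().

```python
# nums and double are hypothetical names for illustration.
nums = [1, 2, 3]
double = lambda x: x * 2

mapped = map(double, nums)
print(type(mapped).__name__)  # map  (an iterator, not a list)

result = list(mapped)
print(result)                 # [2, 4, 6]

# An iterator can be consumed only once, so a second list() is empty.
print(list(mapped))           # []
```

Because the map object is used up after one pass, it is usually simplest to convert it to a list right away, as the article does.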

Extract only the elements of a list that satisfy a condition (a function that returns True or False)

- These can be extracted with the filter function, used in the same way as map.
__list(filter(function returning True or False, list))__

# Function that returns True only for elements of time_list above whose "month" is July or later
judge = lambda a: int(re.split("[/_:]", a)[1]) > 6
print(list(filter(judge, time_list)))
# ['2006/11/26_2:40', '2017/8/9_7:5']

Sort by specifying the sorting criteria with a function

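- Use the sorted function, which returns a new sorted list, instead of the list's sort method. The key argument is the function used as the sorting criterion, and reverse=True sorts in descending order.
__sorted(list, key=criterion function, reverse=True)__

# Sort by the second element of each pair (descending order)
list1 = [[0, 7], [1, 6], [2, 3], [4, 1], [5, 5], [9, 0]]  # example input
print(sorted(list1, key=lambda x: x[1], reverse=True))
# [[0, 7], [1, 6], [5, 5], [2, 3], [4, 1], [9, 0]]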
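
As a side-by-side sketch of how sorted differs from the list's own sort: sorted returns a new list and leaves the original untouched, while sort reorders the list in place and returns None. The list named pairs is a hypothetical example.

```python
# pairs is a hypothetical example list.
pairs = [[2, 3], [0, 7], [4, 1]]

# sorted returns a new list; pairs itself is unchanged
by_second = sorted(pairs, key=lambda x: x[1], reverse=True)
print(by_second)  # [[0, 7], [2, 3], [4, 1]]
print(pairs)      # [[2, 3], [0, 7], [4, 1]]

# list.sort reorders the list in place and returns None
returned = pairs.sort(key=lambda x: x[1])
print(returned)   # None
print(pairs)      # [[4, 1], [2, 3], [0, 7]]
```

Both accept the same key and reverse arguments, so the choice mostly depends on whether the original order still needs to be kept.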

Writing for and if inside a list (list comprehensions)

Writing for inside a list

- Writing it as follows also applies a function to every element, just like the map function.
__[function(variable) for variable in list]__

cm = [10, 100, 500, 380, 90]  # example lengths in cm
# Function that converts centimeters into the form [m, cm]
m_cm = lambda x: [x // 100, x % 100]
print([m_cm(x) for x in cm])
# [[0, 10], [1, 0], [5, 0], [3, 80], [0, 90]]

Writing for with a condition (if) inside a list

- Writing it as follows also extracts only the elements that satisfy a condition, just like the filter function.
__[variable for variable in list if condition]__

# Extract only the lengths of 1 m or more from cm above
print([x for x in cm if x >= 100])
# [100, 500, 380]
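
To check the claimed equivalence, the following sketch compares each list form with its map or filter counterpart. The list lengths_cm and the helper to_m_cm are hypothetical stand-ins, not the data used in the article.

```python
# lengths_cm and to_m_cm are hypothetical names for illustration.
lengths_cm = [30, 100, 250]
to_m_cm = lambda x: [x // 100, x % 100]

# map versus a for written inside the list
with_map = list(map(to_m_cm, lengths_cm))
with_for = [to_m_cm(x) for x in lengths_cm]
print(with_for)              # [[0, 30], [1, 0], [2, 50]]
print(with_map == with_for)  # True

# filter versus a for with if written inside the list
with_filter = list(filter(lambda x: x >= 100, lengths_cm))
with_if = [x for x in lengths_cm if x >= 100]
print(with_if)                 # [100, 250]
print(with_filter == with_if)  # True
```

The list form needs no list() wrapper and no separate lambda for the condition, which is why it often reads more directly.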

Loop through multiple lists at the same time

- Even when the lists are separate, they can be processed in a single loop by using the zip function.
__for variable1, variable2 in zip(list1, list2):__ processing
__[processing for variable1, variable2 in zip(list1, list2)]__

a = [1, -2, 3, -4, 5]
b = [9, 8, -7, -6, -5]
# Compute x*4 + y*2 for each pair of elements from the two lists
print([x * 4 + y * 2 for x, y in zip(a, b)])
# [22, 8, -2, -28, 10]
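
The ordinary for statement form mentioned above can be sketched as follows, using the same two lists. The variable name results is an assumption for this example. The last line also shows that zip stops when the shorter input runs out.

```python
a = [1, -2, 3, -4, 5]
b = [9, 8, -7, -6, -5]

# for statement form: x and y advance through a and b together
results = []  # hypothetical name for the collected values
for x, y in zip(a, b):
    results.append(x * 4 + y * 2)
print(results)  # [22, 8, -2, -28, 10]

# zip pairs elements by position and stops at the shorter input
print(list(zip([1, 2, 3], ["a", "b"])))  # [(1, 'a'), (2, 'b')]
```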

Looping inside another loop

- When you want to loop over one list for each element of another, write a for clause for each of the two lists and describe the processing in terms of the two variables.
- Inside a list, write it as follows.
**[processing for variable1 in list1 for variable2 in list2]**

a = [1, 2]  # example inputs
b = [5, 6]
print([x + y for x in a for y in b])
# [6, 7, 7, 8]
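
Written out as ordinary nested for statements, the loop structure looks like the sketch below. Unlike zip, which pairs elements by position, nested loops visit every combination. The lists xs and ys are hypothetical inputs chosen for this example.

```python
# xs and ys are hypothetical example inputs.
xs = [1, 2]
ys = [5, 6]

sums = []
for x in xs:        # outer loop
    for y in ys:    # inner loop runs completely for each x
        sums.append(x + y)
print(sums)  # [6, 7, 7, 8]

# The one-line form lists the for clauses in the same order, outer first
print(sums == [x + y for x in xs for y in ys])  # True
```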

Processing to be performed on the dictionary

Count the number of elements and output as a dictionary

- Normally each key of a dictionary has to be initialized before it can be updated, but if you use the defaultdict class instead of a plain dictionary, missing keys are created automatically.
**defaultdict(type of the values (int, list, etc.))**

from collections import defaultdict

lst = ['Bulbasaur', 'Charmander', 'Charmander', 'Squirtle']  # example input
d = defaultdict(int)
# Take each element of the list as a key and add 1 to its count each time it appears
for key in lst:
    d[key] += 1
print(d)
# defaultdict(<class 'int'>, {'Bulbasaur': 1, 'Charmander': 2, 'Squirtle': 1})
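
For comparison, here is a minimal sketch of the same count done with a plain dictionary, where each key has to be initialized by hand, next to the defaultdict version. The list words and the names plain and auto are assumptions made for this illustration.

```python
from collections import defaultdict

# words, plain and auto are hypothetical names for illustration.
words = ["Bulbasaur", "Charmander", "Charmander", "Squirtle"]

# Plain dict: a missing key raises KeyError, so initialize it first
plain = {}
for key in words:
    if key not in plain:
        plain[key] = 0
    plain[key] += 1

# defaultdict(int): a missing key starts at int(), which is 0
auto = defaultdict(int)
for key in words:
    auto[key] += 1

print(plain)                # {'Bulbasaur': 1, 'Charmander': 2, 'Squirtle': 1}
print(dict(auto) == plain)  # True
```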

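Adding elements to a dictionary whose values are lists

- Elements can be added easily by using defaultdict.
**defaultdict dictionary[key].append(element)**

# Dictionary whose values are lists
d = defaultdict(list)
a = [['fruit', 'apple'], ['vegetable', 'carrot'], ['fruit', 'orange']]  # example input
# Take index [0] of each element of a as x and index [1] as y
for x, y in a:
    d[x].append(y)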

Counting more easily

- The Counter class makes counting even easier than defaultdict.
**Counter(data to count)**
- Calling .most_common(number of elements) returns the specified number of elements, sorted in descending order of count.

from collections import Counter
lst = ['Bulbasaur', 'Charmander', 'Charmander', 'Squirtle']  # example input
print(Counter(lst).most_common(1))
# [('Charmander', 2)]
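
A slightly fuller sketch of Counter follows, assuming a hypothetical list called names. It shows reading a single count, asking for the top two elements, and what happens with a key that never appeared.

```python
from collections import Counter

# names is a hypothetical example list.
names = ["Bulbasaur", "Charmander", "Charmander",
         "Squirtle", "Squirtle", "Squirtle"]

counts = Counter(names)
print(counts["Squirtle"])     # 3

# most_common(n) returns a list of (element, count) tuples
print(counts.most_common(2))  # [('Squirtle', 3), ('Charmander', 2)]

# A key that never appeared simply counts as 0
print(counts["Pikachu"])      # 0
```

Note that most_common returns a list of tuples rather than a Counter, so the result can be looped over or indexed like any other list.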


Summary

- With lambda, a function can be written in one line.
- split and re.split split a string into a list.
- Writing a for clause and a function inside the list brackets [] makes it easy to apply the function to the elements of a list.
- defaultdict and Counter make it easy to count elements with a dictionary.

That's all for this time. Thank you for reading this far.

