[Python] 100 knocks on data science (structured data processing) 015 Explanation

A YouTube video commentary is also available.

problem

P-015: From the customer data frame (df_customer), extract all rows whose status code (status_cd) starts with a letter from A to F and ends with a digit from 1 to 9, and display only the first 10 rows.

answer

code


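# Keep rows whose status_cd starts with A-F and ends with 1-9, then show the first 10 (df_customer is loaded beforehand)
df_customer.query("status_cd.str.contains('^[A-F].*[1-9]$')", engine='python').head(10)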

output

| | customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | CS011215000048 | Saya Ashida | 1 | Female | 1992-02-01 | 27 | 223-0062 | Hiyoshihoncho, Kohoku Ward, Yokohama City, Kanagawa Prefecture********** | S14011 | 20150228 | C-20100421-9 |
| 68 | CS022513000105 | Kimiko Shimamura | 1 | Female | 1962-03-12 | 57 | 249-0002 | Yamanone, Zushi City, Kanagawa Prefecture********** | S14022 | 20150320 | A-20091115-7 |
| 71 | CS001515000096 | Yoko Mizuno | 9 | unknown | 1960-11-29 | 58 | 144-0053 | Kamatahoncho, Ota-ku, Tokyo********** | S13001 | 20150614 | A-20100724-7 |
| 122 | CS013615000053 | Nishiwaki Kii | 1 | Female | 1953-10-18 | 65 | 261-0026 | Makuharinishi, Mihama Ward, Chiba City, Chiba Prefecture********** | S12013 | 20150128 | B-20100329-6 |
| 144 | CS020412000161 | Kaoru Komiya | 1 | Female | 1974-05-21 | 44 | 174-0042 | Higashisakashita, Itabashi-ku, Tokyo********** | S13020 | 20150822 | B-20081021-3 |
| 178 | CS001215000097 | Asami Takenaka | 1 | Female | 1990-07-25 | 28 | 146-0095 | Tamagawa, Ota-ku, Tokyo********** | S13001 | 20170315 | A-20100211-2 |
| 252 | CS035212000007 | Erika Uchimura | 1 | Female | 1990-12-04 | 28 | 152-0023 | Yakumo, Meguro-ku, Tokyo********** | S13035 | 20151013 | B-20101018-6 |
| 259 | CS002515000386 | Ko Noda | 1 | Female | 1963-05-30 | 55 | 185-0013 | Nishikoigakubo, Kokubunji-shi, Tokyo********** | S13002 | 20160410 | C-20100127-8 |
| 293 | CS001615000372 | Inagaki Suzuka | 1 | Female | 1956-10-29 | 62 | 144-0035 | Minamikamata, Ota-ku, Tokyo********** | S13001 | 20170403 | A-20100104-1 |
| 297 | CS032512000121 | Tomoyo Matsui | 1 | Female | 1962-09-04 | 56 | 210-0011 | Fujimi, Kawasaki Ward, Kawasaki City, Kanagawa Prefecture********** | S13032 | 20150727 | A-20100103-5 |

Commentary

・ .query() extracts the rows of a pandas DataFrame that satisfy a condition. Use it when you want to check the data that meets a given condition.
・ str.contains() determines whether each string contains the specified pattern: it returns True if the pattern is found and False otherwise.
・ Therefore, .query('<column>.str.contains(...)') keeps only the rows whose value contains the specified pattern.
・ In this case, status_cd.str gives access to the string methods of the status_cd column, and .contains('^[A-F].*[1-9]$') selects the status codes that start with a letter from A to F and end with a digit from 1 to 9.
・ A regular expression is a way of representing many strings with a single pattern of symbols. In a regular expression, '.' matches any single character and '*' means the preceding character is repeated zero or more times, so '.*' means that zero or more arbitrary characters may appear in between.
・ '^' anchors the match to the beginning of the string and '$' anchors it to the end. '[A-F]' and '[1-9]' are character classes, where the '-' inside the brackets denotes a range.
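To see how each part of the pattern '^[A-F].*[1-9]$' behaves, here is a minimal sketch on a small, made-up Series of status codes (the values and the names codes, anchored and unanchored are hypothetical and not taken from df_customer). It compares the anchored pattern with the same pattern without '^' and '$'.

```python
import pandas as pd

# Hypothetical status codes for illustration only (not from df_customer)
codes = pd.Series(["A-20091115-7", "C-20100421-9", "G-20100101-5",
                   "B-20100329-0", "xA-2010-3", "F1"])

# Anchored: first character in A-F, any characters (.*), last character in 1-9
anchored = codes.str.contains(r"^[A-F].*[1-9]$")

# Without ^ and $, a match anywhere inside the string is enough
unanchored = codes.str.contains(r"[A-F].*[1-9]")

print(pd.DataFrame({"code": codes, "anchored": anchored, "unanchored": unanchored}))
```

Only the anchored version rejects "B-20100329-0" (it ends with 0) and "xA-2010-3" (it starts with x); the unanchored version accepts both because str.contains searches for the pattern anywhere in the string.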

* The following article is a helpful reference on regular expressions: https://qiita.com/hiroyuki_mrp/items/29e87bf5fe46de62983c

・ The example solution passes regex=True to str.contains, so '^', '$', the '-' inside the brackets, '.', and '*' are treated as regular-expression syntax. Since regex=True is the default for str.contains, the pattern is interpreted as a regular expression even if you omit it, so leaving it out causes no problem.
・ The engine argument of query() accepts 'python' or 'numexpr'. When the condition uses .str methods, specify engine='python': if numexpr is installed it becomes the default engine, and it cannot evaluate .str calls, so omitting engine='python' typically raises an error.
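As a sketch of the two points above, the following compares the query() form used in the answer with plain boolean indexing, which does not go through a query engine at all. The DataFrame df_sample and its values are hypothetical stand-ins for df_customer.

```python
import pandas as pd

# Hypothetical miniature stand-in for df_customer (made-up values)
df_sample = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "status_cd": ["A-20100724-7", "Z-20100101-1", "E-20100211-0", "D-20101018-6"],
})

# query(): .str method calls need engine='python'
via_query = df_sample.query("status_cd.str.contains('^[A-F].*[1-9]$')", engine="python")

# Boolean indexing: no query engine involved; regex=True is spelled out
# here only for clarity, since it is already the default
via_mask = df_sample[df_sample["status_cd"].str.contains(r"^[A-F].*[1-9]$", regex=True)]

print(via_query.head(10))
```

Both forms keep C001 and C004. Boolean indexing avoids the engine question entirely, while query() keeps the condition readable as a single string.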
