[PYTHON] Data Science 100 Knock ~ Battle for less than beginners part2

This is a record of my struggle through the 100 knocks, written by someone who isn't even a data scientist in training (an "egg"). Whether I can make it to the finish is anyone's guess. ~~If it disappears partway through, please assume it just never made it onto Qiita.~~

100 Knock article: 100 Knock Guide

**Be careful if you plan to try the problems yourself: this article includes spoilers.**

This part covers problems 10 to 18. It's also my first time adding a table of contents.

From this part on, what I wrote doesn't always work. I'm writing while checking the model answers, and I include examples of my failed attempts.

10th

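mine10.py

```python
# df_store is loaded by the 100-knock notebook's setup cell
df = df_store
# '^' anchors the pattern so only codes that START with S14 match;
# head(10) to match the model answer (head() alone shows 5 rows)
df[df['store_cd'].str.contains('^S14')].head(10)

# Model answer
df_store.query("store_cd.str.startswith('S14')", engine='python').head(10)

# Failure example
import re
df = df_store
# str(df['store_cd']) turns the whole column into ONE string, so re.match
# runs once and the comparison gives a single bool instead of a per-row mask;
# df[False] then looks for a column named False
df[None != re.match(r'S14.*', str(df['store_cd']))]
#> (omitted) KeyError: False
```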

**At first glance I thought, "With SQL's LIKE this would be a one-liner!?"** For the time being, I tried to solve it by diving in with ~~my bosom friend~~ re.match. However, I noticed that df['store_cd'] is not a string but a whole column. Even after wrapping it in str(), matching the first characters this way never gave me a per-row result ~~(in fact, I kept getting a KeyError to begin with)~~, so I went looking for another approach.
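To see why the failure example breaks, here is a minimal sketch on a few hypothetical store codes (not the real df_store). It shows that str() collapses the column into one string. It also shows that applying re.match element by element with Series.map does produce the per-row mask I was after. This is my own workaround for illustration, not the model answer.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for df_store.
df_store = pd.DataFrame({'store_cd': ['S14010', 'S13001', 'S14033', 'S12007']})

# str() on a Series produces ONE string for the whole column,
# so re.match would only run once.
print(type(str(df_store['store_cd'])))  # <class 'str'>

# Calling re.match on each element gives one True/False per row,
# which is a boolean mask that df[...] accepts.
mask = df_store['store_cd'].map(lambda cd: re.match(r'S14', cd) is not None)
print(df_store[mask])
```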

Then I found `str.contains` (used as `df['col'].str.contains`) and was impressed. Since it accepts a regular expression, I used it feeling invincible. The model answer, however, used a ~~baffling~~ different method. `engine='python'` seems to be a kind of magic spell [reference]. It tells `query` to evaluate the expression with plain Python instead of numexpr.
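For comparison, here is a small sketch with made-up rows that filters on the same prefix three ways. The sample df_store below is hypothetical. The three forms are `str.startswith`, `str.contains` with an anchored regex, and the model answer's `query` form. On this data all three should return the same rows.

```python
import pandas as pd

# Hypothetical sample rows standing in for df_store.
df_store = pd.DataFrame({
    'store_cd': ['S14010', 'S13001', 'S14033', 'S12007'],
    'store_name': ['A', 'B', 'C', 'D'],
})

# 1. Boolean mask with a plain prefix check (no regex involved).
by_prefix = df_store[df_store['store_cd'].str.startswith('S14')]

# 2. Boolean mask with a regex; '^' anchors the match to the start.
by_regex = df_store[df_store['store_cd'].str.contains('^S14', regex=True)]

# 3. The model answer's style: the same check written inside query(),
#    evaluated with engine='python'.
by_query = df_store.query("store_cd.str.startswith('S14')", engine='python')

print(by_prefix)
print(by_prefix.equals(by_regex), by_prefix.equals(by_query))
```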

11th

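mine11.py

```python
df = df_customer
# '$' anchors the pattern to the end of the string;
# head(10) to match the model answer
df[df['customer_id'].str.contains('1$')].head(10)

# Model answer
df_customer.query("customer_id.str.endswith('1')", engine='python').head(10)
```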

Another problem that keeps reminding me how handy SQL is. After drafting most of this article and reviewing it, I reread the reference and went through the methods available under `df['col'].str`.
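The difference between an anchored and an unanchored pattern is easy to miss. Here is a quick check on four made-up IDs. The values are hypothetical and only shaped like the real customer_id column.

```python
import pandas as pd

# Hypothetical customer IDs standing in for df_customer['customer_id'].
ids = pd.Series(['CS021313000114', 'CS037613000071',
                 'CS031415000172', 'CS028811000001'])

ends_plain = ids.str.endswith('1')   # plain suffix check, no regex
ends_regex = ids.str.contains('1$')  # '$' anchors the regex to the end
anywhere = ids.str.contains('1')     # no anchor: a '1' anywhere matches

print(ends_plain.tolist())  # [False, True, False, True]
print(ends_regex.tolist())  # [False, True, False, True]
print(anywhere.tolist())    # [True, True, True, True]
```

Without the `$`, every ID here counts as a match because each one contains a 1 somewhere.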

12th to 16th

mine12.py

```python
df = df_store
# contains() with a plain word: rows whose address includes 'Yokohama'
df = df[df['address'].str.contains('Yokohama')]
df

# Model answer
df_store.query("address.str.contains('Yokohama')", engine='python')
```

mine13.py

```python
df = df_customer
# '^[A-F]' = the first character is A through F
df = df[df['status_cd'].str.contains('^[A-F]')]
df.head(10)

# Model answer
df_customer.query("status_cd.str.contains('^[A-F]', regex=True)", engine='python').head(10)
```

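mine14.py

```python
df = df_customer
# '[1-9]$' = the last character is a digit from 1 to 9
# (my first attempt used '[0-9]$', which also lets a trailing 0 through)
df = df[df['status_cd'].str.contains('[1-9]$')]
df.head(10)

# Model answer
df_customer.query("status_cd.str.contains('[1-9]$', regex=True)", engine='python').head(10)
```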

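mine15.py

```python
df = df_customer
# starts with A-F, anything in between, ends with a digit 1-9
# (my first attempt ended in '[0-9]$', which also accepts a trailing 0)
df = df[df['status_cd'].str.contains('^[A-F].*[1-9]$')]
df.head(10)

# Model answer
df_customer.query("status_cd.str.contains('^[A-F].*[1-9]$', regex=True)", engine='python').head(10)
```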
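Judging by the model answers' `[1-9]$`, a trailing 0 should not match. Here is a sketch with hypothetical status codes; the values below are invented for illustration. It shows where `[0-9]$` and `[1-9]$` disagree, and how problem 15's pattern combines both anchors.

```python
import pandas as pd

# Hypothetical status codes standing in for df_customer['status_cd'].
status = pd.Series(['A-20100830-1', 'B-20090610-0', 'C-20101028-F',
                    'G-20100707-9', '0-00000000-0'])

any_digit_end = status.str.contains('[0-9]$')    # also accepts a trailing 0
one_to_nine_end = status.str.contains('[1-9]$')  # trailing 1-9 only
# Problem 15: first character A-F AND last character 1-9.
both = status.str.contains('^[A-F].*[1-9]$')

print(any_digit_end.tolist())    # [True, True, False, True, True]
print(one_to_nine_end.tolist())  # [True, False, False, True, False]
print(both.tolist())             # [True, False, False, False, False]
```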

mine16.py

```python
df = df_store
# three digits, hyphen, three digits, hyphen, four digits
df = df[df['tel_no'].str.contains('[0-9]{3}-[0-9]{3}-[0-9]{4}')]
df.head(10)

# Model answer
df_store.query("tel_no.str.contains('[0-9]{3}-[0-9]{3}-[0-9]{4}', regex=True)", engine='python')
```

Sorry, I don't have much to say about these, and sorry for not using query. ~~They run without a single error while I write them, which is actually a little scary.~~

Also, `regex=True` seems to mean "treat the pattern as a regular expression" [reference]. For `str.contains` it is already the default, so leaving it out behaves the same.
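Here is a quick sketch of what the flag changes, using problem 16's pattern on hypothetical phone numbers. With `regex=True` the pattern is interpreted as a regular expression. With `regex=False` it is searched for as literal text. The last sample value is deliberately the pattern string itself.

```python
import pandas as pd

pattern = '[0-9]{3}-[0-9]{3}-[0-9]{4}'

# Hypothetical values: a 3-3-4 number, a 2-4-4 number, and the literal pattern text.
tel = pd.Series(['045-123-4567', '03-1234-5678', pattern])

as_regex = tel.str.contains(pattern, regex=True)     # same as the default
as_literal = tel.str.contains(pattern, regex=False)  # plain substring search

print(as_regex.tolist())    # [True, False, False]
print(as_literal.tolist())  # [False, False, True]
```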

17th

mine17.py

```python
# ascending=True: earliest birthday (oldest customer) first
df_customer.sort_values('birth_day', ascending=True).head(10)
```

And then, out of nowhere, a different kind of problem.

I'd forgotten how to do this, so I looked it up while writing. Does `ascending` mean the same as SQL's `ORDER BY ... ASC`? It seems so: `ascending=True` (the default) sorts from smallest to largest.
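Here is a minimal sketch on three hypothetical customers (not the real df_customer). It assumes birth_day holds ISO-format date strings like '1981-04-29', which sort in date order even as plain strings.

```python
import pandas as pd

# Hypothetical sample standing in for df_customer.
df_customer = pd.DataFrame({
    'customer_name': ['A', 'B', 'C'],
    'birth_day': ['1981-04-29', '1952-01-06', '1996-12-18'],
})

# Like SQL's ORDER BY birth_day ASC.
oldest_first = df_customer.sort_values('birth_day', ascending=True)
print(oldest_first['customer_name'].tolist())  # ['B', 'A', 'C']

# ascending=True is the default, so leaving it out gives the same order.
print(oldest_first.equals(df_customer.sort_values('birth_day')))  # True
```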

18th

mine18.py

```python
# ascending=False: latest birthday (youngest customer) first
df_customer.sort_values('birth_day', ascending=False).head(10)
```

The difference from SQL is that instead of writing `DESC`, you pass `ascending=False`.
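Here is a sketch of the SQL correspondence on hypothetical data. It also covers the case where SQL mixes directions, such as `ORDER BY gender_cd ASC, birth_day DESC`; pandas takes one `ascending` flag per sort key. The column values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    'gender_cd': ['1', '0', '1', '0'],
    'birth_day': ['1981-04-29', '1952-01-06', '1996-12-18', '1970-07-01'],
})

# SQL: ORDER BY birth_day DESC
youngest_first = df.sort_values('birth_day', ascending=False)

# SQL: ORDER BY gender_cd ASC, birth_day DESC
mixed = df.sort_values(['gender_cd', 'birth_day'], ascending=[True, False])

print(youngest_first['birth_day'].tolist())
print(mixed)
```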

Everything went fine up to here. Up to here, that is.

Parts 1 and 2 were reviews that felt like going through the motions, but from here on there will be more problems that test whether you can combine what you know. ~~I don't know enough~~ There were a lot of difficult problems, so I want to proceed at a slightly slower pace. I'll skip some problems (21, 22, etc.).
