[PYTHON] Data Science 100 Knock ~ Battle for less than beginners part9

This is a record of my struggle through the Data Science 100 Knocks as someone who is not even a fledgling data scientist. Whether I can make it to the finish line is a mystery. ~~If the series disappears partway through, please assume I just have not posted the rest to Qiita.~~

100 knock articles 100 Knock Guide

**This article contains spoilers, so be careful if you plan to work through the problems yourself.**

Other Delayed

If something here is hard to read, or a way of writing something is dangerous, please let me know. ~~I will take the damage to heart and use it as nourishment.~~

If a solution is wrong or an interpretation is off, please leave a comment.

This time covers problems 45 to 51. [Last time] 41-44 [First article, with table of contents]

45th

From here on, there are more problems about converting between date, string, and numeric types. Actually, I had already tinkered a little with the date type in a side article, which turned out to be lucky. The detour was not in vain.

At the end of this article I summarize the pages I referred to, grouped by the information I was looking for. Even within the same site the pages jump around so much that I lose track.

P-045: The date of birth (birth_day) of the customer data frame (df_customer) holds its data as a date type (Date). Convert this to a character string in YYYYMMDD format and extract it together with the customer ID (customer_id). Extracting just 10 rows is sufficient.

mine45.py


df=df_customer.copy()
df['birth_day']=pd.to_datetime(df['birth_day']).dt.strftime('%Y%m%d')
df[['customer_id','birth_day']].head(10)

'''Model answer'''
pd.concat([df_customer['customer_id'],
           pd.to_datetime(df_customer['birth_day']).dt.strftime('%Y%m%d')],
          axis = 1).head(10)

I added .copy() because I ran into a bug while working on the problems. If it turns out that a deep copy is needed, I will change it again.
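To illustrate why .copy() helps here, below is a minimal sketch on a made-up two-row frame (the values are hypothetical, not the real df_customer). Plain assignment only gives the same object a second name, while .copy() defaults to deep=True in pandas and so already produces an independent frame.

```python
import pandas as pd

# Hypothetical stand-in for df_customer (values invented for illustration).
df_customer = pd.DataFrame({
    'customer_id': ['CS001', 'CS002'],
    'birth_day': ['1981-04-29', '1952-04-01'],
})

# Plain assignment: both names refer to the same DataFrame object.
alias = df_customer
print(alias is df_customer)

# .copy() defaults to deep=True, so the data itself is duplicated.
df = df_customer.copy()
df['birth_day'] = pd.to_datetime(df['birth_day']).dt.strftime('%Y%m%d')

print(df['birth_day'].tolist())           # converted strings
print(df_customer['birth_day'].tolist())  # original values, left untouched
```

Working on the copy means every problem starts again from the unmodified df_customer.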

This time, I used .dt.strftime() (Reference). As someone who used printf a lot in C, I am quite fond of format specifiers.
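As a small sketch of how the format string works, here are two invented dates formatted in two ways. The sample values and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample dates (not taken from the real data set).
dates = pd.Series(pd.to_datetime(['1981-04-29', '2005-01-03']))

# %Y = 4-digit year, %m = 2-digit month, %d = 2-digit day.
compact = dates.dt.strftime('%Y%m%d')    # YYYYMMDD, as asked for in P-045
slashed = dates.dt.strftime('%Y/%m/%d')  # same fields with separators

print(compact.tolist())
print(slashed.tolist())
```

Much like printf, anything in the format string that is not a % directive is copied into the output as-is.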

However, for some reason, chaining .dt.strftime directly onto df['birth_day'] raises an error, while wrapping the column in pd.to_datetime() first makes it work (?)

typeCheck.py


type(df['birth_day'])
type(pd.to_datetime(df['birth_day']))

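Both return

pandas.core.series.Series

So what is the difference? (type() only reports the container class. The element type is in .dtype, and the .dt accessor requires a datetime-like dtype.)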
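A likely explanation, sketched below, is that the two Series differ in dtype rather than in type. This assumes birth_day is loaded as plain strings, which I have not verified against the real df_customer; the sample values are invented.

```python
import pandas as pd

# Hypothetical stand-in: dates stored as plain strings, not as datetimes.
raw = pd.Series(['1981-04-29', '1952-04-01'])
converted = pd.to_datetime(raw)

# type() describes the container, so both are Series ...
print(type(raw) is type(converted))

# ... while dtype describes the elements, and that is where they differ.
print(raw.dtype)
print(converted.dtype)

# .dt is only available when the elements are datetime-like.
try:
    raw.dt.strftime('%Y%m%d')
    dt_error = None
except AttributeError as exc:
    dt_error = exc
print(dt_error)

print(converted.dt.strftime('%Y%m%d').tolist())
```

Checking df['birth_day'].dtype on the real data would confirm whether this is what happens there.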

46th

P-046: The application date (application_date) of the customer data frame (df_customer) holds its data as a character string in YYYYMMDD format. Convert this to a date type (date or datetime) and extract it along with the customer ID (customer_id). Extracting just 10 rows is sufficient.

mine46.py


df=df_customer.copy()
df['application_date']=pd.to_datetime(df['application_date'])
df[['customer_id','application_date']].head(10)
# Checks I ran along the way to confirm the conversion worked:
#df['application_date'].describe()
#df['application_date'].apply(lambda x:x.year)

'''Model answer'''
pd.concat([df_customer['customer_id'],pd.to_datetime(df_customer['application_date'])], axis=1).head(10)

Reference for converting from string type to date type: https://note.nkmk.me/python-pandas-datetime-timestamp/ The commented-out lines are checks I ran along the way to see whether the conversion had succeeded.

For retrieving elements with .apply and lambda, see the reference list at the end.
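Here is a minimal sketch of that kind of check, using invented YYYYMMDD strings. The explicit format argument is my addition and is not part of the model answer.

```python
import pandas as pd

# Hypothetical sample in YYYYMMDD string form (values invented).
application_date = pd.Series(['20150905', '20170228'])

# An explicit format avoids any guessing about the layout.
converted = pd.to_datetime(application_date, format='%Y%m%d')

# A datetime64 dtype means the conversion succeeded.
print(converted.dtype)

# Two ways to pull out the year: the vectorized accessor and .apply.
years_dt = converted.dt.year
years_apply = converted.apply(lambda x: x.year)

print(years_dt.tolist())
print(years_apply.tolist())
```

Both approaches give the same years; .dt.year is usually the shorter way to write it.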

47th

P-047: The sales date (sales_ymd) of the receipt detail data frame (df_receipt) holds its data as a number in YYYYMMDD format. Convert this to a date type (date or datetime) and extract it together with the receipt number (receipt_no) and receipt sub number (receipt_sub_no). Extracting just 10 rows is sufficient.

mine47.py


df=df_receipt.copy()
df['sales_ymd']=pd.to_datetime(df['sales_ymd'].astype(str))
df[['receipt_no','receipt_sub_no','sales_ymd']].head(10)

'''Model answer'''
pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],
           pd.to_datetime(df_receipt['sales_ymd'].astype('str'))],axis=1).head(10)

The conversion goes number → string → date type. For type conversion from numbers to strings, see the reference list at the end.
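The sketch below shows why the detour through a string is needed. The sample values are invented, and format='%Y%m%d' is my addition rather than part of the model answer.

```python
import pandas as pd

# Hypothetical sample of YYYYMMDD values stored as integers.
sales_ymd = pd.Series([20181103, 20190205])

# Without a string conversion, integers are read as nanoseconds since 1970-01-01.
as_epoch = pd.to_datetime(sales_ymd)

# Converting to str first lets pandas parse the digits as a calendar date.
as_date = pd.to_datetime(sales_ymd.astype(str), format='%Y%m%d')

print(as_epoch.dt.year.tolist())
print(as_date.dt.strftime('%Y-%m-%d').tolist())
```

Passing the integers straight in does not raise an error. It quietly produces dates in 1970, so it is an easy mistake to miss.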

48th

There was one part of this problem that I could not understand even after looking into it.

P-048: The sales epoch seconds (sales_epoch) of the receipt detail data frame (df_receipt) holds its data as numeric UNIX seconds. Convert this to a date type (date or datetime) and extract it together with the receipt number (receipt_no) and receipt sub number (receipt_sub_no). Extracting just 10 rows is sufficient.

mine48.py


df=df_receipt.copy()
df['sales_epoch']=pd.to_datetime(df['sales_epoch'],unit='s')
df[['receipt_no','receipt_sub_no','sales_epoch']].head(10)

'''Model answer'''
pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],
           pd.to_datetime(df_receipt['sales_epoch'], unit='s')],axis=1).head(10)

First I read an article on what epoch seconds are, and thought, "So I can just convert the int to a time." I tried that, but it did not work. I then tried various other things, gave up, and looked at the answer.

What came up was pd.to_datetime(df_receipt['sales_epoch'], unit='s'). I did not know the `unit` argument, and when I searched for it, it was not covered on the site I usually rely on, so I had no choice but to open the pandas reference page.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

unit : str, default ‘ns’
The unit of the arg (D,s,ms,us,ns) denote the unit, which is an integer or float number. This will be based off the origin. Example, with unit=’ms’ and origin=’unix’ (the default), this would calculate the number of milliseconds to the unix epoch start.

Setting aside that I cannot read English well, I more or less understand that it accepts D, s, ms, us, ns as values, but I have no idea how s works. If anyone knows, please tell me, even if it is just a link to a reference page... (Or maybe the page I am reading is not the right reference...?)
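As far as I can tell from that description, unit says what one step of the number means, counted from the origin (1970-01-01 00:00:00 by default). The sketch below uses epoch values I picked because they are easy to check by hand; they are not from the real data.

```python
import pandas as pd

# Hypothetical epoch values chosen to be easy to check by hand.
epoch = pd.Series([0, 86400, 1541203200])

# unit='s': treat each number as seconds counted from the origin.
from_seconds = pd.to_datetime(epoch, unit='s')

# The same result, spelled out as origin + elapsed seconds.
by_hand = pd.Timestamp('1970-01-01') + pd.to_timedelta(epoch, unit='s')

# unit='D' counts whole days instead, so 1 is one day after the origin.
one_day = pd.to_datetime(1, unit='D')

print(from_seconds.tolist())
print(by_hand.tolist())
print(one_day)
```

So 86400 seconds (60 * 60 * 24) lands exactly one day after the origin, the same place as 1 with unit='D'.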

49th-51st

These are similar problems, so I'll do them all at once.

P-049: Convert the sales epoch seconds (sales_epoch) of the receipt detail data frame (df_receipt) to a date type (timestamp type), extract only the "year", and output it together with the receipt number (receipt_no) and receipt sub number (receipt_sub_no). Extracting just 10 rows is sufficient.

P-050: Convert the sales epoch seconds (sales_epoch) of the receipt detail data frame (df_receipt) to a date type (timestamp type), extract only the "month", and output it together with the receipt number (receipt_no) and receipt sub number (receipt_sub_no). In addition, "month" should be extracted as 2 digits with 0 padding. Extracting just 10 rows is sufficient.

P-051: Convert the sales epoch seconds (sales_epoch) of the receipt detail data frame (df_receipt) to a date type (timestamp type), extract only the "day", and output it together with the receipt number (receipt_no) and receipt sub number (receipt_sub_no). In addition, "day" should be extracted as 2 digits with 0 padding. Extracting just 10 rows is sufficient.

mine49.py


df=df_receipt.copy()
df['sales_epoch']=pd.to_datetime(df['sales_epoch'],unit='s')
df['sales_epoch']=df['sales_epoch'].dt.strftime('%Y')

df[['receipt_no','receipt_sub_no','sales_epoch']].head(10)

'''Model answer'''
pd.concat([df_receipt[['receipt_no', 'receipt_sub_no']],
           pd.to_datetime(df_receipt['sales_epoch'], unit='s').dt.strftime('%Y')],axis=1).head(10)

# Change %Y to %m or %d to get the month or the day.

Problems 49-51 can all be solved by converting epoch seconds → date type → string type.

However, problems 50 and 51 add this requirement:

In addition, "month" ("day") should be extracted as 2 digits with 0 padding.

So if you do this

'''Trap (Dobon)'''
pd.to_datetime(df['sales_epoch'], unit='s').dt.year

the year comes out in 4 digits without any problem, but with .dt.month and .dt.day the month and day are not zero-padded. (Conversely, with strftime I first tried to pad with %02d, but %m and %d turned out to be zero-padded even without it.)

That's all for this time.
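The difference can be sketched on two made-up epoch values (2018-11-03 and 2019-02-05 at 00:00:00 UTC). The .str.zfill(2) step is my own addition, to show how padding would have to be done by hand when starting from the integers.

```python
import pandas as pd

# Hypothetical epoch seconds: 2018-11-03 and 2019-02-05 (00:00:00 UTC).
sales_epoch = pd.Series([1541203200, 1549324800])
ts = pd.to_datetime(sales_epoch, unit='s')

# .dt.month returns integers, so there is no leading zero.
month_int = ts.dt.month

# strftime returns strings, and %m / %d are always two digits.
month_str = ts.dt.strftime('%m')
day_str = ts.dt.strftime('%d')

# Starting from the integers, zero-padding has to be added explicitly.
month_padded = month_int.astype(str).str.zfill(2)

print(month_int.tolist())
print(month_str.tolist())
print(day_str.tolist())
print(month_padded.tolist())
```

For problems 50 and 51, strftime is the shorter route because the padding comes for free.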

Reference page

- Type conversion from numbers to strings https://note.nkmk.me/python-pandas-str-num-conversion/
- Conversion from string type to date type https://note.nkmk.me/python-pandas-datetime-timestamp/ (first half)
- Date type → character type conversion https://note.nkmk.me/python-datetime-usage/ (second half)
- Retrieving elements with .apply https://note.nkmk.me/python-pandas-map-applymap-apply/
