[PYTHON] Learning record No. 18 (22nd day)


Started studying: Saturday, December 7th

Teaching materials, etc.:

・ Miyuki Oshige, "Details! Python3 Introductory Note" (Sotec, 2017): Completed on Thursday, December 19th

・ Progate Python course (5 courses in total): Completed on Saturday, December 21st

・ Andreas C. Müller, Sarah Guido, "(Japanese title) Machine learning starting with Python" (O'Reilly Japan, 2017): Completed on Saturday, December 23

・ Kaggle: Real or Not? NLP with Disaster Tweets: Submitted on Saturday, December 28th, then adjusted through Friday, January 3rd

・ **Wes McKinney, "(Japanese title) Introduction to Data Analysis with Python" (O'Reilly Japan, 2018)**: January 4th (Sat) ~

"Introduction to Data Analysis with Python"

Read through p. 181, the end of Chapter 5 (introduction to pandas).

- pandas is designed to handle tabular and heterogeneous data. Like NumPy, it favors data processing that does not use for loops. Series and DataFrame are the two data structures used most.

- Series: A one-dimensional array-like object that holds a sequence of values together with an associated array of labels called the index. Values can be selected by label or by boolean condition. A Series can also be built from a Python dictionary; when an explicit index is passed as well, any label with no matching key is filled with NaN. NaN values can be detected with pandas' isnull and notnull functions.
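As a note to self, a minimal sketch of the Series behavior described above. The city names and numbers are made-up illustrative values, not data from the book.

```python
import pandas as pd

# Illustrative values only (hypothetical data).
data = {"Tokyo": 13.9, "Osaka": 8.8, "Nagoya": 7.5}

# Build a Series from a dict: the keys become the index labels.
population = pd.Series(data)

# Select by label and by boolean condition.
tokyo = population["Tokyo"]
large = population[population > 8.0]

# With an explicit index, labels that have no matching key become NaN.
subset = pd.Series(data, index=["Tokyo", "Osaka", "Sapporo"])

missing = pd.isnull(subset)    # True where the value is NaN
present = pd.notnull(subset)   # True where a value exists
```

Here "Sapporo" has no key in the dictionary, so its entry is NaN and shows up as True in `missing`.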

- DataFrame: A tabular data structure with ordered columns. It can be pictured as a set of Series that all share one index. Many of the operations here are ones I used in Kaggle preprocessing: head, loc, column selection, and so on. (A column extracted this way is a Series with the same index as the DataFrame.) When a nested dictionary is passed, the outer keys become the column labels and the inner keys become the row labels.
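A small sketch of the nested-dictionary case and the basic extraction operations, again with made-up values. The variable names are my own.

```python
import pandas as pd

# Illustrative values only (hypothetical data).
nested = {
    "Tokyo": {2019: 13.9, 2020: 14.0},
    "Osaka": {2019: 8.8, 2020: 8.8},
}

# Outer keys become column labels, inner keys become row labels.
frame = pd.DataFrame(nested)

first_row = frame.head(1)    # first row only
tokyo = frame["Tokyo"]       # one column, returned as a Series
row_2020 = frame.loc[2020]   # one row, selected by label
```

The extracted column `tokyo` keeps the same index (2019, 2020) as `frame`.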

- Index objects hold the axis labels and other metadata. For that reason they are immutable, which makes it safe to share them between data structures. To conform data to a new index, use the reindex method; columns can also be reindexed by passing the columns argument. The drop method, which deletes entries, returns a new object by default; passing inplace=True overwrites the original data instead.
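To check my understanding of immutability, reindex, and drop, here is a minimal sketch. The frame contents and labels are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=["x", "y", "z"],
)

# Index objects are immutable: item assignment raises TypeError.
try:
    frame.index[0] = "w"
    index_is_mutable = True
except TypeError:
    index_is_mutable = False

# reindex returns a new object; labels that did not exist are filled with NaN.
rows = frame.reindex(["z", "y", "x", "w"])
cols = frame.reindex(columns=["b", "a", "c"])

# drop returns a new object by default and leaves the original untouched.
without_x = frame.drop("x")

# With inplace=True the original object itself is modified.
frame.drop("y", inplace=True)
```

After the last line, `frame` has only the rows "x" and "z", while `without_x` (created earlier) still has "y" and "z".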

・ Data selection: Use loc to select from a DataFrame by label and iloc to select by integer position. Slicing with labels differs from normal Python slicing in that the endpoint is included. (With integer labels, loc[:2] includes the label 2.)
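A short sketch contrasting loc and iloc, including the endpoint difference when slicing. The data is made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

by_label = frame.loc["x", "a"]    # row label "x", column label "a"
by_position = frame.iloc[1, 0]    # second row, first column

# A Series with the default integer labels 0..3.
s = pd.Series([10, 20, 30, 40])

label_slice = s.loc[:2]       # label-based: includes the label 2
position_slice = s.iloc[:2]   # position-based: stops before position 2
```

Both `by_label` and `by_position` point at the same cell; the two slices differ in length by one because only the label-based slice includes its endpoint.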

- The arithmetic methods (add, sub, div, ...) accept a fill_value argument, which makes it possible to calculate while accounting for missing labels. (Normally, where the axis labels of the two objects do not overlap, the result at that label is NaN; with fill_value, the value missing on one side is replaced before the calculation.)
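A minimal sketch of the difference between the plain + operator and add with fill_value, using two hypothetical Series whose labels only partly overlap.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Plain addition: labels present on only one side ("a" and "d") become NaN.
plain = s1 + s2

# With fill_value=0, the missing side is treated as 0 before adding.
filled = s1.add(s2, fill_value=0)
```

In `filled`, "a" keeps its value from s1 and "d" keeps its value from s2, while the overlapping labels are summed as usual.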

・ Summary statistics can also be computed. Methods such as sum work down the rows by default and return one value per column; pass axis=1 (or axis='columns') to get one value per row instead. idxmax returns the index label at which the maximum occurs. describe returns several summary statistics at once: for numerical data, values such as the count, mean, and standard deviation; for non-numerical data, the count, the number of unique values, and the most frequent value with its frequency. This was also often used in Kaggle. value_counts returns the number of occurrences of each value; it takes a sort argument (True or False) that controls whether the result is sorted by count. The isin method tests whether each element is among the specified values, returning True where it is. The resulting mask can be used to extract just the subset you want.
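The methods above can be sketched together as follows. The frame and the color list are hypothetical examples of my own.

```python
import pandas as pd

frame = pd.DataFrame(
    {"one": [1.0, 4.0, 2.0], "two": [3.0, 0.5, 6.0]},
    index=["a", "b", "c"],
)

col_sums = frame.sum()              # one sum per column
row_sums = frame.sum(axis=1)        # one sum per row
max_labels = frame.idxmax()         # index label of each column's maximum
numeric_summary = frame.describe()  # count, mean, std, min, quartiles, max

colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = colors.value_counts()      # occurrences per value, most frequent first
text_summary = colors.describe()    # count, unique, top, freq

# isin builds a boolean mask, which can then select a subset.
mask = colors.isin(["red", "green"])
subset = colors[mask]
```

`subset` contains only the "red" and "green" entries, with their original index positions preserved.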
