100 Pandas knocks for Python beginners

Introduction

We have created **Pandas 100 Knocks for Python Beginners**, a set of exercises for learning the Python library Pandas efficiently, and are publishing it here. **The content also follows the scope of the Python 3 Engineer Certification Data Analysis Exam, so working through these 100 knocks doubles as exam preparation.** The final knocks are a survival prediction problem for Titanic passengers, which also serves as practice for machine learning competitions such as Kaggle.


Motivation for creation

Overview of 100 Pandas knocks


Problem details

No. Category Problem
1 Basics Display the first 5 rows of the data read into df
2 Basics Display the last 5 rows of the data read into df
3 Basics Check the DataFrame size of df
4 Basics Read the data1.csv file in the input folder, store it in df2, and display the first 5 rows
5 Basics Sort df in ascending order by the fare column and display it
6 Basics Copy df to df_copy and display the first 5 rows
7 Basics ① Check the data type of each column of df
② Check the data type of the cabin column of df
8 Basics ① Check the data type of the pclass column of df with dtype
② Convert it from a numeric type to a string type and check the data type with dtype
9 Basics Check the number of records (rows) in df
10 Basics Check the number of records (rows) in df, the data type of each column, and the presence or absence of missing values
11 Basics Check the elements of the sex and cabin columns of df
12 Basics Display the list of df column names in list format
13 Basics Display the list of df index values in ndarray format
14 Extraction Display only the name column of df
15 Extraction Display only the name and sex columns of df
16 Extraction Display df up to the 4th row of the index
17 Extraction Display df from the 4th to the 10th row of the index
18 Extraction Use loc to display the whole of df
19 Extraction Use loc to display the entire fare column of df
20 Extraction Use loc to display the fare column of df up to the 10th row
21 Extraction Use loc to display the entire name and ticket columns of df
22 Extraction Use loc to display all columns of df from name to cabin
23 Extraction Use iloc to display the age column of df up to the 5th row
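24 Extraction Extract only the name, age, and sex columns of df and store them in df2
Then output df2 as a csv file to the output folder
25 Extraction Extract only the data where the value in the age column of df is 30 or more
26 Extraction Extract only the data where the sex column of df is female
27 Extraction Extract only the data where the sex column of df is female and age is 40 or more
28 Extraction Use query to extract only the data where the sex column of df is female and age is 40 or more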
29 Extraction Display data containing the character string "Mrs" in the name column of df
30 Extraction Display only the string-type columns of df
31 Extraction Count the number of unique elements in each column of df
32 Extraction Check the elements of the embarked column of df and the number of occurrences
33 processing Change the age column of the row with index name "3" in df from 30 to 40
34 processing Change male → 0 and female → 1 in the sex column of df and display the first 5 rows
35 processing Add 100 to the fare column of df and display the first 5 rows
36 processing Multiply the fare column of df by 2 and display the first 5 rows
37 processing Round off the decimal places of the fare column of df
38 processing Add a column named "test" with all values set to 1 to df and display the first 5 rows
39 processing Add to df a column that joins the cabin and embarked columns with "_" (column name "test") and display the first 5 rows
40 processing Add to df a column that joins the age and embarked columns with "_" (column name "test") and display the first 5 rows
41 processing Remove the body column from df and display the first 5 rows
42 processing Remove the row with index name "3" from df and display the first 5 rows
43 processing Change the column names of df2 to 'name', 'class', 'Biology', 'Physics', 'Chemistry'
Display the first 5 rows of df2
44 processing Change the df2 column name 'English' to 'Biology'
Display the first 5 rows of df2
45 processing Change the index name "1" of df2 to "10"
Display the first 5 rows of df2
46 processing Check the number of missing values in all columns of df
47 processing Fill the missing values in the age column of df with 30
Then check the number of missing values in age
48 processing Delete the rows of df that contain even one missing value
Then check the number of missing values in df
49 processing Display the survived column of df in array format
50 processing Shuffle the rows of df and display them
51 processing Shuffle the rows of df, reset the index, and display them
52 processing ① Count the number of duplicate rows in df2
53 processing Convert the name column of df to all uppercase and display
54 processing Convert the name column of df to all lowercase and display it
55 processing Replace the string "female" in the sex column of df with "Python"
56 processing Remove "Elisabeth" from "Allen, Miss. Elisabeth Walton" in the first row of the name column of df (requires import re)
57 processing Join the prefecture and city columns of df5 with "_" so that no whitespace remains (new column name "test2") and display the first 5 rows
58 processing Swap rows and columns in df2
59 Merge and concatenate Left join df3 to df2 and store the result in df2
60 Merge and concatenate Right join df3 to df2 and store the result in df2
61 Merge and concatenate Inner join df3 to df2 and store the result in df2
62 Merge and concatenate Outer join df3 to df2 and store the result in df2
63 Merge and concatenate Concatenate df2 and df4 in the column direction and store the result in df2
64 Merge and concatenate Concatenate df2 and df4 in the column direction, delete one of the duplicated name columns, and store the result in df2
65 Merge and concatenate Concatenate df2 and df2 in the row direction, delete one of the duplicated name columns, and store the result in df2
66 statistics Check the average value of the age column of df
67 statistics Check the median of the age column of df
68 statistics ① Total score for each student in df2 (sum in the row direction)
② Total score for each subject in df2 (sum in the column direction)
69 statistics Maximum score in English for df2
70 statistics Minimum score in English for df2
71 statistics Group df2 by class and find the maximum, minimum, and average of each subject for each class (after deleting the name column)
72 statistics Check the basic statistics of df (describe)
73 statistics Check the (Pearson) correlation coefficient between the columns of df
74 statistics Use scikit-learn to standardize the English, Mathmatics, and History columns of df2
75 statistics Use scikit-learn to standardize the English column of df2
76 statistics Use scikit-learn to Min-Max scale the English, Mathmatics, and History columns of df2
77 statistics Get the index names of the rows with the maximum and minimum values of the fare column of df
78 statistics Get the 0th, 25th, 50th, 75th and 100th percentiles of the df fare column
79 statistics ① Get the mode of the age column of df
② Use value_counts() to check the count of each element in the age column and confirm that the result of ① is valid
80 labeling Label encode the sex column of df and display the first 5 rows of df
81 labeling One-hot encode the sex column of df and display the first 5 rows of df
82 Pandas plot Show histogram of all numeric columns in df
83 Pandas plot Display the age column of df as a histogram
84 Pandas plot Display the total score of 3 subjects for each name of df2 in a bar graph
85 Pandas plot Display 3 subjects for each element of the name column of df2 side by side in a bar graph
86 Pandas plot Display 3 subjects for each element in the name column of df2 as a stacked bar graph
87 Pandas plot Show scatter plot between each column of df
88 Pandas plot Create a scatter plot with the age and fare columns of df
89 Pandas plot Add the graph title "age-fare scatter" to the graph drawn in [88]
90 Titanic Survivor Prediction Label encode the sex and embarked columns of df_copy
91 Titanic Survivor Prediction Check for missing values in df_copy
92 Titanic Survivor Prediction Fill the missing values in the age and fare columns of df_copy with the mean of each column
93 Titanic Survivor Prediction Delete from df_copy the unnecessary rows that are not used in machine learning
94 Titanic Survivor Prediction ① Extract the pclass, age, sex, fare, and embarked columns of df_copy and convert them to ndarray format
② Extract the survived column of df_copy and convert it to ndarray format
95 Titanic Survivor Prediction Split the features and target created in [94] into training data and test data
96 Titanic Survivor Prediction Train a random forest using the training data (features, target)
97 Titanic Survivor Prediction Predict passenger survival for the test_X data
98 Titanic Survivor Prediction Check how closely the prediction results match test_y (the survival answers) (evaluation metric: accuracy)
99 Titanic Survivor Prediction Display the importance of each column (feature) in training
100 Titanic Survivor Prediction Output the prediction results for test_X to the output folder as a csv file (file name "submission.csv")
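
To give a feel for the kind of operations the problems above ask for, here is a minimal sketch on a tiny made-up DataFrame. The four rows and their values are invented for illustration (the real Titanic data ships in the input folder), and these lines are not the official answers, just one common way to express each operation in pandas.

```python
import pandas as pd

# Hypothetical miniature of the Titanic data; names and values are made up.
df = pd.DataFrame(
    {
        "name": [
            "Allen, Miss. Elisabeth Walton",
            "Smith, Mr. John",
            "Brown, Mrs. Mary",
            "Jones, Mr. Tom",
        ],
        "sex": ["female", "male", "female", "male"],
        "age": [29.0, None, 45.0, 31.0],
        "fare": [211.3, 7.9, 26.0, 13.0],
    }
)

# Basics: first rows and DataFrame size
first_rows = df.head(2)
n_rows, n_cols = df.shape

# Extraction: loc slices by label and includes the end label,
# iloc slices by position and excludes the end position
fares = df.loc[:2, "fare"]   # index labels 0, 1, 2
ages = df.iloc[:2, 2]        # positions 0 and 1 of the third column (age)

# Extraction: the same condition as a boolean mask and as a query string
older_women = df[(df["sex"] == "female") & (df["age"] >= 40)]
same_result = df.query("sex == 'female' and age >= 40")

# Processing: count missing values, fill them, and recode a column
missing_per_column = df.isnull().sum()
df["age"] = df["age"].fillna(30)
df["sex"] = df["sex"].map({"male": 0, "female": 1})
```

The loc versus iloc difference in the middle of the sketch is the one beginners tend to trip over: with the default integer index, `df.loc[:2]` returns three rows while `df.iloc[:2]` returns two.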

How to Use

Directory structure

pandas_100_knocks_v1.0
├ notebook/ … Stores 3 ipynb files
├ input/ … Contains the answer files for the 100 problems and the datasets used in the problems
└ output/ … Files output while solving a problem are stored here
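
As a rough illustration of how the input and output folders are used, the sketch below reads a csv file from an input folder and writes one to an output folder. Several things here are assumptions: a temporary directory stands in for the unpacked pandas_100_knocks_v1.0 folder so that the snippet runs anywhere, the two-row stand-in for data1.csv is made up, and the output file name sample.csv is hypothetical. From a notebook inside notebook/, the paths would presumably be relative ones such as ../input/data1.csv.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Temporary directory standing in for pandas_100_knocks_v1.0 (assumption)
base = Path(tempfile.mkdtemp())
input_dir = base / "input"
output_dir = base / "output"
input_dir.mkdir()
output_dir.mkdir()

# Made-up stand-in for the data1.csv file that ships in the input folder
pd.DataFrame({"name": ["A", "B"], "English": [80, 65]}).to_csv(
    input_dir / "data1.csv", index=False
)

# Read from input/, then write the result to output/
df2 = pd.read_csv(input_dir / "data1.csv")
df2.to_csv(output_dir / "sample.csv", index=False)  # hypothetical file name
```

Passing `index=False` keeps the index from being written as an extra unnamed column, which is usually what you want when the file will be read back with `read_csv`.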

Aim of this content

The problems are set with the aim of bringing Python beginners to level 3 (I think you can reach level 2 if you work through them three times).

Download

The content can be downloaded from GitHub.

https://github.com/kunishou/Pandas_100_knocks

Scope of use / Precautions

Other (Scratchpad)

Scratchpad, one of the nbextensions for Jupyter Notebook, is convenient, so we recommend installing it. While working through the 100 knocks, it is tedious to add a new cell and run df.head() every time you want to check what is stored in a data frame. With Scratchpad, you can call up a disposable cell area with "Ctrl + B".


Please refer to the following for the installation method.

[Python] jupyter notebook extensions ~

Finally

If you have any questions or requests regarding this content, please contact us.
