Day 68 [Introduction to Kaggle] Random Forest turned out to be simple.

Last time, when I made a random forest prediction using only gender as a feature, the prediction came out as all the men dying and all the women surviving, an extreme result. See the previous post: Day 67 [Introduction to Kaggle] Have you tried using Random Forest?

What is Random Forest actually doing?

I tried various experiments to find out.

Creating a gender & class model

Add Pclass to the gender-only model created last time.

21.py


(Omitted)
# Create the DataFrames
# Gender and class (Pclass)
train_df = train_df.loc[:,['PassengerId','Survived','Sex','Pclass']]
test_df = test_df.loc[:,['PassengerId','Sex','Pclass']]
(same as above)
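
The "(Omitted)" and "(same as above)" parts reuse the code from the Day 67 post. For reference, here is a minimal end-to-end sketch of that kind of pipeline. The file paths, the 0/1 encoding of Sex, and the RandomForestClassifier parameters are my own assumptions, not the exact code from the earlier post:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Paths are illustrative; use the CSVs from the Kaggle Titanic competition.
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

# Encode Sex as 0/1 (assumed mapping: male=0, female=1,
# which matches the groupby output later in this post).
for df in (train_df, test_df):
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

# Create the DataFrames: gender and class
train_df = train_df.loc[:, ['PassengerId', 'Survived', 'Sex', 'Pclass']]
test_df = test_df.loc[:, ['PassengerId', 'Sex', 'Pclass']]

# Train a random forest and predict on the test set
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train_df[['Sex', 'Pclass']], train_df['Survived'])
pred = model.predict(test_df[['Sex', 'Pclass']])

# Write the submission file
submission = pd.DataFrame({'PassengerId': test_df['PassengerId'],
                           'Survived': pred})
submission.to_csv('submission.csv', index=False)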

As a result, the Public Score dropped to 0.75598. According to the [Wikipedia article on the Titanic (passenger ship)](https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E3%82%BF%E3%83%8B%E3%83%83%E3%82%AF_%28%E5%AE%A2%E8%88%B9%29), the mortality rate differed greatly from class to class, so it is strange that adding class made the score worse.

Looking at the survivors and the dead, most of the third-class passengers died. The third-class cabins on the lower decks were split into a forward section and an aft section. As the ship sank, passengers in the forward section had two escape routes: go straight up, or move aft through the hull and climb up from there. However, there is a theory that the forward stairway was locked because first-class cabins lay directly above it, leaving only the aft route and driving up the number of deaths.

(Memo) If the third-class passengers could be split into forward and aft cabins, that might make a useful feature for prediction.

Checking a class-only model

22.py


(Omitted)
# Create the DataFrames
# Class only
train_df = train_df.loc[:,['PassengerId','Survived','Pclass']]
test_df = test_df.loc[:,['PassengerId','Pclass']]
(same as above)

Public Score: 0.65550

It went down even further. Perhaps, like the gender-only model, the class-only model collapses each class to all 0s or all 1s. Let me check.

Training data

23.py


print(train_df.groupby(['Pclass','Survived']).count())

                 PassengerId
Pclass Survived             
1      0                  80
       1                 136
2      0                  97
       1                  87
3      0                 372
       1                 119

Test data

24.py


# Add class to the prediction results (submission) for checking.
submission['Pclass'] = test_df['Pclass'] 
print(submission.groupby(['Pclass','Survived']).count())
                 PassengerId
Pclass Survived             
1      1                 107
2      0                  93
3      0                 218

In the training data, every class contained both 0s and 1s, but in the test data (the prediction results) each class has been collapsed to a single value, 0 or 1.

Random forest seems to collapse each group's predictions to whichever outcome has more training examples.
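
This majority-vote behavior is easy to reproduce. Below is a minimal sketch, assuming scikit-learn and the standard Kaggle train.csv (the path is illustrative): with Pclass as the only feature, the forest ends up predicting the per-class majority label.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train_df = pd.read_csv('train.csv')  # path is illustrative
X = train_df[['Pclass']]             # single feature: class only
y = train_df['Survived']

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# With a single feature, every leaf covers one Pclass value, so each
# tree votes for the majority label of that group; bootstrap sampling
# rarely flips the vote, so the forest outputs the group majority.
for pclass in (1, 2, 3):
    pred = model.predict(pd.DataFrame({'Pclass': [pclass]}))[0]
    majority = y[X['Pclass'] == pclass].mode()[0]
    print(pclass, pred, majority)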

Checking the gender & class model

Training data

25.py


print(train_df.groupby(['Sex','Pclass','Survived']).count())
                    PassengerId
Sex Pclass Survived             
0   1      0                  77
           1                  45
    2      0                  91
           1                  17
    3      0                 300
           1                  47
1   1      0                   3
           1                  91
    2      0                   6
           1                  70
    3      0                  72
           1                  72

Test data

26.py


# Add gender and class to the prediction results (submission) for checking.
submission['Sex'] = test_df['Sex'] 
submission['Pclass'] = test_df['Pclass'] 
print(submission.groupby(['Sex','Pclass','Survived']).count())
                     PassengerId
Sex Pclass Survived             
0   1      0                  57
    2      0                  63
    3      0                 146
1   1      1                  50
    2      1                  30
    3      0                  72

The mixed outcomes seen in the training data have been collapsed in the test data: each (Sex, Pclass) group is predicted as whichever outcome was more common in training. Again, the random forest goes with the larger side.
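
To see the collapse directly, we can compare the per-group majority in the training data with the per-group prediction. A sketch, assuming the train_df and submission variables from the snippets above (with Sex already encoded as 0/1):

# Majority outcome for each (Sex, Pclass) group in the training data
train_majority = (train_df.groupby(['Sex', 'Pclass'])['Survived']
                          .agg(lambda s: s.mode()[0]))
print(train_majority)

# Predicted outcome for each (Sex, Pclass) group in the test data;
# first() is enough because every prediction within a group is identical
test_pred = submission.groupby(['Sex', 'Pclass'])['Survived'].first()
print(test_pred)

# If the forest simply takes the majority, the two columns match
# (up to ties, e.g. the 72-vs-72 split for Sex=1, Pclass=3).
print((train_majority == test_pred).all())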
