Day 68 [Introduction to Kaggle] Random Forest turned out to be simple.

Last time, when I made a random forest prediction using only gender as a feature, the prediction came out as all the men dying and all the women surviving, an extreme result. See the previous post: Day 67 [Introduction to Kaggle] Have you tried using Random Forest?

What is Random Forest actually doing?

I tried various experiments to find out.

Creating a gender & class model

Add Pclass to the gender-only model created last time.

21.py


(Omitted)
# Create the DataFrames
# Gender and class (Pclass)
train_df = train_df.loc[:,['PassengerId','Survived','Sex','Pclass']]
test_df = test_df.loc[:,['PassengerId','Sex','Pclass']]
(same as above)
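
The "(Omitted)" and "(same as above)" parts reuse the code from the Day 67 post. For reference, here is a minimal end-to-end sketch of that kind of pipeline. The file paths, the 0/1 encoding of Sex, and the RandomForestClassifier parameters are my own assumptions, not the exact code from the earlier post:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Paths are illustrative; use the CSVs from the Kaggle Titanic competition.
train_df = pd.read_csv('train.csv')
test_df = pd.read_csv('test.csv')

# Encode Sex as 0/1 (assumed mapping: male=0, female=1,
# which matches the groupby output later in this post).
for df in (train_df, test_df):
    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

# Create the DataFrames: gender and class
train_df = train_df.loc[:, ['PassengerId', 'Survived', 'Sex', 'Pclass']]
test_df = test_df.loc[:, ['PassengerId', 'Sex', 'Pclass']]

# Train a random forest and predict on the test set
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train_df[['Sex', 'Pclass']], train_df['Survived'])
pred = model.predict(test_df[['Sex', 'Pclass']])

# Write the submission file
submission = pd.DataFrame({'PassengerId': test_df['PassengerId'],
                           'Survived': pred})
submission.to_csv('submission.csv', index=False)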

As a result, the Public Score dropped to 0.75598. According to the [Wikipedia article on the Titanic (passenger ship)](https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E3%82%BF%E3%83%8B%E3%83%83%E3%82%AF_%28%E5%AE%A2%E8%88%B9%29), the mortality rate differed greatly from class to class, so it is strange that adding class made the score worse.

Looking at the survivors and the dead, most of the third-class passengers died. The third-class cabins on the lower decks were split into a forward section and an aft section. As the ship sank, passengers in the forward section had two escape routes: go straight up, or move aft through the hull and climb up from there. However, there is a theory that the forward stairway was locked because first-class cabins lay directly above it, leaving only the aft route and driving up the number of deaths.

(Memo) If the third-class passengers could be split into forward and aft cabins, that might make a useful feature for prediction.

Checking a class-only model

22.py


(Omitted)
# Create the DataFrames
# Class only
train_df = train_df.loc[:,['PassengerId','Survived','Pclass']]
test_df = test_df.loc[:,['PassengerId','Pclass']]
(same as above)

Public Score: 0.65550

It went down even further. Perhaps, like the gender-only model, the class-only model collapses each class to all 0s or all 1s. Let me check.

Training data

23.py


print(train_df.groupby(['Pclass','Survived']).count())

                 PassengerId
Pclass Survived             
1      0                  80
       1                 136
2      0                  97
       1                  87
3      0                 372
       1                 119

Test data

24.py


# Add class to the prediction results (submission) for checking.
submission['Pclass'] = test_df['Pclass'] 
print(submission.groupby(['Pclass','Survived']).count())
                 PassengerId
Pclass Survived             
1      1                 107
2      0                  93
3      0                 218

In the training data, every class contained both 0s and 1s, but in the test data (the prediction results) each class has been collapsed to a single value, 0 or 1.

Random forest seems to collapse each group's predictions to whichever outcome has more training examples.
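
This majority-vote behavior is easy to reproduce. Below is a minimal sketch, assuming scikit-learn and the standard Kaggle train.csv (the path is illustrative): with Pclass as the only feature, the forest ends up predicting the per-class majority label.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train_df = pd.read_csv('train.csv')  # path is illustrative
X = train_df[['Pclass']]             # single feature: class only
y = train_df['Survived']

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# With a single feature, every leaf covers one Pclass value, so each
# tree votes for the majority label of that group; bootstrap sampling
# rarely flips the vote, so the forest outputs the group majority.
for pclass in (1, 2, 3):
    pred = model.predict(pd.DataFrame({'Pclass': [pclass]}))[0]
    majority = y[X['Pclass'] == pclass].mode()[0]
    print(pclass, pred, majority)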

Checking the gender & class model

Training data

25.py


print(train_df.groupby(['Sex','Pclass','Survived']).count())
                    PassengerId
Sex Pclass Survived             
0   1      0                  77
           1                  45
    2      0                  91
           1                  17
    3      0                 300
           1                  47
1   1      0                   3
           1                  91
    2      0                   6
           1                  70
    3      0                  72
           1                  72

Test data

26.py


# Add gender and class to the prediction results (submission) for checking.
submission['Sex'] = test_df['Sex'] 
submission['Pclass'] = test_df['Pclass'] 
print(submission.groupby(['Sex','Pclass','Survived']).count())
                     PassengerId
Sex Pclass Survived             
0   1      0                  57
    2      0                  63
    3      0                 146
1   1      1                  50
    2      1                  30
    3      0                  72

The mixed outcomes seen in the training data have been collapsed in the test data: each (Sex, Pclass) group is predicted as whichever outcome was more common in training. Again, the random forest goes with the larger side.
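
To see the collapse directly, we can compare the per-group majority in the training data with the per-group prediction. A sketch, assuming the train_df and submission variables from the snippets above (with Sex already encoded as 0/1):

# Majority outcome for each (Sex, Pclass) group in the training data
train_majority = (train_df.groupby(['Sex', 'Pclass'])['Survived']
                          .agg(lambda s: s.mode()[0]))
print(train_majority)

# Predicted outcome for each (Sex, Pclass) group in the test data;
# first() is enough because every prediction within a group is identical
test_pred = submission.groupby(['Sex', 'Pclass'])['Survived'].first()
print(test_pred)

# If the forest simply takes the majority, the two columns match
# (up to ties, e.g. the 72-vs-72 split for Sex=1, Pclass=3).
print((train_majority == test_pred).all())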
