I tried Kaggle's "Titanic: Machine Learning from Disaster".

Kaggle is like a martial arts tournament where people compete on their machine learning skills. When I joined, there was content for beginners, so I watched the introductory video right away.

The English was super fast!! The video covered an overview of the Titanic disaster, an explanation of the datasets, a tutorial, and how to use Kaggle.

It was too fast for me to follow by ear, so here is the Japanese Wikipedia article instead: [Sinking of the Titanic](https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E3%82%BF%E3%83%8B%E3%83%83%E3%82%AF%E5%8F%B7%E6%B2%88%E6%B2%A1%E4%BA%8B%E6%95%85).

Rough summary:

・ The accident happened around midnight while people were asleep, so the initial response was delayed.
・ There was not enough life-saving equipment. (The ship was thought to be safe.)
・ Survival rates differed greatly between upper-class and ordinary passengers, between men and women, and by age.

Looking at the figure, I suspect the death rate was higher in the part of the ship where the iceberg struck and opened a hole.

Here is a trailer that gives a full view of the ship. It is only a movie, but I think it conveys the size of the ship, the number of people on board, and the atmosphere of the time. (These people are about to ...)

Titanic (dubbed version) - Trailer

Data used for prediction

There are 891 rows of training data and 418 rows of test data. The data definitions are as follows:

| Variable | Definition | Remarks |
| --- | --- | --- |
| Survived | Whether the passenger survived | 0 = No, 1 = Yes |
| Pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| Name | Name | |
| Sex | Sex | |
| Age | Age | |
| SibSp | Number of siblings and spouses on board | |
| Parch | Number of parents / children on board | |
| Ticket | Ticket number | |
| Fare | Ticket price | |
| Cabin | Cabin number | |
| Embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
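
To confirm the row counts and see which columns have gaps, a minimal sketch along these lines could be used. The helper name `summarize` is hypothetical, and the three-row sample is made-up stand-in data, not real passenger records.

```python
import io

import pandas as pd


def summarize(df):
    """Return the row count, column names, and missing-value count per column."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),
    }


# Made-up stand-in for train.csv (illustrative values only).
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,\n"
    "3,1,3,female,26\n"
)
sample = pd.read_csv(sample_csv)

info = summarize(sample)
print(info["rows"], info["missing"])
```

With the real files, `summarize(pd.read_csv('train.csv'))` should report the 891 rows mentioned above, assuming the file sits in the working directory.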

Now, on to programming!

Lots of example programs are posted under "Notebook", so I looked through some of the popular ones.

There was also a Japanese tutorial: Kaggle Titanic First Step (1st Step for Kaggle Titanic)

I skimmed through them and my head started spinning, so to keep things simple I first built a model in which everyone survives. All it takes is adding a "Survived" column and uploading the result to Kaggle.

Titanic All Survival Model ["Survived"] = 1

`00.py`

```python
import pandas as pd

# Read the test data
test = pd.read_csv('test.csv')

# Add a Survived column where everyone survives
test["Survived"] = 1

# Check the result
print(test["Survived"])

# Keep only PassengerId and Survived for the submission
test = test.loc[:, ['PassengerId', 'Survived']]

# Write to CSV (the index is not needed)
test.to_csv('titanic1-1.csv', index=False)
```

Check the resulting CSV and submit it to Kaggle.

Public Score: 0.37320, leaderboard: around 15,800th

The Public Score is close to the actual survival rate (31.9%). The leaderboard seems to rank each person by their highest score, so I couldn't tell my exact rank, but 0.37320 was around 15,800th. So many people around the world have the same score, which means they all had the same idea ... and that impressed me a little.
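
For that check, a small sketch like the following might help. `check_submission` is a hypothetical helper, and the expected count of 418 rows comes from the test data size mentioned earlier; the demo frame uses placeholder IDs rather than the real file.

```python
import pandas as pd


def check_submission(df, expected_rows=418):
    """Return a list of problems found in a submission DataFrame (empty if none)."""
    problems = []
    if list(df.columns) != ["PassengerId", "Survived"]:
        problems.append("columns must be exactly PassengerId, Survived")
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(df)}")
    if "Survived" in df.columns and not df["Survived"].isin([0, 1]).all():
        problems.append("Survived must contain only 0 or 1")
    return problems


# Stand-in for the all-survive submission; the IDs here are placeholders.
demo = pd.DataFrame({"PassengerId": range(1, 419), "Survived": 1})
print(check_submission(demo))
```

On the actual output this would be called as `check_submission(pd.read_csv('titanic1-1.csv'))`.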

The lowest score was 0, and I was 70th from the bottom. A score of 0 means every single prediction is the opposite of the correct answer, which makes it an intriguing score in its own right.

All death model

I uploaded a CSV with `["Survived"] = 0` to Kaggle. Since 1 - 0.37320 = 0.62680, I expected that value, and the Public Score came out as 0.62679. Almost exactly right.
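
The reasoning behind that expectation: flipping every 0/1 prediction turns each hit into a miss and each miss into a hit, so the two accuracies add up to 1. Here is a small sketch with made-up answers (the real ones are hidden by Kaggle), where `accuracy` is a hypothetical helper:

```python
def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the answer."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up answers for illustration only.
actual = [1, 0, 0, 1, 0, 0, 0, 1]

all_survive = [1] * len(actual)
all_die = [0] * len(actual)

acc_survive = accuracy(all_survive, actual)
acc_die = accuracy(all_die, actual)
print(acc_survive, acc_die, acc_survive + acc_die)
```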

Male death, female survival model

This time, I simply predict death for every man and survival for every woman. On the Titanic the death rate was high for men and the survival rate was high for women, so even this should have some predictive power.
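
That assumption can be checked against the training data, since it includes the `Survived` column. A minimal sketch, assuming a hypothetical helper `survival_rate_by`; the six rows below are made up and only stand in for `train.csv`.

```python
import pandas as pd


def survival_rate_by(df, column):
    """Mean of Survived (the survival rate) for each value of the given column."""
    return df.groupby(column)["Survived"].mean()


# Made-up rows standing in for train.csv (illustrative only).
sample = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female"],
    "Survived": [0, 0, 0, 1, 1, 1],
})

rates = survival_rate_by(sample, "Sex")
print(rates)
```

With the real data, `survival_rate_by(pd.read_csv('train.csv'), "Sex")` would give the rates per sex, and the same helper works for `Pclass`.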

`01.py`

```python
# Use pandas
import pandas as pd

# Read the test data
test = pd.read_csv('test.csv')

# Add a Survived column, defaulting to 0 (death)
test["Survived"] = 0

# Set Survived to 1 (survival) for women
test.loc[test["Sex"] == 'female', "Survived"] = 1

# Keep only PassengerId and Survived for the submission
test = test.loc[:, ['PassengerId', 'Survived']]

# Write to CSV (the index is not needed)
test.to_csv('titanic1.csv', index=False)
```

Public Score: 0.76555, leaderboard: 12,457th out of about 15,000 people?

The contents seem to be identical to the CSV of the Gender Based Model.

Even a very simple model scores 0.76555, so improving the prediction accuracy from here is where skill comes in.

For now, this was mainly about checking how the rules work.
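
One way to confirm that two submission files really match is to align them on `PassengerId` and compare the `Survived` values. This is only a sketch: `same_predictions` is a hypothetical helper, the small frames are made up, and the file name `gender_submission.csv` in the usage note is my assumption about the sample file Kaggle provides.

```python
import pandas as pd


def same_predictions(a, b):
    """True if both submissions give the same Survived value for every PassengerId."""
    merged = a.merge(b, on="PassengerId", how="outer", suffixes=("_a", "_b"))
    # An ID missing from one side becomes NaN after the outer merge,
    # and NaN never compares equal, so it counts as a mismatch.
    return bool((merged["Survived_a"] == merged["Survived_b"]).all())


# Made-up submissions; the row order differs but the predictions agree.
mine = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})
other = pd.DataFrame({"PassengerId": [894, 892, 893], "Survived": [0, 0, 1]})
print(same_predictions(mine, other))
```

On the real files this would look like `same_predictions(pd.read_csv('titanic1.csv'), pd.read_csv('gender_submission.csv'))`.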

[PYTHON] Day 66 [Introduction to Kaggle] The easiest Titanic forecast