[PYTHON] Day 66 [Introduction to Kaggle] The easiest Titanic forecast

I tried Kaggle's "Titanic: Machine Learning from Disaster".

Kaggle is like a tournament where people compete on machine learning skills. When I signed up, there was content for beginners, so I watched the guidance video right away.

How to Get Started with Kaggle’s Titanic Competition | Kaggle

The English is super fast! The video covered an overview of the Titanic accident, an explanation of the datasets, a tutorial, and how to use Kaggle.

It was too fast for me to follow, so here is the Japanese Wikipedia article instead: [Titanic wreck](https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E3%82%BF%E3%83%8B%E3%83%83%E3%82%AF%E5%8F%B7%E6%B2%88%E6%B2%A1%E4%BA%8B%E6%95%85).

Roughly summarized

・ The accident happened around midnight while people were asleep, so the initial response was delayed.
・ There was not enough life-saving equipment. (The ship was thought to be safe.)
・ Survival rates differed greatly between nobles and commoners, between men and women, and by age.

Looking at the figure, I suspect the mortality rate was high in the section where the ship struck the iceberg and was holed.

A trailer that gives you a panoramic view of the ship. Although it is a movie, I think you can grasp the size of the ship, the number of people, and the atmosphere at that time. (These people are about to ...)

Titanic (dubbed version) - Trailer

Data used for forecasting

The training data has 891 rows and the test data has 418 rows. The data definition is as follows:

| Variable | Definition | Remarks |
| --- | --- | --- |
| Survived | Whether the passenger survived | 0 = No, 1 = Yes |
| Pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| Name | Name | |
| Sex | Sex | |
| Age | Age | |
| SibSp | Number of siblings and spouses on board | |
| Parch | Number of parents / children on board | |
| Ticket | Ticket number | |
| Fare | Ticket price | |
| Cabin | Cabin number | |
| Embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
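
To get a feel for these columns, the survival rate can be grouped by `Sex` and `Pclass`. The sketch below is only an illustration: the rows are made up and borrow nothing but the column names from the table. With the real data, the toy frame would be replaced by `pd.read_csv('train.csv')`, and the numbers would differ.

```python
import pandas as pd

# Toy rows, made up for illustration (NOT the real Kaggle data).
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
})

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

Because `Survived` is coded as 0 or 1, no extra counting is needed: the group mean is already the rate.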

Now, on to programming!

Many example programs are posted under "Notebook", so I checked some of the popular ones.

There was also a Japanese tutorial: Kaggle Titanic First Step (1st Step for Kaggle Titanic).

I skimmed it and my head got muddled, so to keep things simple I first made a model in which everyone survives. All you have to do is create a "Survived" column and upload the file to Kaggle.

Titanic All Survival Model ["Survived"] = 1

00.py


import pandas as pd

# Read the test data
test = pd.read_csv('test.csv')

# Add a Survived column with everyone marked as survived
test["Survived"] = 1

# Check the new column
print(test["Survived"])

# Keep only PassengerId and Survived for the submission
test = test.loc[:, ['PassengerId', 'Survived']]

# Write to CSV (the index is not needed)
test.to_csv('titanic1-1.csv', index=False)
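
Before uploading, the frame can be sanity-checked against what the post describes: exactly the two columns `PassengerId` and `Survived`, 418 rows, and only 0 or 1 as values. The helper `check_submission` below is hypothetical (it is not part of pandas or Kaggle), and the stand-in frame with its ID range is an assumption used only so the sketch runs without `test.csv`.

```python
import pandas as pd

def check_submission(df, expected_rows=418):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(df.columns) != ["PassengerId", "Survived"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(df)}")
    if "Survived" in df.columns and not df["Survived"].isin([0, 1]).all():
        problems.append("Survived must contain only 0 or 1")
    return problems

# Stand-in for the frame built in 00.py; the ID range is an assumption.
submission = pd.DataFrame({"PassengerId": range(892, 892 + 418), "Survived": 1})
print(check_submission(submission))
```

An empty list means none of the three checks failed; it does not guarantee that Kaggle will accept the file.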

Check the resulting CSV and submit it to Kaggle.

Public Score: 0.37320, leaderboard: around 15,800th

The Public Score is close to the actual survival rate (31.9%). The leaderboard seems to rank each person by their highest score, so I could not tell my exact rank, but 0.37320 was around 15,800th. So many people in the world have the same score, which means they were thinking the same thing... I was a little moved by that.

[Screenshot 2020-01-19 19.08.26.png]

The lowest score was 0, and my submission was 70th from the bottom. A score of 0 means every prediction is the exact opposite of the correct answer, which makes it a curious score in its own right.

All death model

Upload a CSV to Kaggle with ["Survived"] = 0. Since 1 - 0.37320 = 0.62680, I expected that value, and the result was Public Score: 0.62679. Almost exactly right.
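
The reason the two scores add up to 1 is that, for a constant prediction, accuracy is just the share of passengers with that outcome. A minimal sketch with made-up labels (the `accuracy` helper and the label array are illustrative, not Kaggle's scoring code):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Made-up labels: 3 survivors out of 8 passengers.
y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1])

acc_all_survive = accuracy(y_true, np.ones_like(y_true))   # predict 1 for everyone
acc_all_die = accuracy(y_true, np.zeros_like(y_true))      # predict 0 for everyone

print(acc_all_survive, acc_all_die, acc_all_survive + acc_all_die)
```

Every passenger is counted as correct by exactly one of the two constant models, so the two accuracies always sum to 1. The last-digit difference in the scores above would be consistent with the displayed values being truncated rather than rounded, but that is only a guess.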

Male death, female survival model

This time, I simply predict death for every man and survival for every woman. On the Titanic, the mortality rate was high for men and the survival rate was high for women, so this should already have some predictive power.

01.py


# Use pandas
import pandas as pd

# Read the test data
test = pd.read_csv('test.csv')

# Add a Survived column, defaulting to 0 (died)
test["Survived"] = 0

# Set Survived to 1 (survived) for women
test.loc[test["Sex"] == 'female', "Survived"] = 1

# Keep only PassengerId and Survived for the submission
test = test.loc[:, ['PassengerId', 'Survived']]

# Write to CSV (the index is not needed)
test.to_csv('titanic1.csv', index=False)
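
The same rule can also be scored locally on labelled data before submitting. The sketch below uses a few made-up rows in place of `train.csv`, so the printed accuracy is not the real one. It writes the rule as a single vectorized expression, which should give the same column as the two-step assignment in 01.py.

```python
import pandas as pd

# Toy labelled rows, made up for illustration (the real train.csv has 891 rows).
train = pd.DataFrame({
    "Sex":      ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Gender rule: 1 (survived) for women, 0 (died) for men.
pred = (train["Sex"] == "female").astype(int)

# Fraction of rows where the rule agrees with the known outcome.
local_accuracy = (pred == train["Survived"]).mean()
print(local_accuracy)
```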

Public Score: 0.76555, leaderboard: 12,457th place / about 15,000 people?

It seems the contents are the same as the CSV of the Gender Based Model.

[Screenshot 2020-01-19 19.41.28.png]

Even a very simple model scores 0.76555, so improving the prediction accuracy from here is where skill comes in.

First of all, I should check the rules.
