[PYTHON] Simple code that gives a score of 0.81339 in Kaggle's Titanic: Machine Learning from Disaster

Running the code below should give reasonable prediction accuracy. I plan to refine it a little more.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

#Importing data
train_data = pd.read_csv("../input/titanic/train.csv")
test_data = pd.read_csv("../input/titanic/test.csv")

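#Handling of missing values
#Assign the result back instead of using inplace=True on a column selection, which newer pandas versions warn about
train_data['Age'] = train_data['Age'].fillna(train_data['Age'].median())
#mode() returns a Series, so take its first element as the fill value
train_data['Embarked'] = train_data['Embarked'].fillna(train_data['Embarked'].mode()[0])
test_data['Age'] = test_data['Age'].fillna(test_data['Age'].median())
#Fill the missing test fare with the median fare of third-class passengers
test_data['Fare'] = test_data['Fare'].fillna(test_data.groupby('Pclass')['Fare'].median()[3])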

#Data preparation
y_train = train_data["Survived"]
features = ["Pclass", "Sex", "SibSp", "Parch", 'Embarked']
X_train = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])

#Feature engineering of training data
#Age flags: children under 15 and passengers aged 65 or over
X_train['Young'] = np.where(train_data['Age'] < 15, 1, 0)
X_train['Old'] = np.where(train_data['Age'] >= 65, 1, 0)
#Number of relatives aboard, and a flag for passengers travelling alone
X_train['Family'] = train_data['SibSp'] + train_data['Parch']
X_train['Alone'] = np.where(X_train['Family'] == 0, 1, 0)
#Min-max scale Fare to the range [0, 1]
X_train['Fare'] = (train_data['Fare'] - train_data['Fare'].min()) / (train_data['Fare'].max() - train_data['Fare'].min())

#Feature engineering of test data (same steps as for the training data)
X_test['Young'] = np.where(test_data['Age'] < 15, 1, 0)
X_test['Old'] = np.where(test_data['Age'] >= 65, 1, 0)
X_test['Family'] = test_data['SibSp'] + test_data['Parch']
X_test['Alone'] = np.where(X_test['Family'] == 0, 1, 0)
#Min-max scale Fare using the test set's own minimum and maximum
X_test['Fare'] = (test_data['Fare'] - test_data['Fare'].min()) / (test_data['Fare'].max() - test_data['Fare'].min())

#Modeling and fitting
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

#Saving data for submission
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('my_submission.csv', index=False)
print("Your submission was successfully saved!")
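
Before submitting, it can be useful to estimate accuracy locally. The following is only a sketch of one way to do that with scikit-learn's `cross_val_score`. It is not part of the original script, and `X_demo` and `y_demo` are hypothetical synthetic stand-ins so that the snippet runs without the Kaggle files. With the real data you would pass `X_train` and `y_train` instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the engineered feature matrix (hypothetical data,
# not the Titanic set); with the real script, use X_train and y_train.
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),      # values 1, 2 or 3
    "Sex_female": rng.integers(0, 2, size=n),  # values 0 or 1
    "Family": rng.integers(0, 5, size=n),
    "Fare": rng.random(n),                     # already in [0, 1)
})
# Make the label depend on the features so there is something to learn.
y_demo = ((X_demo["Sex_female"] == 1) | (X_demo["Pclass"] == 1)).astype(int)

# Same hyperparameters as the model in the script.
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)

# 5-fold cross-validation: fit on four folds, score on the held-out fold.
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The mean of the fold accuracies is only a rough guide. The leaderboard score is computed on different passengers, so the two numbers will not match exactly.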

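The script applies `pd.get_dummies` to the training and test data separately, which works as long as both produce the same columns. If a category appeared in only one of the two files, the column sets would differ and `model.predict` would fail or see misaligned features. A minimal sketch of one guard, assuming hypothetical toy frames (`train_demo`, `test_demo`) rather than the real data, is to reindex the test columns against the training columns:

```python
import pandas as pd

# Hypothetical toy frames: the "test" frame happens to lack Embarked == 'Q'.
train_demo = pd.DataFrame({"Sex": ["male", "female", "male"],
                           "Embarked": ["S", "C", "Q"]})
test_demo = pd.DataFrame({"Sex": ["female", "male"],
                          "Embarked": ["S", "C"]})

# One-hot encode each frame on its own, as the script does.
X_train_demo = pd.get_dummies(train_demo)
X_test_demo = pd.get_dummies(test_demo)

# Reorder the test columns to match training, adding any missing ones as 0.
X_test_aligned = X_test_demo.reindex(columns=X_train_demo.columns, fill_value=0)

print(list(X_train_demo.columns))
print(list(X_test_aligned.columns))
```

In the script this would correspond to reindexing `X_test` against `X_train.columns` right after the two `get_dummies` calls, before the extra features are added.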