[PYTHON] Simple code that gives a score of 0.81339 in Kaggle's Titanic: Machine Learning from Disaster

If you run this code as-is, it should give a reasonable prediction accuracy. I'll polish it a little more later.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

#Importing data
train_data = pd.read_csv("../input/titanic/train.csv")
test_data = pd.read_csv("../input/titanic/test.csv")

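#Handling of missing values
train_data['Age'] = train_data['Age'].fillna(train_data['Age'].median())
#mode() returns a Series, so take its first element as the fill value
train_data['Embarked'] = train_data['Embarked'].fillna(train_data['Embarked'].mode()[0])
test_data['Age'] = test_data['Age'].fillna(test_data['Age'].median())
#Fill the missing test Fare with the median Fare of Pclass 3
test_data['Fare'] = test_data['Fare'].fillna(test_data.groupby('Pclass')['Fare'].median()[3])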

#Data preparation
y_train = train_data["Survived"]
features = ["Pclass", "Sex", "SibSp", "Parch", 'Embarked']
X_train = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])

#Feature engineering of training data
X_train['Young'] = np.where(train_data['Age'] < 15, 1, 0)
X_train['Old'] = np.where(train_data['Age'] >= 65, 1, 0)
X_train['Family'] = train_data['SibSp'] + train_data['Parch']
X_train['Alone'] = np.where(X_train['Family'] == 0, 1, 0)
X_train['Fare'] = (train_data['Fare'] - train_data['Fare'].min()) / (train_data['Fare'].max() - train_data['Fare'].min())

#Feature engineering of test data
X_test['Young'] = np.where(test_data['Age'] < 15, 1, 0)
X_test['Old'] = np.where(test_data['Age'] >= 65, 1, 0)
X_test['Family'] = test_data['SibSp'] + test_data['Parch']
X_test['Alone'] = np.where(X_test['Family'] == 0, 1, 0)
X_test['Fare'] = (test_data['Fare'] - test_data['Fare'].min()) / (test_data['Fare'].max() - test_data['Fare'].min())

#Modeling and fitting
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

#Saving data for submission
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('my_submission.csv', index=False)
print("Your submission was successfully saved!")
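
Before submitting, it can help to estimate the accuracy locally. The following is only a rough sketch, not part of the original code: `build_features` is a hypothetical helper that bundles the same feature steps used above, and `demo` is a synthetic stand-in table (random values, not the Kaggle data) so that the snippet runs on its own. The 5-fold cross-validation setting is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def build_features(df):
    """Hypothetical helper: apply the feature steps from this post to one DataFrame."""
    X = pd.get_dummies(df[["Pclass", "Sex", "SibSp", "Parch", "Embarked"]])
    X["Young"] = np.where(df["Age"] < 15, 1, 0)
    X["Old"] = np.where(df["Age"] >= 65, 1, 0)
    X["Family"] = df["SibSp"] + df["Parch"]
    X["Alone"] = np.where(X["Family"] == 0, 1, 0)
    # Min-max scaling of Fare, as in the code above
    fare_range = df["Fare"].max() - df["Fare"].min()
    X["Fare"] = (df["Fare"] - df["Fare"].min()) / fare_range
    return X


# Synthetic stand-in rows (NOT the Kaggle data) so the sketch runs anywhere
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "SibSp": rng.integers(0, 3, n),
    "Parch": rng.integers(0, 3, n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 100, n),
    "Survived": rng.integers(0, 2, n),
})

X_demo = build_features(demo)
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)

# 5-fold cross-validated accuracy (the fold count is an assumption)
scores = cross_val_score(model, X_demo, demo["Survived"], cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On the real data, you would pass `train_data` (after the missing-value handling) instead of `demo`. Since the labels in `demo` are random, the printed number here says nothing about the Titanic score itself.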
