Create a survival prediction model for Kaggle Titanic passengers without using Python

Summary and results of this article

I tried the Kaggle Titanic competition using VARISTA, an AutoML service I recently learned about. My best score was 0.80861.

Register with Kaggle

If you don't have a Kaggle account yet, register using the button at the upper right of the Kaggle screen.

Data preparation

This article uses the "Titanic: Machine Learning from Disaster" competition. Open the competition page and select the "Data" tab. On the data page, click "Download All".

When the download finishes, you will have "titanic.zip". Unzip it to get the following files.

The usage of each file is as follows.

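| File name | Use |
|---|---|
| train.csv | Training data (includes the Survived label) |
| test.csv | Test data (no Survived label; these are the passengers to predict) |
| gender_submission.csv | Sample submission showing the expected format |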

**Data variable description**

| Column name | Description |
|---|---|
| PassengerId | Passenger ID |
| Survived | Survival result (1 = survived, 0 = died) |
| Pclass | Ticket class (1 = Upper, 2 = Middle, 3 = Lower) |
| Name | Name |
| Sex | Sex |
| Age | Age |
| SibSp | Number of siblings and spouses aboard |
| Parch | Number of parents and children aboard |
| Ticket | Ticket number |
| Fare | Fare paid |
| Cabin | Cabin number |
| Embarked | Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton) |
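The workflow in this article needs no code at all, but if you'd like to check the column table above yourself, here is a minimal sketch using only Python's standard library. It reads a Titanic CSV and converts the numeric columns to the types implied by the table, keeping empty cells (such as missing Age or Cabin values) as `None`. The helper names are hypothetical; the one-row sample is the first row of train.csv.

```python
import csv
import io
from typing import Optional


def to_int(value: str) -> Optional[int]:
    # An empty cell means "missing" in the Kaggle Titanic CSV files
    return int(value) if value != "" else None


def to_float(value: str) -> Optional[float]:
    return float(value) if value != "" else None


# Numeric columns from the table above; every other column stays a string
CONVERTERS = {
    "PassengerId": to_int,
    "Survived": to_int,  # present only in train.csv
    "Pclass": to_int,
    "Age": to_float,     # float: some ages are fractional
    "SibSp": to_int,
    "Parch": to_int,
    "Fare": to_float,
}


def load_passengers(f) -> list:
    """Read train.csv or test.csv from an open text file into typed dicts."""
    rows = []
    for raw in csv.DictReader(f):
        row = {}
        for column, value in raw.items():
            convert = CONVERTERS.get(column)
            if convert is not None:
                row[column] = convert(value)
            else:
                row[column] = value if value != "" else None
        rows.append(row)
    return rows


# First row of train.csv, used here as an in-memory example
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
)
passengers = load_passengers(sample)
print(passengers[0]["Name"], passengers[0]["Age"], passengers[0]["Cabin"])
```

With the real file, pass an open handle instead, e.g. `with open("train.csv", newline="") as f: passengers = load_passengers(f)`.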

Register with VARISTA

Create a VARISTA account: go to http://www.varista.ai and register from the top page. Alternatively, registering through my referral link below grants credit that can be used in the service, so I'd appreciate it if you used it. If you'd rather not, registering from the top page as above is perfectly fine. https://console.varista.ai/welcome/jamaica-draft-coach-cup-blend

There appears to be a paid plan, but for now I used it for free.

Creating a project and uploading the data

After logging in to VARISTA, create a workspace with any name, then create a project inside it. A name such as "Titanic" works fine.

Follow the on-screen guide to upload the data.

Upload the training data, "train.csv".

When the upload is complete, select the column to predict. In this competition we want to predict whether each passenger survived, so select "Survived".

With these settings done, click START to move to the next screen.

Once you've selected your target, you're ready to go.

Checking the data

You could start training right away, but since we have the chance, let's look at the data first. Open the Data menu and select the "train.csv" you uploaded earlier.

Checking for missing values shows that the Age and Cabin columns have gaps. VARISTA, however, appears to fill in missing values automatically.
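VARISTA reports missing values for you; as an optional cross-check outside the tool, a minimal sketch like this counts empty cells per column of a CSV file. It assumes, as in the Kaggle files, that a missing value is an empty cell. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_missing(f) -> Counter:
    """Count empty (or absent) cells per column in a CSV file object."""
    missing = Counter()
    for row in csv.DictReader(f):
        for column, value in row.items():
            # DictReader yields None when a row has fewer fields than the header
            if value is None or value.strip() == "":
                missing[column] += 1
    return missing


# Hypothetical rows in a train.csv-like layout (not the real file)
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Cabin\n"
    "1,0,3,male,22,\n"
    "2,1,1,female,38,C85\n"
    "3,1,3,female,,\n"
)
print(count_missing(sample))  # Counter({'Cabin': 2, 'Age': 1})
```

On the real data, `with open("train.csv", newline="") as f: print(count_missing(f))` gives the same per-column counts that the Data screen summarizes.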

Next, let's look at how the data is distributed. The "Visualize" tab conveniently shows the distribution of each feature column, and the "Correlation" tab shows how each column correlates with the column you want to predict.
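The article doesn't say which correlation measure VARISTA's Correlation tab uses. Assuming it is the ordinary Pearson coefficient, this sketch computes it between a numeric column and the 0/1 Survived column (with a binary target this is also known as the point-biserial correlation). The `pclass`/`survived` lists are a hypothetical mini-sample, not real passengers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mini-sample: Pclass vs Survived
pclass = [1, 1, 2, 3, 3, 3]
survived = [1, 1, 1, 0, 0, 1]
print(round(pearson(pclass, survived), 3))  # -0.657
```

The sign is negative because a larger Pclass number means a lower class, and in this sample lower classes survived less often.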

**Gender, age** In these charts, read 0 as died and 1 as survived. Gender is strongly related to survival: women appear much more likely to have survived. For age, the survival rate is generally high for children under 7, while the death rate appears high for passengers over 60; in between there is no large difference. Children seem to have been rescued first.

**Pclass** The higher the class (1 = Upper), the higher the survival rate.
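The charts above boil down to survival rates per group. A minimal sketch of that calculation, assuming rows have already been parsed into dicts with numeric `Survived`, `Age` and `Pclass` values; the age bands follow the observations above, and the rows and helper names are hypothetical.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group: survival rate}; rows where key(row) is None are skipped."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        group = key(row)
        if group is None:
            continue
        counts[group][0] += row["Survived"]
        counts[group][1] += 1
    return {g: s / n for g, (s, n) in counts.items()}


def age_band(row):
    # Bands taken from the observations above: under 7, 7-59, 60 and over
    age = row.get("Age")
    if age is None:
        return None  # missing age: leave the row out of this breakdown
    if age < 7:
        return "under 7"
    return "60+" if age >= 60 else "7-59"


# Hypothetical, already-typed rows (not real passengers)
rows = [
    {"Sex": "female", "Age": 5.0, "Pclass": 3, "Survived": 1},
    {"Sex": "male", "Age": 30.0, "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Age": None, "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Age": 65.0, "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Age": 40.0, "Pclass": 1, "Survived": 1},
]
print(survival_rate_by(rows, lambda r: r["Sex"]))  # {'female': 1.0, 'male': 0.3333333333333333}
print(survival_rate_by(rows, age_band))
print(survival_rate_by(rows, lambda r: r["Pclass"]))
```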

Training

Now let's actually train a model. Select "AI Model" in the left menu and click "Create AI Model". Make sure the column to predict is "Survived", then click the **Start Learning** button.

Training then starts automatically with no further configuration on our side, in the style of the AutoML tools that are popular these days. VARISTA appears to perform feature engineering and train models with multiple algorithms.
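The article doesn't describe VARISTA's internals. As a rough illustration of the general idea behind "train several algorithms and keep the best", here is a sketch that scores candidate models with k-fold cross-validation and picks the most accurate one. The two candidates are hypothetical rule-based stand-ins; a real AutoML service would try learned models and do feature engineering as well.

```python
import random
from collections import Counter


def fit_majority(train):
    """Candidate 1: always predict the most common Survived value."""
    label = Counter(r["Survived"] for r in train).most_common(1)[0][0]
    return lambda row: label


def fit_by_sex(train):
    """Candidate 2: predict the most common outcome for the passenger's sex."""
    by_sex = {}
    for sex in {r["Sex"] for r in train}:
        outcomes = Counter(r["Survived"] for r in train if r["Sex"] == sex)
        by_sex[sex] = outcomes.most_common(1)[0][0]
    fallback = fit_majority(train)
    return lambda row: by_sex.get(row["Sex"], fallback(row))


def cross_val_accuracy(fit, rows, k=5, seed=0):
    """Mean accuracy of `fit` over k shuffled folds (seeded for reproducibility)."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    scores = []
    for i in range(k):
        fold = idx[i::k]
        held_out = set(fold)
        predict = fit([rows[j] for j in idx if j not in held_out])
        correct = sum(predict(rows[j]) == rows[j]["Survived"] for j in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


def select_best(candidates, rows, k=5, seed=0):
    """Score every candidate with cross-validation and return the best one."""
    results = {name: cross_val_accuracy(fit, rows, k, seed) for name, fit in candidates.items()}
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical data: 6 women who survived, 4 men who did not
rows = [{"Sex": "female", "Survived": 1}] * 6 + [{"Sex": "male", "Survived": 0}] * 4
print(select_best({"majority": fit_majority, "by_sex": fit_by_sex}, rows))  # ('by_sex', 1.0)
```

Scoring on held-out folds rather than on the training rows is what keeps this comparison from simply rewarding the candidate that memorizes the data.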

Training results

The model scored 70. Looking at each feature's degree of influence (its feature importance), gender and Pclass appear to be strongly related to survival.

Submission to Kaggle

On the training results screen, click **Predict with this model**, then open the output format settings.

Choose the columns that should not be output.

Then change the format of the output column to a flag, so the prediction is written as 0 or 1, which is the format Kaggle expects.

Finally, drag and drop test.csv from the files you downloaded earlier.

Download the completed prediction file.

When you open the file, the rightmost column contains the survival prediction. Delete every column the Kaggle submission doesn't need, leaving only PassengerId and the prediction column, which Kaggle expects to be headed Survived (the same layout as gender_submission.csv). I used Numbers on a Mac this time; on Windows, Excel or a similar spreadsheet should work.
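The column cleanup done in Numbers can also be scripted. A minimal sketch, assuming the downloaded file has a `PassengerId` column plus a prediction column whose header you pass in (the name `Survived_pred` below is hypothetical; check the header of your own file). Every value is thresholded at 0.5, so both 0/1 flags and probabilities come out as 0 or 1.

```python
import csv
import io


def write_submission(src, dst, prediction_column, threshold=0.5):
    """Write a Kaggle-style PassengerId,Survived CSV from a prediction CSV."""
    reader = csv.DictReader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in reader:
        value = float(row[prediction_column])
        # Flags (0/1) pass through unchanged; probabilities are rounded at the threshold
        survived = 1 if value >= threshold else 0
        writer.writerow([row["PassengerId"], survived])


# Hypothetical downloaded rows: extra columns plus a prediction column
src = io.StringIO(
    "PassengerId,Pclass,Sex,Survived_pred\n"
    "892,3,male,0\n"
    "893,3,female,0.73\n"
)
dst = io.StringIO()
write_submission(src, dst, "Survived_pred")
print(dst.getvalue(), end="")
# PassengerId,Survived
# 892,0
# 893,1
```

For real files, open both with `newline=""`, e.g. `open("prediction.csv", newline="")` and `open("submission.csv", "w", newline="")` (file names hypothetical).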

Select "Submit Predictions" on the Kaggle competition page and drag and drop the edited file.

Finally, click "Make Submission". After a short wait, the submission is scored and the score is displayed.
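Kaggle scores Titanic submissions by accuracy: the fraction of passengers whose Survived value is predicted correctly. A minimal sketch of that calculation (the function name is ours; Kaggle computes the score against its own hidden labels for test.csv):

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted 0/1 label equals the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```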

The score this time was 0.77511.

Tuning the model by changing the training settings

In the training settings, I changed the learning level, the percentage of data used for validation, the number of cross-validation folds, and the random seed. The score improved, so here are the results.
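The article doesn't detail how VARISTA applies these settings, so this is only a generic sketch of what a validation percentage, a fold count, and a random seed usually control: shuffle the row indices with the seed, hold out a share for validation, and split the rest into cross-validation folds. The function name and default values are hypothetical; 891 is the number of rows in train.csv.

```python
import random


def split_indices(n_rows, validation_ratio=0.2, n_folds=5, seed=42):
    """Shuffle row indices with a fixed seed, hold out a validation share,
    and split the remaining indices into n_folds cross-validation folds."""
    if not 0 <= validation_ratio < 1:
        raise ValueError("validation_ratio must be in [0, 1)")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same split every run
    n_valid = int(n_rows * validation_ratio)
    validation = indices[:n_valid]
    rest = indices[n_valid:]
    if n_folds < 2 or n_folds > len(rest):
        raise ValueError("n_folds must be between 2 and the number of training rows")
    folds = [rest[i::n_folds] for i in range(n_folds)]  # round-robin assignment
    return validation, folds


validation, folds = split_indices(891, validation_ratio=0.2, n_folds=5, seed=42)
print(len(validation), [len(f) for f in folds])  # 178 [143, 143, 143, 142, 142]
```

Changing only the seed reshuffles which passengers land in each split, which is one reason scores can shift between otherwise identical training runs.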

Click the settings button at the top right of the model training start screen.

I changed the values of these settings, using learning level 3. I haven't experimented much yet, so better settings may exist; I'll try more later.

Now let's retrain and submit to Kaggle again.

The score went up to 0.80861. Training at level 3 takes about 30 minutes, so I'd like to try various other settings and write about them later.
