[PYTHON] [For non-programmers] A guide to getting around Kaggle

This article is the Day 3 entry of the Kaggle Advent Calendar 2019.

I hope it will be helpful for those who think "I'm interested in Kaggle, but it looks difficult and scary ..."!

Before I met Kaggle

Below is a brief history of my career before I met Kaggle.

- Marketer ⇒ Consultant
- Liberal arts background (art history major)
- No programming experience (I had only touched HTML)
- Nothing but fear of the black screen (the terminal) ...


That was my starting point, but over the past year I studied Python and machine learning and managed to make sense of Kaggle. So I am writing down what I wish I had known a year ago.

Target audience

Who is this for?

- People who want to improve their data analysis / machine learning skills
- People who think Kaggle looks fun but have no programming experience
- Mainly marketers and planning staff

What do I want to tell them?

- Kaggle is worth a try!

What is "Kaggle"?

What is Kaggle

In a word, it is the Tenkaichi Budokai (the martial arts tournament from Dragon Ball) of data analysis using machine learning.

Data to analyze

- Companies provide the data and the themes and host the competitions
- Themes range across medical care, marketing, finance, and more

Competition period

- About 1 to 3 months
- Relatively long

Ranking

- You build a machine learning model, and you are ranked according to how good its predictions are
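To make the ranking idea concrete, here is a minimal sketch. It assumes the score is plain accuracy (the share of correct predictions); in practice each competition defines its own metric. The participant names and all numbers are invented for illustration.

```python
# Toy illustration: score each participant's predictions, then rank them.
# All names and numbers here are made up for the example.

def accuracy(predictions, answers):
    """Fraction of predictions that match the hidden correct answers."""
    correct = sum(1 for p, a in zip(predictions, answers) if p == a)
    return correct / len(answers)

# Correct answers held by the competition host (1 = yes, 0 = no).
answers = [1, 0, 0, 1, 1, 0, 1, 0]

# Predictions submitted by three hypothetical participants.
submissions = {
    "alice": [1, 0, 0, 1, 1, 0, 1, 0],
    "bob":   [1, 0, 1, 1, 0, 0, 1, 0],
    "carol": [0, 0, 0, 1, 1, 0, 1, 0],
}

# Score every submission, then sort from best to worst.
scores = {name: accuracy(preds, answers) for name, preds in submissions.items()}
leaderboard = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```

The participants never see `answers`; they only submit predictions and see where they land on the leaderboard.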

Prize money

- Some competitions award prize money to the top finishers
- As of December 2019, the DSB competition currently under way lists $100,000 (about 10 million yen!) for No. 1

Entry fee

Registered users

- Approximately 120,000 people (as of December 2019) *Reference source

About rank

Kaggle has five ranks, awarded according to your ability. (Strictly speaking, you can earn a rank not only through your results in competitions but also through contributions to the Kaggle community, such as Kernels, Discussion, and Datasets.)

- Grandmaster *The strongest ...!
- Master
- Expert
- Contributor
- Novice

It is probably no exaggeration to say that the people who take part in Kaggle (commonly known as Kagglers) spend a lot of time every day on competitions in order to raise this rank.

Number of people by rank

The number of people at each rank as of December 2019 is as follows.

[Figure: キャプチャ.PNG]

Image of each rank

Here is a rough outline of each rank and the impression it gives. (This is purely subjective.)

[Figure: 図1.png]

Rank up conditions

Here are the conditions for ranking up (in Competitions). Expert already requires medals ... If you are a beginner, aim for Expert first.

[Figure: 図2.png]

Medal acquisition conditions

There are three types of medals: Gold, Silver, and Bronze.

Here are the conditions for earning a medal (in Competitions). The exact conditions vary with the number of participants, but the following is a rough guide.

[Figure: 図3.png]
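As a rough illustration of how medal cutoffs can depend on the number of participating teams, here is a minimal sketch. The thresholds are not taken from this article: they encode my understanding of the Competitions table on Kaggle's progression page, and rounding fractional cutoffs down is an assumption. The function names are hypothetical. Please check the official page for the current rules.

```python
def medal_cutoffs(num_teams):
    """Return the lowest leaderboard place that still earns each medal.

    ASSUMPTION: the tiers follow my reading of Kaggle's progression table
    for Competitions, and fractional cutoffs are rounded down.
    """
    extra_gold = num_teams // 500  # "+0.2%" means one extra gold per 500 teams
    if num_teams < 100:
        gold = num_teams * 10 // 100      # top 10%
        silver = num_teams * 20 // 100    # top 20%
        bronze = num_teams * 40 // 100    # top 40%
    elif num_teams < 250:
        gold = 10                         # top 10 teams
        silver = num_teams * 20 // 100    # top 20%
        bronze = num_teams * 40 // 100    # top 40%
    elif num_teams < 1000:
        gold = 10 + extra_gold            # top 10 + 0.2%
        silver = 50                       # top 50 teams
        bronze = 100                      # top 100 teams
    else:
        gold = 10 + extra_gold            # top 10 + 0.2%
        silver = num_teams * 5 // 100     # top 5%
        bronze = num_teams * 10 // 100    # top 10%
    return {"gold": gold, "silver": silver, "bronze": bronze}


def medal_for(place, num_teams):
    """Return "gold", "silver", "bronze", or None for a final place."""
    cutoffs = medal_cutoffs(num_teams)
    for medal in ("gold", "silver", "bronze"):
        if place <= cutoffs[medal]:
            return medal
    return None


print(medal_cutoffs(2000))   # {'gold': 14, 'silver': 100, 'bronze': 200}
print(medal_for(150, 2000))  # bronze
```

Under these assumptions, finishing 150th out of 2,000 teams would earn a bronze medal, while 201st would earn nothing.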

Examples of competitions

Prediction of survivors of the Titanic sinking accident

Predicting how quickly dogs and cats are adopted at pet stores

Benefits of doing Kaggle

I've listed three benefits of doing Kaggle.

① You can gauge (and prove) your own data science skills

Data science skills are difficult to measure without hands-on practice. However, earning medals and ranks on Kaggle gives rough proof of your data science skills. Recently, the number of companies that value Kaggle results has been gradually increasing, so they also serve as evidence when you want to step up in your career.

② You can learn the know-how of data scientists from around the world (free of charge!)

What I learned by actually participating in competitions is that Kaggle is not just about competing with others: it has a wonderful culture of sharing know-how and information. In particular, "Notebooks" explain data science know-how and trends in an easy-to-understand way alongside the code, and I recommend them for beginners.

③ You can get an intuitive feel for what machine learning is doing

Nowadays, machine learning is being built into more and more of the apps and marketing tools around us. With Kaggle you learn the modeling process of machine learning hands-on, so you get a feel for what machine learning can and cannot do.

Where to start with Kaggle

Opinions will differ, but here are my three personal recommendations.

① If you have no machine learning / Python knowledge, the "Matsuo Lab" textbook is the one to choose!

The Matsuo Laboratory at the University of Tokyo is famous for machine learning and AI.

You can download the text of the "Data Scientist Training Course" run by the Matsuo Laboratory for free. This is amazing, it's a big deal! The text and Python code carefully explain machine learning up to the point where you graduate from being a beginner, so if you have difficulty understanding what you see on Kaggle, I think you should work through this first.

▼ Click here to download the "Matsuo Lab" text https://weblab.t.u-tokyo.ac.jp/gci_contents/


② In "Notebooks", copy out the ones with a high number of votes to get the hang of it!

As I mentioned earlier, the "Notebooks" in each competition are a treasure trove.

Once you have chosen a competition that looks approachable, I recommend copying out a notebook with a high number of votes in "Notebooks". Highly voted notebooks tend to be polished and easy to understand, so by typing the code out yourself (like copying sutras by hand) you can retrace the whole flow from loading the data to submitting the model's results, and pick up the tricks of machine learning along the way.
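The flow from loading the data to submitting results can be sketched in a few lines. This is only an illustration under stated assumptions: the column names mimic the Titanic competition, the rows are invented, and the "model" is a deliberately simple rule (predict from the survival rate of each sex) rather than real machine learning. Actual notebooks usually read `train.csv` and `test.csv` from files and use libraries such as pandas and scikit-learn.

```python
import csv
import io

# Tiny made-up stand-ins for a competition's train.csv and test.csv.
TRAIN_CSV = """PassengerId,Sex,Survived
1,male,0
2,female,1
3,female,1
4,male,0
5,male,1
6,female,0
7,female,1
"""
TEST_CSV = """PassengerId,Sex
101,female
102,male
103,male
"""

# 1. Load the data.
train = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
test = list(csv.DictReader(io.StringIO(TEST_CSV)))

# 2. "Train" a very simple model: the survival rate for each sex.
totals, survived = {}, {}
for row in train:
    totals[row["Sex"]] = totals.get(row["Sex"], 0) + 1
    survived[row["Sex"]] = survived.get(row["Sex"], 0) + int(row["Survived"])
survival_rate = {sex: survived[sex] / totals[sex] for sex in totals}

# 3. Predict 1 when that group's survival rate is at least 50%, else 0.
predictions = [
    {"PassengerId": row["PassengerId"],
     "Survived": int(survival_rate[row["Sex"]] >= 0.5)}
    for row in test
]

# 4. Write the predictions in a two-column submission format.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["PassengerId", "Survived"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(predictions)
submission_csv = output.getvalue()
print(submission_csv)
```

Every notebook you copy follows roughly these four steps; what changes is how sophisticated steps 2 and 3 are.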

③ Working alone, you can get stuck for lack of information, so I recommend connecting with other people!

Kaggle is basically an individual contest. However, the competition period is as long as 1 to 3 months, so your motivation may drop or you may get stuck.

At such times, if you have friends who are also doing Kaggle, you can share useful information (though private sharing of competition material is not allowed) and encourage each other, so you can enjoy Kaggle even more.

Personally, I recommend the following three.

- Go to Kaggle-related seminars (connpass, etc.)
- Follow Kaggle on Twitter
- Join a Kaggle community casually


In closing

Kaggle is a great platform for everyone from beginners to advanced practitioners of machine learning and data analysis.

On Kaggle you get to compete with top data scientists from around the world, share information with them, and learn from them. It is easy to take this for granted, but when I think about it carefully it is amazing, and writing this article reminded me of that.


I will keep doing my best so that Kaggle remains a good platform! (First of all, I want a medal ...)
