[PYTHON] Day 70: I successfully completed GCI 2019 Winter!

I have successfully completed the University of Tokyo's data scientist course, which started in December. The full name is GCI Online Course: The University of Tokyo Data Scientist / Future CMO Training Course (online edition). It is a free course that aims to train data scientists in three months.

● Course outline
*Freely analyze large amounts of data to discover hidden relationships. Demand for "data scientists" with these skills is increasing not only in engineering but also in many other fields such as medicine, economics, management, and the life sciences.
*In this course, you will comprehensively acquire the basics of machine learning and big data handling, which form the core of data analysis skills that serve as a weapon in any field, along with techniques for visualizing analysis results effectively. The aim is to reach the starting line for an active career as a data scientist.
*There is no tuition fee (communication fee etc. will be borne by you).

In any case, there were a lot of assignments. For the past three months, I have been immersed in machine learning every day. The previous year's textbook was on sale, so I looked through it, but it was so difficult that I almost gave up on it.

Data Scientist Training Course at the University of Tokyo Kunitaka Tsukamoto https://www.amazon.co.jp/dp/B07PD237GQ/ref=cm_sw_r_tw_dp_U_x_j1bHEb66RA461

However, the course text was well written, I could ask questions on Slack at any time, and I took part in a team. This support system helped me a lot, and I managed to finish.

I can now read the textbook that I almost gave up on. This gives me confidence. The course is recruiting students again from April, so I recommend it to anyone who wants to start learning machine learning.

Entrance test (12/7-12)

There was a test to confirm our skills before taking the course. It was at a basic level, like WhirlwindTourOfPython. The linked material was explained in English, which threw me off at first. I managed to get through that part because I had dabbled in Python. The other problems involved solving matrix problems in Python. I gave up on mathematics in junior high school, so I couldn't understand what the problems were asking at all. I managed to pass by having someone who is good at math teach me.

Part 1 (12/18) What is data science? Python basics, libraries

The lecture explained that although the spread of IT has produced large amounts of data, hardly any of it is analyzed, and that analyzing it can be tremendously useful in business. It was interesting to watch the video, which was delivered the next day. After that came the Python basics.

Part 2 (12/25) Basics of scientific computing and data processing with Python (NumPy, pandas)

I learned spreadsheet-style data processing in Python. It is like operating Excel entirely through commands. I used aggregation, statistics, and so on a lot after this.
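As a rough illustration of what "Excel with commands" looks like, here is a minimal pandas sketch of aggregation. The table (stores, items, prices) is made up for this example and is not from the course materials.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "item": ["tea", "coffee", "tea", "tea", "coffee"],
    "price": [300, 450, 320, 310, 480],
})

# Aggregate per store: number of rows, total, and average price.
summary = df.groupby("store")["price"].agg(["count", "sum", "mean"])
print(summary)

# A pivot table, similar to Excel's: average price per store and item.
pivot = df.pivot_table(index="store", columns="item", values="price", aggfunc="mean")
print(pivot)
```

`groupby` followed by `agg` covers most of the "totals per category" work that would otherwise be done by hand in a spreadsheet.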

Competition 1 Titanic survival prediction

This was the same as Kaggle's Titanic: Machine Learning from Disaster. At first I had no idea what to do, and even when I watched the movie and came up with hypotheses, I was groping in the dark about how to implement them. I kept at it, drawing on Kaggle and exchanging opinions with my team and on Slack. It gets interesting once an idea leads to a better score. I made it into the rankings using boosting, which reuses the result of one prediction for the next. The top-ranked code is released after the competition, and I was simply impressed that there are some really smart people.
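To show what "reusing the result for the next prediction" means in boosting, here is a simplified sketch. It is not the code used in the competition: it uses synthetic regression data and a hand-written loop in which each small decision tree is fitted to the errors left by the previous rounds. The function name `boost` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for this sketch): a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)


def boost(X, y, n_rounds=50, learning_rate=0.1):
    """Fit small trees one after another, each on the current residuals."""
    prediction = np.full(len(y), y.mean())  # start from the average
    errors = []
    for _ in range(n_rounds):
        residual = y - prediction  # what the model still gets wrong
        tree = DecisionTreeRegressor(max_depth=2, random_state=0)
        tree.fit(X, residual)
        # Add a small correction based on the previous round's errors.
        prediction = prediction + learning_rate * tree.predict(X)
        errors.append(float(np.mean((y - prediction) ** 2)))
    return prediction, errors


prediction, errors = boost(X, y)
print(f"training MSE after round 1:  {errors[0]:.4f}")
print(f"training MSE after round 50: {errors[-1]:.4f}")
```

In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would be used instead of a hand-written loop.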

Part 3 (1/8) Data visualization in data science (Matplotlib)

I learned how to make graphs, correlation matrices, and heatmaps. It is hard to get a graph to look the way you want, so at first it is a good idea to collect good-looking graphs and copy and paste their code to get used to it.
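For reference, a correlation matrix drawn as a heatmap can be sketched as follows. The data is randomly generated for this example (the columns `x`, `y`, `z` are hypothetical), and the output file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file; no display window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: y depends strongly on x, z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "z": rng.normal(size=200),
})

corr = df.corr()  # pairwise correlation coefficients

fig, ax = plt.subplots(figsize=(4, 3))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="correlation")
ax.set_title("Correlation matrix")
fig.savefig("corr_heatmap.png")
```

Fixing `vmin=-1` and `vmax=1` keeps the colors comparable between plots, since correlation coefficients always fall in that range.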

Part 4 (1/15) Basics of probability and statistics

This part covered how to compute statistics such as totals and averages, and how to draw scatter plots and histograms. I knew nothing about probability and statistics, but I was finally able to solve beginner-level problems.
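The basic statistics mentioned here can be tried with Python's standard library alone. This is a minimal sketch with made-up test scores, not an exercise from the course.

```python
import statistics
from collections import Counter

# Hypothetical test scores, invented for illustration.
scores = [62, 75, 75, 80, 91, 58, 70]

total = sum(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
stdev = statistics.stdev(scores)  # sample standard deviation

print(f"total={total}, mean={mean}, median={median}, stdev={stdev:.2f}")

# A text histogram: count scores in bins of width 10 (50s, 60s, ...).
bins = Counter((s // 10) * 10 for s in scores)
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```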

Competition 2 Wine quality

Using data from a Portuguese wine project, we read in the composition of 4898 bottles and their ratings by sommeliers, and predicted the taste from the composition. I love wine. It was a fun competition, and I found myself reading wine labels in the store and thinking about it.

Part 5 (1/29) Basics of Machine Learning (Supervised Learning)

This part introduced supervised learning, unsupervised learning, and reinforcement learning. The methods used in the Titanic and wine predictions were carefully explained once more: multiple regression, logistic regression, lasso regression, ridge regression, k-NN, and support vector machines. If you follow the tutorial, you can use them well enough.
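A minimal sketch of how a few of these supervised methods are used in scikit-learn is shown below. It uses the iris dataset bundled with scikit-learn rather than the competition data, and the parameter values are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Bundled example data: flower measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "support vector machine": SVC(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # learn from labeled data
    scores[name] = model.score(X_test, y_test)    # accuracy on unseen data
    print(f"{name}: {scores[name]:.3f}")
```

All three models share the same `fit` / `score` interface, which is why following a tutorial is enough to get started.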

Part 6 (2/5) Basics of Machine Learning (Unsupervised Learning)

I learned about unsupervised learning, which builds a model without an objective variable: rough grouping, clustering, principal component analysis, and so on. To be honest, I still don't understand it well. Being able to sift through data without prior information is convenient, so it is worth remembering. I need to review it.
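As a small review aid, here is a sketch of clustering and principal component analysis with scikit-learn. It uses the bundled iris data with the labels deliberately thrown away. The choice of three clusters and two components is an assumption made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load only the measurements; the labels are ignored on purpose.
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # put features on one scale

# Clustering: group similar rows without being told the answer.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])

# PCA: compress four features into two while keeping most of the variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)
```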

Competition 3 PUBG Winner Prediction

This was the same as Kaggle's PUBG Finish Placement Prediction (Kernels Only). PUBG is a game in which 100 participants are dropped onto an island for a battle royale. I tried it, but I was killed almost immediately. So first of all, I asked people who play often to show me how they play. It seems that the deciding factors are how you use items, vehicles, and weapons, and how you slip into the ever-narrowing safety zone. However, I had no idea how to express skill at the game in code. Kaggle has many public notebooks describing how people made their predictions, and many of the approaches were eye-opening. Machine learning develops at a tremendous speed. When I submitted a combination of code from strong notebooks, I got a good score.

Part 7 (2/12) Model verification method and tuning method

I learned how to tune model parameters, the characteristics of different models, and how to make predictions by combining multiple models. If you fit the data too strictly, the model either clings to precedent or overthinks and misses, so the idea is to leave it some slack ("asobi"). Then, when I implemented a method called bagging, which trains models on several subsets of the data and combines them, the prediction accuracy became much better. I don't understand it well yet, so I would like to study it further.
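Bagging can be tried with scikit-learn's `BaggingClassifier`, as in the sketch below. Note that it does not split the data into disjoint parts. By default it draws random samples with replacement (bootstrap samples), trains one model on each, and lets the models vote. The dataset (breast cancer data bundled with scikit-learn) and the number of trees are assumptions for this example, not what was used in the course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Baseline: a single decision tree, which tends to overfit.
single_tree = DecisionTreeClassifier(random_state=0)
single_score = cross_val_score(single_tree, X, y, cv=5).mean()

# Bagging: 50 trees, each trained on a bootstrap sample, combined by voting.
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=50,
    random_state=0,
)
bagged_score = cross_val_score(bagged_trees, X, y, cv=5).mean()

print(f"single tree:  {single_score:.3f}")
print(f"bagged trees: {bagged_score:.3f}")
```

Scoring with 5-fold cross-validation, rather than on the training data, is what allows a fair comparison between the two models.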

Part 8 (3/4) Road to Intermediate Data Scientist

People who are active on the front lines were invited to talk about real-world practice. I learned how data is used for business problems, such as in marketing, where you think about what you want people to feel in order to sell things. The story about measuring the effect of commercials was very interesting. After that came deep learning, techniques for running Python fast, PySpark, Spark SQL, other mathematical methods, and engineering tools.

Final assignment New business proposal to Home Credit

The assignment was to analyze the data of Home Credit, a credit company operating in Southeast Asia, and propose a new business. Using the data from last year's Kaggle competition Home Credit Default Risk, we had to take a different approach from the competition and present it to managers who are unfamiliar with machine learning. I had no idea how to go about this, but I managed to submit something by referring to the 8th lecture and thinking about what kind of prediction would please them.

Completed

Looking back like this, I am amazed that I managed to complete it. I am very grateful to everyone in the course for their generous support.

Various plans have been canceled because of the coronavirus, so I will use the time to review the material little by little. GCI 2020 Summer will start again in April, so if you are interested in machine learning, please give it a try. I recommend it!
