[PYTHON] How to set up Data Science 100 Knocks for data science beginners (for Windows 10 Home)


Recently I have been using external Python libraries such as pandas and numpy to build a private data analysis tool. But whenever I actually tried to analyze data with them, I kept running into "I don't know how to do this!" moments, to the point where my head was about to burst and I felt like dunking my PC into a basketball hoop. That was clearly a problem, so I looked for good teaching materials and found "Data Science 100 Knocks", a learning environment for data scientists. This article describes the steps I took to get "Data Science 100 Knocks" running on Windows.

What is Data Science 100 Knocks in the first place?

Data Science 100 Knocks is a free learning environment published on GitHub by the Data Scientist Association (The Japan DataScientist Society, hereinafter the DS Association), in which you can learn structured data processing through hands-on exercises. It is a very effective learning tool for beginner data scientists. It covers a broad range of basic skills: Python, the programming language well known for AI and machine learning, as well as R, a language specialized for statistical analysis, and SQL, the database query language.

Advance preparation

Before building the environment, prepare the following.

  • Windows 10 Home: the OS used (noted here because it determines which Docker tool you need, namely Docker Toolbox)
  • Docker Toolbox: the tool required to build a virtual environment on Windows
  • git: required to clone from GitHub the configuration files (for the containers) that provide the virtual environment for learning "Data Science 100 Knocks"
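
Once these are installed, one quick way to confirm they are usable is to check that their commands can be found on PATH. The following is a minimal Python sketch, not part of the original procedure; the command names in REQUIRED_TOOLS are assumptions based on what Docker Toolbox and git normally install.

```python
import shutil

# Assumed command names: Docker Toolbox normally provides "docker",
# "docker-machine" and "docker-compose"; Git for Windows provides "git".
REQUIRED_TOOLS = ["docker", "docker-machine", "docker-compose", "git"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The lookup function is passed in as a parameter only so that the check can be tried out without the tools installed; in normal use the default shutil.which is enough.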

How to build the Data Science 100 Knocks environment

There are two main things to do.

  1. Set up Docker Toolbox
  2. Build the learning environment for "Data Science 100 Knocks"

Each is explained below.

### Docker Toolbox setup

To use "Data Science 100 Knocks" you need a dedicated environment (a virtual environment), which means installing Docker, a tool that builds virtual environments, on Windows. The procedure is as follows.

  1. Access the official Docker website https://docs.docker.com/toolbox/overview/ ![資料1.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/7c589bc6-9a1f-7a33-3f69-e2d55bf2023f.png)
  2. Download Docker Toolbox [*1] from the official website. (Note: do not download Docker for Windows [*2] by mistake!) ![資料2.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/837bdcee-1631-4944-0a28-f00166c2695d.png)
     [*1] Windows 10 Home does not include Hyper-V, the Windows-native tool for building virtual environments, so you need Docker Toolbox, which brings in its own tool for building a virtual environment.
     [*2] On Windows 10 Pro, download Docker for Windows instead.
  3. Install the downloaded Docker Toolbox. Unless you need special settings, just keep clicking "Next >" to complete the setup. ![資料3.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/8a089030-4dda-b1ea-0488-eacf0e9ee838.png)
  4. Confirm that the following icon for operating Docker appears. ![資料4.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/76951122-b6a9-70de-91c8-3e947cb91c21.png)
  5. Double-click the Kitematic icon to create a Docker virtual machine named "default" in Oracle VM VirtualBox. Wait until the creation progress reaches 100%. ![資料45png.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/4590b9cc-11e6-0270-1d8d-0c1dad8e1c4e.png)
  6. When the Docker Hub login screen appears, click "SKIP FOR NOW" to skip it. (You can create a Docker Hub account at any time, so it is omitted here.) ![資料6png.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/cb492fe1-667f-8e1c-b6ac-b85660307a00.png)
  7. Click "DOCKER CLI" at the bottom left of the screen to open the Docker command line interface. ![資料7png.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/c48fae7d-a4d0-d569-9394-18c5c86ca3b6.png)
  8. Run the ".\docker ps" command in "DOCKER CLI". (If an error occurs, run "docker ps" instead.) If no error occurs and an empty list of containers is displayed, the Docker Toolbox setup is complete. ![資料8png.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/976321aa-9879-d730-aafb-4af570fe7351.png)
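
The same check can be scripted. As a rough sketch (an illustration, not something the setup requires), the Python below runs "docker ps" with a format option that prints only container names and reports whether the command succeeded. It assumes "docker" is on PATH, which should be the case inside the Docker CLI or Docker Quickstart Terminal.

```python
import subprocess


def parse_container_names(ps_output):
    """Split the output of `docker ps --format "{{.Names}}"` into a list of names."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def list_running_containers():
    """Return the names of running containers, or None if `docker ps` fails."""
    try:
        result = subprocess.run(
            ["docker", "ps", "--format", "{{.Names}}"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # docker is not installed or did not answer in time
    if result.returncode != 0:
        return None  # for example, the Docker VM is not running
    return parse_container_names(result.stdout)


if __name__ == "__main__":
    names = list_running_containers()
    if names is None:
        print("docker ps failed; check that the Docker VM is running.")
    elif not names:
        print("Docker is reachable and no containers are running yet.")
    else:
        print("Running containers:", ", ".join(names))
```

Right after the Toolbox setup, the expected result is the "no containers are running yet" message, which corresponds to the empty list shown in the step above.
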
### Building a learning environment for "Data Science 100 Knocks"

The configuration files for the "Data Science 100 Knocks" learning environment provided by the DS Association are published on GitHub, so you need to download them from GitHub to your local PC using git [*]. ![ogp_thum800.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/681742/044582ea-0f11-9650-9537-6dbae773e0e6.jpeg)

  • [*] For how to use git, refer to the following article: https://qiita.com/manabu-watanabe/items/ecf1b434baf305adaa00

  1. Open Docker Quickstart Terminal and run the following command to bring the configuration files of the "100knocks-preprocess" learning environment from GitHub to your local PC.


$ git clone https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess
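
Where the repository ends up matters for the shared-folder step that follows. Docker Toolbox's default VM shares only C:\Users out of the box, so a clone placed elsewhere has to be shared explicitly. As a small sketch, the Python below checks whether a clone path lies under a shared root; both example paths are hypothetical.

```python
from pathlib import PureWindowsPath

# Assumption: the "default" VM shares C:\Users unless it has been reconfigured.
DEFAULT_SHARED_ROOT = r"C:\Users"


def is_inside(folder, shared_root=DEFAULT_SHARED_ROOT):
    """Return True if folder is shared_root itself or lies somewhere below it."""
    folder_path = PureWindowsPath(folder)
    root_path = PureWindowsPath(shared_root)
    # PureWindowsPath compares case-insensitively and normalizes slashes.
    return folder_path == root_path or root_path in folder_path.parents


if __name__ == "__main__":
    # Hypothetical clone locations, for illustration only.
    examples = (
        r"C:\Users\me\100knocks-preprocess",
        r"D:\work\100knocks-preprocess",
    )
    for clone in examples:
        status = "covered by" if is_inside(clone) else "outside"
        print(f"{clone} is {status} {DEFAULT_SHARED_ROOT}")
```

PureWindowsPath is used instead of string prefix matching so that a folder such as C:\UsersBackup is not mistaken for something inside C:\Users.
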
  2. Open Oracle VM VirtualBox, right-click the "default" VM, and select "Settings".

  3. [Important!] Select "Shared Folders", click the icon in the upper right, specify the path of the folder you cloned from GitHub earlier, and share that folder. (If you skip this, the files will not be shared with Docker and the environment will not be built properly!)

  4. The VM must be restarted for the settings to take effect. Right-click the "default" VM and select "Reset" to restart it.

  5. Start Docker Quickstart Terminal again and run the following commands. (Setup takes roughly 10 minutes to complete.)


$ cd 100knocks-preprocess
$ docker-compose up -d --build
  6. When the setup has finished, open the learning environment's address in a browser. If the learning environment's screen is displayed, everything is working.
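
If the page does not load, it can help to tell "the container is not up yet" apart from "the wrong address". With Docker Toolbox the containers run inside the VirtualBox VM, so they are reached through the VM's IP address, which "docker-machine ip default" prints. The Python sketch below polls a URL until something answers; the host and port are deliberately left to the caller and are not values taken from this article.

```python
import time
import urllib.error
import urllib.request

# Local VM addresses should not be routed through a system proxy,
# so build an opener with proxy detection disabled.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a success status
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, or the host is unknown


def wait_until_reachable(url, attempts=5, delay=2.0, check=is_reachable):
    """Poll url up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call would look like wait_until_reachable("http://VM_IP:PORT/"), where VM_IP is the output of "docker-machine ip default" and PORT is whatever port the learning environment publishes. Since the build can take several minutes, a larger attempts value may be needed right after starting the containers.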


This article showed how to get Data Science 100 Knocks running on Windows. I am currently working through the 100 knocks myself and plan to post an article about the exercises soon, so please take a look if you are interested. Building a virtual environment with Docker may feel difficult at first, but once you get used to it you can set up an environment straight from GitHub as we did here, which makes it a very convenient tool.
