[PYTHON] How to set up Data Science 100 Knocks for data science beginners (for Windows 10 Home)
Recently I have been using various external Python libraries such as pandas and NumPy to build a private data analysis tool. But whenever I actually tried to analyze data with them, I hit an "I don't get it!" moment, my head felt ready to burst, and I was tempted to slam-dunk my PC into a basketball hoop.
I felt this could not go on, so I looked around for good learning material and found "Data Science 100 Knocks", a learning environment for data scientists.
This article records the steps I took to get "Data Science 100 Knocks" running on Windows.
What is Data Science 100 Knocks in the first place?
Data Science 100 Knocks is a free learning environment published on GitHub by the Data Scientist Association (hereinafter the DS Association), in which you can learn structured data processing through hands-on practice. It is a very effective learning tool for beginner data scientists.
It covers a broad range of basic data scientist skills: Python, the programming language well known for AI and machine learning, as well as R, a language specialized for statistical analysis, and SQL, a database query language.
Advance preparation
Before building the environment, I prepared the following.
- Windows 10 Home
: The OS used (noted because it matters for the Docker Toolbox setup)
- Docker Toolbox
: The tool required to build a virtual environment on Windows
- Git
: Required to clone from GitHub the configuration files (containers) that provide the virtual environment for learning "Data Science 100 Knocks"
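Once these tools are installed, a quick way to confirm they are reachable from the command line is a small Python check. This is only a minimal sketch: the helper name `find_tools` is hypothetical, and the list of command names is an assumption (`docker-compose` is included because it is used later in this article).

```python
import shutil


def find_tools(names=("docker", "docker-compose", "git")):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = path if path else "NOT FOUND"
        print(f"{name:15s} {status}")
```

Run it from the same terminal you plan to use for Docker, since PATH can differ between terminals.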
How to build the Data Science 100 Knocks environment
There are two main things to do.
- Docker Toolbox setup
- Building the learning environment for "Data Science 100 Knocks"
I will explain each of them.
### Docker Toolbox setup
To use "Data Science 100 Knocks" you need a dedicated environment, namely a virtual environment, so you install Docker, a tool for building virtual environments, on Windows. The procedure is as follows.
- Access the official Docker website
- Download Docker Toolbox [*1] from the official website (Note: do not download Docker for Windows [*2] by mistake here!)
*1: Windows 10 Home does not include Hyper-V, the Windows-native tool for building virtual environments, so you need to download Docker Toolbox, which brings in a virtualization tool from outside Windows.
*2: For Windows 10 Pro, download Docker for Windows instead.
- Install the downloaded Docker Toolbox. Unless you need special settings, just keep clicking "Next >" to finish the setup.
- Confirm that the following icon for operating Docker appears.
- Launch Kitematic from its icon to create a Docker virtual machine named "default" in Oracle VM VirtualBox. Wait until creation reaches 100%.
- When the Docker Hub login screen appears, click "SKIP FOR NOW" to skip it. (You can create a Docker Hub account at any time, so I omit that here.)
- Click "DOCKER CLI" at the bottom left of the screen to open the Docker command line interface.
- Run the ".\docker ps" command in "DOCKER CLI". (If an error occurs, run the "docker ps" command instead.) If no error occurs and an empty container list is displayed, the Docker Toolbox setup is complete.
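The final check in the steps above can also be scripted. The following is a minimal sketch, not part of the original procedure, that runs a command and reports whether it exited successfully. The helper name `command_succeeds` and the 30-second timeout are assumptions chosen for illustration.

```python
import subprocess


def command_succeeds(args, timeout=30):
    """Return True if the command runs and exits with status 0."""
    try:
        completed = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The command is not installed / not on PATH, or it did not finish in time.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    if command_succeeds(["docker", "ps"]):
        print("docker ps succeeded: the Docker Toolbox setup looks complete.")
    else:
        print("docker ps failed: check that the 'default' VM is running.")
```

It should be run from a shell where `docker` resolves, such as the "DOCKER CLI" window opened from Kitematic.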
### Building a learning environment for "100 knocks of data science"
The DS Association publishes the configuration files for the "Data Science 100 Knocks" learning environment on GitHub, so you need to download them from GitHub to your local PC using git [*].
- For how to use git, refer to the following article
- Open Docker Quickstart Terminal and enter the following command to bring the configuration files of the "100knocks-preprocess" learning environment from GitHub to your local PC.
$ git clone https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess
Open Oracle VM VirtualBox, right-click the "default" VM, and select "Settings".
[Important!!] Select "Shared Folders", click the icon in the upper right, specify the path of the folder you just cloned from GitHub, and share it. (If you skip this, the cloned files will not be shared with Docker and the environment cannot be built properly!)
After changing the settings, the VM must be restarted for them to take effect. Right-click the "default" VM and select "Reset" to restart it.
Start Docker Quickstart Terminal again and run the following commands. (The setup takes roughly 10 minutes to complete.)
$ cd 100knocks-preprocess
$ docker-compose up -d --build
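The clone and build commands above can be wrapped in a small script. This is a rough sketch under assumptions: the names `build_commands` and `run_all` and the `dry_run` flag are made up for illustration, and the script only works where `git` and `docker-compose` resolve (for example, when launched from Docker Quickstart Terminal). It skips the clone when the folder already exists.

```python
import os
import subprocess

REPO_URL = "https://github.com/The-Japan-DataScientist-Society/100knocks-preprocess"
REPO_DIR = "100knocks-preprocess"


def build_commands(repo_dir=REPO_DIR, repo_url=REPO_URL):
    """Return (command, working_directory) pairs mirroring the article's steps."""
    commands = []
    if not os.path.isdir(repo_dir):
        # Clone only when the folder is not there yet.
        commands.append((["git", "clone", repo_url, repo_dir], None))
    # Same as: cd 100knocks-preprocess, then docker-compose up -d --build
    commands.append((["docker-compose", "up", "-d", "--build"], repo_dir))
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for args, cwd in commands:
        print(f"[{cwd or '.'}] $ {' '.join(args)}")
        if not dry_run:
            subprocess.run(args, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run by default: the VirtualBox shared-folder step has to be done by
    # hand between the clone and the build, so nothing is executed here.
    run_all(build_commands(), dry_run=True)
```

The dry run is the default on purpose, because the shared-folder setting in VirtualBox still has to be made manually before the build step.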
- After completing the settings, access the following address with a browser.
If the following screen is displayed, everything is working.
That is how to get Data Science 100 Knocks running on Windows.
I am currently working through the Data Science 100 Knocks. I plan to post an article about this problem set soon, so please take a look if you are interested.
Building a virtual environment with Docker may feel difficult at first, but once you get used to it you can stand up an environment straight from GitHub as I did here, so it is a very convenient tool.