Build a basic Data Science environment (Jupyter, Python, R, Julia, standard library) with Docker.

Motivation

- I want to create a data analysis environment with **Docker**.
- I want to use **Python**, **R**, or **Julia** with **Jupyter**.
- I want standard libraries and packages for each language to be included from the start.

Intended readers

- People who want to build a basic Data Science environment with Docker
- People who are not very familiar with Docker or Jupyter
- People who want an easy way to set up a data analysis environment

If that describes you, the aim of this article is that you can build a data analysis environment just by entering the commands as shown.

So far, I have confirmed that the environment starts on macOS High Sierra and Ubuntu 18.04.

Actual work

Docker installation

The steps below require Docker. If it is not yet installed, first install it for your OS from the official Docker website.
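
Before continuing, it can help to confirm that Docker is actually usable from your shell. The following is a minimal sketch and not part of the original procedure: it only assumes a POSIX shell, and the variable name `docker_status` is made up for illustration.

```shell
# Check that the docker CLI is on PATH and that the Docker daemon responds.
if command -v docker >/dev/null 2>&1; then
    docker --version
    if docker info >/dev/null 2>&1; then
        docker_status="ready"
    else
        docker_status="installed, but the daemon is not reachable"
    fi
else
    docker_status="not installed"
fi
echo "Docker: $docker_status"
```

If this reports anything other than `ready`, start Docker (or finish the installation) before moving on.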

Select and get Docker images

Docker uses Docker images, which are packages of applications. Choose the Docker image that matches the kind of environment you want to create.

This time I want to use Jupyter Notebook, so I will use a Docker image published by the official Jupyter project on GitHub.

The Jupyter project publishes several kinds of Docker images. This time I want an environment with all of the languages, so I use the one called `datascience-notebook`. For details on this Docker image, see the official Jupyter website.

The other Docker images published by the Jupyter project, and what each one contains, are easiest to understand from the project's chart of its images.

[Figure: chart of the Docker images published by the Jupyter project]

Now that you have decided which image to use, first download it to your local machine with `docker pull`.

$ docker pull jupyter/datascience-notebook

After `docker pull`, running `docker run jupyter/datascience-notebook` as-is does start the container, but it has the following problems.

  1. Data is not persisted.
  2. No password is set.

Not persisting your data will cause trouble later. A password, on the other hand, is often unnecessary if you only work locally, so feel free to skip the password steps.

Get the hash string of the password to set

Setting a password takes two steps:

  1. In a throwaway docker container, generate the **hash string** of the password you want to use.
  2. Set the password when starting the actual environment.

That is why this step comes first. If you do not need a password, skip this section.

First, enter the `bash` environment of the docker container with the following command.

$ docker run -it --rm jupyter/datascience-notebook /bin/bash

Once you are inside the container's `bash` environment, the prompt changes to something like the following.

jovyan@<alphanumeric string>:~$ 

Once the prompt has changed, use Python inside the docker container to get the hash string of the password you want to use. The following command starts a prompt that turns the password into a hash string.

(Inside docker container) $ python -c 'from notebook.auth import passwd;print(passwd())'

When you enter the command, you will be prompted for the password you want to use, so enter it twice.

(Inside docker container) $ python -c 'from notebook.auth import passwd;print(passwd())'
Enter password:
Verify password:
sha1:YOUR_PASSWORD_HASH_VALUE

Make a note of the `sha1:YOUR_PASSWORD_HASH_VALUE` string that is printed after you enter the password twice (the actual value of YOUR_PASSWORD_HASH_VALUE will differ in your environment); you will need it later.

Once you have the hash string, there is nothing more to do in this container, so return to your local environment. Because the container was started with the `--rm` option, it is deleted automatically when it stops.

(Inside docker container) $ exit
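
To make the printed string less mysterious, here is a rough sketch of how a legacy `sha1:` hash string is put together. It rests on my understanding that the string has the form `algorithm:salt:digest`, where the digest is the SHA-1 of the password followed by the salt. The function names `sha1_hex` and `legacy_sha1_hash` and the example password are hypothetical, and `sha1sum` (or `shasum` on macOS) is assumed to be available. Newer versions of the image may print a different prefix such as `argon2:`, so for a real password keep using the `passwd()` prompt rather than this sketch.

```shell
# Illustration only: rebuild the "sha1:<salt>:<digest>" layout by hand.
sha1_hex() {
    # Print the SHA-1 hex digest of stdin using whichever tool is installed.
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum | cut -d' ' -f1
    else
        shasum -a 1 | cut -d' ' -f1
    fi
}

legacy_sha1_hash() {
    # $1 = password, $2 = salt; the digest covers the password followed by the salt.
    printf 'sha1:%s:%s\n' "$2" "$(printf '%s%s' "$1" "$2" | sha1_hex)"
}

# A random 12-character hex salt (6 bytes read from /dev/urandom).
salt=$(od -An -N6 -tx1 /dev/urandom | tr -dc '0-9a-f')
legacy_sha1_hash 'example-password' "$salt"
```

Because the salt is random, the same password gives a different string on every run, which is why your value will not match anyone else's.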

Set the password, make the file persistent, and start the Docker container of the data analysis environment.

Once you have the hash string of the password you want to use, it's time to launch the docker container for analysis. You can set passwords and persist files by passing additional options when starting the docker container.

As mentioned earlier, depending on your environment, going without a password is not a problem. However, be aware that if you don't persist the files, everything you've done in the container will disappear.

Start the Docker container of the data analysis environment with the following command.

# The host user's UID and GID are passed in so that files in the mounted directory stay owned by you.
$ docker run \
    --user root \
    -e GRANT_SUDO=yes \
    -e NB_UID="$(id -u)" \
    -e NB_GID="$(id -g)" \
    -e TZ=Asia/Tokyo \
    -p 8888:8888 \
    --name notebook \
    -v ~/path/to/directory/:/home/jovyan/work \
    jupyter/datascience-notebook \
    start-notebook.sh \
    --NotebookApp.password='sha1:YOUR_PASSWORD_HASH_VALUE'

To explain the options:

  1. File persistence is set by the following part.
-v ~/path/to/directory/:/home/jovyan/work

With this, the work directory seen from Jupyter Notebook, and everything under it, is shared with the local directory.

`~/path/to/directory/` is the local directory used to exchange files with the docker container, so replace it with any directory you like, such as your own working directory.

  2. The password for Jupyter Notebook is set by the following part.
--NotebookApp.password='sha1:YOUR_PASSWORD_HASH_VALUE'

Replace `YOUR_PASSWORD_HASH_VALUE` with the hash string you generated earlier.
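
For the first option, it is safest to create the local directory before mounting it; if it does not exist, Docker on Linux will usually create it for you, and it may end up owned by root. A minimal sketch for preparing it, where the `WORK_DIR` variable and the `~/jupyter-work` default are placeholders of my own choosing:

```shell
# Choose the local directory to share with the container (placeholder default).
WORK_DIR="${WORK_DIR:-$HOME/jupyter-work}"

# Create it up front so that it belongs to your own user.
mkdir -p "$WORK_DIR"

# Resolve it to an absolute path and print the matching -v option.
WORK_DIR="$(cd "$WORK_DIR" && pwd)"
echo "-v $WORK_DIR:/home/jovyan/work"
```

The printed option can be pasted into the `docker run` command in place of the `-v ~/path/to/directory/:/home/jovyan/work` line.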




If you enter the above command and no error is shown, Jupyter should have started successfully.
Open it in a browser to check.
In most cases, it is available at http://localhost:8888.

When building on a server, replace localhost with the IP address of that server.
On a server, access may fail because the server's own port is closed.
In that case, open the port used by Jupyter Notebook.
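
A quick way to tell whether Jupyter is reachable before opening a browser is to request the page from the command line. This is only a sketch: it assumes `curl` is installed, and the helper name `jupyter_url` is hypothetical.

```shell
# Build the URL for a given host and port (defaults: localhost and 8888).
jupyter_url() {
    printf 'http://%s:%s\n' "${1:-localhost}" "${2:-8888}"
}

# Try to fetch the page; give up after 5 seconds.
url="$(jupyter_url)"
if curl -s -o /dev/null --max-time 5 "$url"; then
    echo "Jupyter answered at $url"
else
    echo "No answer at $url (is the container running and the port open?)"
fi
```

For a server, pass the server's IP address (and the port, if you changed it) to `jupyter_url` instead of relying on the defaults.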


The image below shows the page after a successful start.
If you see a page like this, log in with the password you set earlier.

 ![ss_2017-06-30_17.11.51.png](https://qiita-image-store.s3.amazonaws.com/0/43351/c99a80ea-9f4e-5c71-5cc8-3dafa9ae187c.png)

 After that, enjoy the Jupyter environment in your favorite language!

Let's enjoy data science!

