[PYTHON] Dockerfile for creating a data science environment based on pip3

What I want to do

When it comes to Docker containers for data science, there is scipy-notebook, distributed officially by Jupyter, but if you look at its Dockerfile, it is based on conda. I don't want to use conda for religious reasons, so this time I will write a Dockerfile that creates a data science environment based on pip3.

Reference article: A story about trying to create a machine learning environment using Docker


・ Based on the official Python Docker image
・ Use pip3
・ Install only the required modules from requirements.txt
・ Install the Cloud SDK, because I want to connect to BigQuery with google-cloud-bigquery
・ Install Node.js, because I want to visualize with jupyterlab + plotly

What I made


# Based on Python 3.8
# Reference: https://qiita.com/penpenta/items/3b7a0f1e27bbab56a95f
FROM python:latest

USER root

# lsb-release is required when installing google-cloud-sdk
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y sudo \
    && apt-get install -y lsb-release \
    && pip3 install --upgrade pip

# Change the working directory (optional)
# WORKDIR /home/{Appropriate user name}

# Install the modules listed in requirements.txt, which is created in advance in the same folder as the Dockerfile
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

#Install Cloud SDK
# https://cloud.google.com/sdk/docs/downloads-apt-get
RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update -y && apt-get install google-cloud-sdk -y

# Install Node.js to use plotly
# https://github.com/nodesource/distributions/blob/master/README.md
RUN curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash - \
    && sudo apt-get install -y nodejs

#Support plotly with Jupyter Lab
ENV NODE_OPTIONS=--max-old-space-size=4096
RUN jupyter labextension install @jupyter-widgets/[email protected] --no-build \
    && jupyter labextension install [email protected] --no-build \
    && jupyter labextension install [email protected] --no-build \
    && jupyter lab build
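
Once the image builds, it can be worth confirming that the tools installed by this Dockerfile are actually on the PATH. The following is a minimal sketch in POSIX shell, meant to be run inside the container. The function name check_tools is hypothetical, and the command names (python3, pip3, gcloud, node, jupyter) are my assumption about what the steps above put on the PATH.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report whether each named command is on PATH.
# Returns 0 when every command is found and 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed command names for the tools installed by the Dockerfile above.
check_tools python3 pip3 gcloud node jupyter || echo "some tools are missing"
```

If any line starts with "missing:", the corresponding RUN step is the first place to look.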

List the modules to install this time in requirements.txt. Whether to pin versions is up to you.
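
As a rough sketch, a requirements.txt covering only the packages named in this article could be created like this. The package list is an assumption: jupyterlab, plotly and google-cloud-bigquery come from the goals stated earlier, ipywidgets is my guess based on the widget extensions in the Dockerfile, and versions are left unpinned.

```shell
#!/bin/sh
# Write a minimal requirements.txt (assumed package list, versions unpinned).
cat > requirements.txt <<'EOF'
jupyterlab
plotly
ipywidgets
google-cloud-bigquery
EOF

# Show what was written.
cat requirements.txt
```

JupyterLab extensions are tied to specific JupyterLab releases, so in practice it is probably wise to pin jupyterlab and plotly to versions that match the labextension versions used in the Dockerfile.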



How to use

Put the two files above (the Dockerfile and requirements.txt) in a folder of your choice and move to that directory. Then run the following commands in order.

# Build the Docker image
docker build --rm -t {Name of Docker image} .
# Create the Docker container
# Use port forwarding (-p) to connect the inside and outside of Docker
# Mount a folder from outside Docker (-v) so that files do not disappear even if you delete the container
docker run -itp {Port outside the container}:{Port in the container} -v {Absolute path of the folder outside the container, without a trailing "/"}:{Absolute path in the container where the folder will be mounted, including the folder name} --name {The name of the container} {The name of the image to create it from} /bin/bash
# Start jupyter
jupyter lab --ip= --allow-root --port {Port in the container}
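
To make the placeholders concrete, here is a sketch that fills them with hypothetical values and prints the resulting docker run command for review instead of executing it. Every value (the image name ds-pip3, the container name ds-pip3-lab, port 8888, and both directories) is an assumption chosen for illustration, and the paths are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace them with your own.
IMAGE_NAME="ds-pip3"
CONTAINER_NAME="ds-pip3-lab"
HOST_PORT=8888
CONTAINER_PORT=8888
HOST_DIR="/home/me/work/"
CONTAINER_DIR="/home/work"

# Drop one trailing "/" from the host path, as the -v placeholder asks.
HOST_DIR="${HOST_DIR%/}"

# Assemble the command as a string and print it so it can be checked before running.
RUN_CMD="docker run -it -p ${HOST_PORT}:${CONTAINER_PORT} -v ${HOST_DIR}:${CONTAINER_DIR} --name ${CONTAINER_NAME} ${IMAGE_NAME} /bin/bash"
echo "$RUN_CMD"
```

With these values, JupyterLab started inside the container with --port 8888 should be reachable from the host on port 8888.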


# Cloud SDK authentication
gcloud init
# Authentication for the google-cloud-bigquery API
export GOOGLE_APPLICATION_CREDENTIALS={Absolute path of authentication file}
export GOOGLE_CLOUD_PROJECT={Project name to connect}
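
A typo in either export tends to surface only later, as an authentication error inside a notebook. The sketch below checks both variables up front. The function name check_gcp_env and its messages are hypothetical; it only tests that the variables are set and that the credentials path points to an existing file, not that the credentials are valid.

```shell
#!/bin/sh
# check_gcp_env (hypothetical helper): verify the two variables exported above.
# Prints "ok" and returns 0 when both are set and the credentials file exists.
check_gcp_env() {
    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
        return 1
    fi
    if [ ! -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
        echo "credentials file not found: $GOOGLE_APPLICATION_CREDENTIALS"
        return 1
    fi
    if [ -z "${GOOGLE_CLOUD_PROJECT:-}" ]; then
        echo "GOOGLE_CLOUD_PROJECT is not set"
        return 1
    fi
    echo "ok"
}

check_gcp_env || echo "fix the settings above before starting jupyter"
```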
