[PYTHON] Dockerfile for creating a data science environment based on pip3

What I want to do

When it comes to Docker containers for data science, the Jupyter project officially distributes scipy-notebook, but if you look at its Dockerfile, it is built on conda. I don't want to use conda for religious reasons, so this time I will write a Dockerfile that builds a data science environment based on pip3.

Referenced article: A story about trying to create a machine learning environment using Docker

Policy

・Based on the official Python Docker image
・Use pip3
・Install only the required modules, listed in requirements.txt
・Install the Cloud SDK, since I want to connect to BigQuery with google-cloud-bigquery
・Install Node.js, since I want to visualize with JupyterLab + plotly

What I made

Dockerfile

# Based on Python 3.8
# reference: https://qiita.com/penpenta/items/3b7a0f1e27bbab56a95f
FROM python:3.8

USER root

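# lsb-release is needed later to install google-cloud-sdk
# (a comment after a trailing "\" would break the line continuation, so it goes here)
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y sudo \
    && apt-get install -y lsb-release \
    && pip3 install --upgrade pip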

# Change the working directory (optional)
# WORKDIR /home/{any user name}

# Install the modules listed in requirements.txt, prepared in advance in the same folder as the Dockerfile
# "." is the current working directory inside the image
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Install the Cloud SDK
# https://cloud.google.com/sdk/docs/downloads-apt-get
RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update -y && apt-get install google-cloud-sdk -y

# Install Node.js, which JupyterLab needs to build the plotly extensions
# https://github.com/nodesource/distributions/blob/master/README.md
RUN curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash - \
    && sudo apt-get install -y nodejs

# Make plotly work in JupyterLab
# Replace each {version} with a version compatible with your JupyterLab and plotly
ENV NODE_OPTIONS=--max-old-space-size=4096
RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager@{version} --no-build \
    && jupyter labextension install jupyterlab-plotly@{version} --no-build \
    && jupyter labextension install plotlywidget@{version} --no-build \
    && jupyter lab build
ENV NODE_OPTIONS=

List the modules to install in requirements.txt. Whether to pin versions is up to you.

requirements.txt


numpy
pandas
matplotlib
seaborn
scikit-learn
scrapy
jupyter
plotly
google-cloud-bigquery
jupyterlab
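Once the image is built, it can be handy to confirm that every entry in requirements.txt was actually installed, and to see which versions pip chose in case you decide to pin them later. The following is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+). The function names `parse_requirements` and `check_requirements` are hypothetical, and the parser only handles simple lines like the ones above (a name, optionally followed by a version specifier), not pip's full requirements-file syntax.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirements(text):
    """Return the distribution names in a simple requirements.txt.

    Handles blank lines, '#' comments and version specifiers such as
    'pandas==1.0.5'; option lines like '-r other.txt' are skipped.
    This is NOT a full parser of pip's requirements-file format.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or line.startswith("-"):
            continue
        # Keep everything before the first specifier, marker or extras character
        names.append(re.split(r"[=<>!~;\[\s]", line, maxsplit=1)[0])
    return names


def check_requirements(names):
    """Return (pinned, missing): 'name==version' lines for installed
    distributions, and the names that are not installed."""
    pinned, missing = [], []
    for name in names:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pinned, missing
```

Run from the folder holding requirements.txt inside the container, `check_requirements(parse_requirements(open("requirements.txt").read()))` returns the pinned lines you could paste back into requirements.txt, plus any names that did not get installed.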

How to use

Put the two files above in a folder of your choice and move to that directory. Then run the following commands in order.

# Build the Docker image
docker build --rm -t {image name} .
# Create the Docker container
# -p: forward a port so the container can be reached from outside Docker
# -v: mount a folder from outside Docker so files are kept even if the container is deleted
docker run -it -p {port outside the container}:{port in the container} -v {absolute path of the folder outside the container, without a trailing "/"}:{absolute path in the container where the folder is mounted, including the folder name} --name {container name} {image name} /bin/bash
# Start JupyterLab (inside the container)
jupyter lab --ip=0.0.0.0 --allow-root --port {port in the container}
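The placeholders in the `docker run` line are easy to mix up. As an illustration only, here is a hypothetical helper, `build_run_args`, that assembles the same command as an argument list and enforces the rules from the comments above: both folder paths must be absolute, and the host folder has no trailing "/". A second hypothetical helper, `jupyter_args`, builds the `jupyter lab` command so that `--port` matches the container side of `-p`. It assumes a Linux or macOS host (POSIX paths) and only builds the lists; it does not run anything.

```python
from pathlib import PurePosixPath


def build_run_args(image, container, host_port, container_port,
                   host_dir, container_dir):
    """Return the `docker run` command above as an argument list."""
    host_dir = host_dir.rstrip("/") or "/"  # no trailing "/" on the host folder
    for path in (host_dir, container_dir):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"not an absolute path: {path}")
    return [
        "docker", "run", "-it",
        "-p", f"{host_port}:{container_port}",  # port outside : port inside
        "-v", f"{host_dir}:{container_dir}",    # host folder : mount point in the container
        "--name", container,
        image,
        "/bin/bash",
    ]


def jupyter_args(container_port):
    """Return the `jupyter lab` command to run inside the container."""
    # --port must match the container side of -p above
    return ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root",
            "--port", str(container_port)]
```

The two ports do not have to be equal: with `-p 8000:8888` and `jupyter lab --port 8888` inside the container, JupyterLab is reachable at http://localhost:8000 on the host. Passing the list to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.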

Others

# Authenticate the Cloud SDK
gcloud init
# Authentication for the google-cloud-bigquery API
hogehoge
export GOOGLE_APPLICATION_CREDENTIALS={absolute path of the credentials file}
export GOOGLE_CLOUD_PROJECT={ID of the project to connect to}
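The two exports above are what the BigQuery client reads by default: `GOOGLE_APPLICATION_CREDENTIALS` points at a credentials JSON file (typically a service account key) and `GOOGLE_CLOUD_PROJECT` names the default project. A minimal sketch for checking them from a notebook, assuming google-cloud-bigquery from requirements.txt is installed; `check_bigquery_env` and `run_test_query` are hypothetical names, and the query is only a smoke test.

```python
import os
from pathlib import Path


def check_bigquery_env(env=None):
    """Return a list of problems with the variables exported above."""
    env = os.environ if env is None else env
    problems = []
    cred = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not cred:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not Path(cred).is_absolute():
        problems.append("GOOGLE_APPLICATION_CREDENTIALS should be an absolute path")
    elif not Path(cred).is_file():
        problems.append(f"credentials file not found: {cred}")
    if not env.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT is not set")
    return problems


def run_test_query():
    """Run a trivial query; needs google-cloud-bigquery and valid credentials."""
    # Imported here so check_bigquery_env works even without the library
    from google.cloud import bigquery

    # With no arguments, the client picks up credentials and project
    # from the environment variables above
    client = bigquery.Client()
    rows = client.query("SELECT 1 AS x").result()
    return [row.x for row in rows]
```

Because `export` only affects the current shell and the processes started from it, run the exports before `jupyter lab` if you want notebooks to see them.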
