[PYTHON] Try using Jupyter's Docker image

[Update] A separate article covers the newer Jupyter 5 series:

- Jupyter Docker image summary (Qiita)


Since version 4.0, IPython has been part of Project Jupyter. Jupyter can be used not only from Python but also from R, Julia, and Scala, and has arguably become a core tool for data analysis. Besides supporting more than 40 programming languages, it encourages collaboration through the Jupyter Notebook Viewer (http://nbviewer.jupyter.org/) and integrates with Apache Spark. I had been meaning to try the SciPy Stack, and found that things have moved on since the days of IPython Notebook.

Until now, NumPy and pandas seemed difficult to install, but Jupyter provides several Docker images, so you can try them out relatively easily. If you would rather stay entirely in your browser, you can try it right away on the Try Jupyter! site.

Start Notebook server

Start the Notebook server using the official Docker image. jupyter/datascience-notebook is a good default, but if you use Spark, jupyter/all-spark-notebook (https://hub.docker.com/r/jupyter/all-spark-notebook/) or jupyter/pyspark-notebook would be a better choice. The image comes with many packages pre-installed and is about 4.5 GB. It's a good idea to look through the list of installed packages while waiting for the download.

$ docker pull jupyter/datascience-notebook
$ docker images jupyter/datascience-notebook
REPOSITORY                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
jupyter/datascience-notebook   latest              8e21bfc3eeba        11 hours ago        4.592 GB

Start the container using port 8888.

$ docker run -d --name notebook -p 8888:8888 jupyter/datascience-notebook
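
The server may take a moment to come up after the container starts. As a minimal sketch, the following Python helper waits until the published port accepts TCP connections. The name `wait_for_port` and its default timeouts are illustrative assumptions, not part of Jupyter or Docker.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove that the listener is the Notebook server.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8888)` would return True once the port is open, assuming Docker publishes the port on localhost. If Docker runs inside a VM, substitute that VM's address.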

Open it in a browser and you will see an interface that looks cleaner than IPython Notebook. Press "New" and the list of kernel options that appears may be even more exciting.

Screenshot 2015-09-24 21.35.10.png

Python notebook

Jupyter can be used with many languages, but Python will likely remain the main one, since it lets you reuse existing assets. First, confirm that the various Python 3 modules are available. Let's start by drawing sin and cos curves with Bokeh.

Screenshot 2015-09-24 21.48.01.png
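
The code from the original screenshot is not reproduced here, so the following is only a sketch of how such a plot might be written. The helper names `sin_cos_points` and `plot_sin_cos` are made up for illustration, and the plot uses only basic Bokeh calls (`figure`, `line`, `output_notebook`, `show`).

```python
import math


def sin_cos_points(n: int = 200):
    """Return x, sin(x), cos(x) as lists, with x evenly spaced over [0, 2*pi]."""
    xs = [2 * math.pi * i / (n - 1) for i in range(n)]
    return xs, [math.sin(x) for x in xs], [math.cos(x) for x in xs]


def plot_sin_cos(n: int = 200):
    """Draw both curves inline in the notebook."""
    # Imported lazily so that sin_cos_points works even where Bokeh is missing.
    from bokeh.plotting import figure, output_notebook, show

    xs, sines, cosines = sin_cos_points(n)
    output_notebook()  # render inside the notebook instead of a separate HTML file
    p = figure(title="sin / cos")
    p.line(xs, sines, line_color="blue")   # sin(x)
    p.line(xs, cosines, line_color="red")  # cos(x)
    show(p)
    return p
```

Calling `plot_sin_cos()` in a notebook cell should display both curves below the cell.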

Next, let's fetch the Nikkei 225 from the Yahoo! API using pandas, and also confirm that Japanese text in the axis labels of the graph is not garbled.

Screenshot 2015-09-24 21.57.36.png

R notebook

If you just write R on your own, I think RStudio is easier to use, but if you want to share notebooks with a team or run a cluster on the server side, it is worth getting used to Jupyter. It also helps absorb differences between environments, such as whether a package is installed or can be installed at all.

Switching the kernel also switches the logo in the upper right. I think this is useful when you move back and forth between multiple environments.

Screenshot 2015-09-24 22.23.46.png

Data upload

I'm not sure whether this arrived with Jupyter Notebook or was already in IPython Notebook, but you can also upload data files. When the server is launched via Docker, linking it to a data container can be cumbersome. However, you can use the "Upload" button to upload files from your local file system. This is of course also useful when the client and server run on different machines.

Screenshot 2015-09-24 22.30.14.png

Uploaded files can be read from a notebook in any language. Let's check this with the Julia kernel. It makes little difference, but the language shown in the upper right is now Julia 0.3.2.

Screenshot 2015-09-24 22.48.20.png
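
From a Python kernel, an uploaded file can be read like any other local file. The sketch below assumes the file is a CSV with a header row and was uploaded to the same directory as the notebook. The helper name `read_uploaded_csv` and any file name passed to it are hypothetical.

```python
import csv
from pathlib import Path


def read_uploaded_csv(filename: str, directory: str = "."):
    """Read a CSV file into a list of dicts keyed by the header row."""
    path = Path(directory) / filename
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

All values come back as strings, so numeric columns need converting before analysis.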

Regardless of the kernel language, you can use print preview and download the notebook as Markdown. This looks useful for recording analysis results as a simple report.
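
A notebook file is plain JSON, which is part of why export works the same way for every kernel. The following is a rough standard-library sketch of the idea, assuming the nbformat 4 layout (a top-level "cells" list whose entries have "cell_type" and "source"). It ignores cell outputs and is not how Jupyter's own export (nbconvert) is implemented; the function name `notebook_to_markdown` is made up for illustration.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


def notebook_to_markdown(ipynb_text: str, code_lang: str = "python") -> str:
    """Render the cells of an nbformat-4 notebook as Markdown (outputs are ignored)."""
    nb = json.loads(ipynb_text)
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # "source" may be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            parts.append(FENCE + code_lang + "\n" + source + "\n" + FENCE)
    return "\n\n".join(parts)
```

For a real report, the "Download as" menu or nbconvert is the better choice, since both also handle outputs and images.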

Package installation

You can also open a terminal and install packages. For example, try installing xlsxwriter using pip.

Screenshot 2015-09-24 23.11.53.png
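
The same installation can also be driven from a Python notebook cell. This is a minimal sketch, with the helper names `is_installed` and `pip_install` chosen for illustration. It runs pip through the interpreter behind the current kernel so the package lands in the environment the notebook actually uses.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by the kernel's Python."""
    return importlib.util.find_spec(module_name) is not None


def pip_install(package: str) -> None:
    """Install a package into the interpreter that is running this kernel."""
    # sys.executable avoids installing into a different Python than the kernel's.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
```

A typical use would be `if not is_installed("xlsxwriter"): pip_install("xlsxwriter")`, which assumes the container has network access.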

Summary

I started the Jupyter Notebook server using the official Docker image and confirmed that Python, R, and Julia work. Downloading the image takes time, but setup is very easy, with none of the trouble caused by version mismatches between multiple pieces of software.

Providing separate execution environments and data storage for each organization, skill level, or analysis method can be a pain, but consolidating on Jupyter may reduce management costs. Since the output format is also largely unified, it looks useful as a way to keep records.
