[Update] A separate article covers the newer Jupyter 5 series:
- Jupyter Docker image summary (Qiita)
Since version 4.0, IPython Notebook has been part of Project Jupyter. Jupyter can be used not only from Python but also from R, Julia, and Scala, and has arguably become the core tool for data analysis. Besides supporting over 40 programming languages, it encourages collaboration through the Jupyter Notebook Viewer (http://nbviewer.jupyter.org/) and integrates with Apache Spark. I had been meaning to try the SciPy stack for a while, and in the meantime things have moved on a generation from IPython Notebook.
NumPy and pandas used to seem difficult to install, but the Jupyter project provides several Docker images, so you can try them out relatively easily. If you want to stay entirely in your browser, you can try it immediately on the Try Jupyter! site.
Start the Notebook server using the official Docker image. In general jupyter/datascience-notebook is a good choice, but if you use Spark, either [jupyter/all-spark-notebook](https://hub.docker.com/r/jupyter/all-spark-notebook/) or jupyter/pyspark-notebook would be better. Many packages are pre-installed in the image, which is about 4.5 GB. It's a good idea to look through the list of installed packages while waiting for the download.
$ docker pull jupyter/datascience-notebook
$ docker images jupyter/datascience-notebook
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
jupyter/datascience-notebook latest 8e21bfc3eeba 11 hours ago 4.592 GB
Start the container using port 8888.
$ docker run -d --name notebook -p 8888:8888 jupyter/datascience-notebook
Open it in a browser and you will see an interface that looks cleaner than IPython Notebook. Press "New" and the list of available choices is an exciting sight.
Although many languages are available, Python will probably be the main one, since it lets you reuse existing assets. Let's check that the various Python 3 modules are available. First, let's draw sin/cos curves using Bokeh.
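The original notebook cell is not reproduced here, so the following is only a minimal sketch of what such a plot might look like. It assumes a reasonably recent Bokeh release (the `legend_label` keyword is not available in very old versions), and the function names `sin_cos_points` and `plot_sin_cos` are made up for illustration.

```python
import math


def sin_cos_points(n=100):
    """Return x values on [0, 2*pi] with the matching sin and cos values."""
    xs = [2 * math.pi * i / (n - 1) for i in range(n)]
    return xs, [math.sin(x) for x in xs], [math.cos(x) for x in xs]


def plot_sin_cos(n=100):
    """Draw both curves inline in the notebook (hypothetical helper)."""
    # Imported here so the data helper above works even without Bokeh.
    from bokeh.io import output_notebook
    from bokeh.plotting import figure, show

    output_notebook()  # render Bokeh output inside the notebook
    xs, sin_ys, cos_ys = sin_cos_points(n)
    p = figure(title="sin / cos", x_axis_label="x", y_axis_label="y")
    p.line(xs, sin_ys, legend_label="sin(x)", line_color="blue")
    p.line(xs, cos_ys, legend_label="cos(x)", line_color="red")
    show(p)
```

Calling `plot_sin_cos()` in a notebook cell should display the two curves below the cell.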
Next, let's fetch the Nikkei 225 from the Yahoo! API using the pandas module. Also check that Japanese text used for the axis labels of the graph is not garbled.
If you just write R in the usual way, I think RStudio is easier to use, but if you expect to share notebooks within a team or to build a cluster on the server side, it is worth getting familiar with Jupyter. It also helps absorb differences between environments, such as whether a given package is installed or can be installed at all.
If you switch the kernel, the logo on the upper right will also switch. I think this is a useful function when going back and forth between multiple environments.
I don't know whether this arrived with Jupyter Notebook or already existed in IPython Notebook, but you can also upload data files. When the server is launched via Docker, linking it to a data container can be awkward, but the "Upload" button lets you upload data from your local file system. Of course, it is also useful when the client and server run on different machines.
Uploaded files can be read from notebooks in any language. Let's check with the Julia kernel. It does not look much different, but the language shown in the upper right is Julia 0.3.2.
Regardless of the kernel language, you can open a print preview or download the notebook as Markdown. This seems useful as a way to record analysis results as a simple report.
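For comparison, a Python kernel could read an uploaded file along these lines. This is only a sketch: the file name `data.csv` is hypothetical, and it assumes the file is a UTF-8 CSV with a header row that landed in the notebook's working directory.

```python
import csv


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical usage in a notebook cell after uploading data.csv:
# rows = load_rows("data.csv")
# rows[:5]
```

All values come back as strings, so numeric columns need converting before analysis.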
You can also launch a terminal and install packages. For example, try installing xlsxwriter using pip.
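After installing from the terminal, one way to confirm that the package is visible to the Python kernel is a quick check from a notebook cell. The helper name `is_installed` below is made up for this sketch; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether xlsxwriter is importable in the current kernel.
print(is_installed("xlsxwriter"))
```

If this prints `False` right after installing, the terminal's pip may belong to a different Python environment than the kernel.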
I started the Jupyter Notebook server using the official Docker image and confirmed that Python, R, and Julia all work. Downloading the image takes time, but setup is very easy and avoids the trouble caused by version mismatches between multiple pieces of software.
Providing different execution environments and data storage to match the skills of the organization and its members, or the analysis method, can be a pain, but consolidating on Jupyter may reduce management costs. Since the output format is also largely unified, it seems useful as a way of keeping records.