[PYTHON] Recommendation of Jupyter Notebook, a coding environment for data scientists

This is my second article on data scientists (lol).

Data analysis environment for data scientists

I think data scientists often give presentations and hold discussions while looking at the data (a rough guess on my part). For that reason, I feel they need a coding environment different from vim / emacs and traditional IDEs.

I think the following four points are the required specifications.

Required specifications vim/emacs RStudio/Spyder Spotfire/Tableau Jupyter Notebook
Can write code ×
Data can be visualized interactively ×
Reproducibility is easy to ensure ×
Looks sexy to the general public ×

This is an arbitrary table that is far from data science, but it is true that you lose nothing by using Jupyter Notebook. RStudio (or Spyder for Python) is good, but in terms of ~~sexiness~~ ensuring reproducibility, Jupyter Notebook is better because it lets you leave comments in Markdown. I recommend it even if you are not a data scientist, because it preserves your coding process.

What is Jupyter Notebook?

- Python has shipped with an interactive shell from the beginning, but people who were not satisfied with it created an interactive shell called IPython (Interactive Python). (Excerpt from "How to use IPython")

- Cell-oriented coding: code can be executed in units called cells
- Tab completion of reserved words, variables, module names, and so on
- Object inspection: append ? to an object name for more information
- Various magic commands: for example, measure execution speed with %%timeit
- Shell commands: a line starting with !, such as !ls, is run as a shell command
- Reuse of inputs and outputs: cell inputs and outputs are stored in variables called In and Out
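The %%timeit magic builds on Python's standard timeit module, so the same kind of measurement can be sketched in plain Python outside a notebook. The names slow_sum and best_time and the repeat counts are illustrative assumptions, not something from IPython itself.

```python
import timeit


def slow_sum(n):
    """Sum the integers below n with an explicit loop (illustrative workload)."""
    total = 0
    for i in range(n):
        total += i
    return total


def best_time(func, *args, number=1000, repeat=3):
    """Return the best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(*args))
    # Each entry is the total time for `number` calls; keep the fastest run.
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    seconds = best_time(slow_sum, 1000)
    print(f"slow_sum(1000): {seconds * 1e6:.1f} microseconds per call")
```

In a notebook, putting %%timeit on the first line of a cell does roughly this for the whole cell and picks the loop count for you.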

- Within the IPython project, IPython Notebook appeared, which lets you use IPython from a web browser.
- Pandas tables, matplotlib graphs, MathJax formulas, and so on can be displayed using web features.
- Comments can be written in Markdown, which greatly improved descriptive power.
- The analysis process can be saved and shared in .ipynb format.
- If you put a notebook on GitHub, it can be viewed as a notebook through nbviewer.
- Users of other languages who saw IPython Notebook started hooking into it so that it would work with their languages too.
- The foundation is ZeroMQ-based communication with a component called a kernel.
- Kernels began to be written for each language: Julia, Ruby, R, and so on.
- At that point, wasn't the name IPython Notebook a bit odd?
- It was spun off from the IPython project and became an independent project called Jupyter.
- IPython Notebook 4.0 => Jupyter Notebook (JuPyteR: Julia + Python + R)
- So Jupyter Notebook and IPython Notebook are the same thing.
- The console called qtconsole, which was developed in IPython, has also moved to Jupyter.
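Part of what makes the .ipynb format easy to save and share is that it is plain JSON. The following is a minimal sketch that builds a tiny notebook by hand and lists its cells using only the standard library. The helper names make_notebook and summarize are hypothetical; in practice the nbformat package is the usual way to read and write notebooks.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally record their outputs and execution counter.
            cell["outputs"] = []
            cell["execution_count"] = None
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


def summarize(notebook_json):
    """Return (cell_type, first line of source) for each cell in a notebook JSON string."""
    nb = json.loads(notebook_json)
    summary = []
    for cell in nb["cells"]:
        source = cell["source"]
        # The source may be stored as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        first_line = text.splitlines()[0] if text else ""
        summary.append((cell["cell_type"], first_line))
    return summary


if __name__ == "__main__":
    nb = make_notebook([
        ("markdown", "# Sales analysis\nNotes on the method."),
        ("code", "import pandas as pd"),
    ])
    print(summarize(json.dumps(nb, indent=1)))
```

Because Markdown cells and code cells live side by side in one file, the explanation travels with the code when the notebook is shared.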

As a result, Jupyter Notebook

- works with various languages,
- makes coding easy,
- offers the expressive power of Markdown for comments,
- provides an interactive data visualization environment via the web, and
- keeps the code embedded, which makes re-running easy.

In short, **it is an application that provides reproducibility, storage, and sharing of the analysis process**. It is a kind of electronic lab notebook. (There is no witness, though.)

Environment

If you install Anaconda, everything is included. Instructions for setting up Anaconda can be found here.

Open a terminal in a suitable folder and run the following command.

jupyter notebook

You are all set if the browser starts and the Jupyter page is displayed at http://localhost:8888.
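If the browser does not open by itself, a quick way to see whether anything is answering on the port is a plain TCP connection attempt. This is a minimal sketch; the function name is_listening is hypothetical, and it only confirms that some process accepts connections on that port, not that the process is Jupyter.

```python
import socket


def is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("Something is listening on localhost:8888:", is_listening())
```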

Jupyter initial settings

(Merged in from an older article.)

Jupyter works for the time being without any of these settings, so skip this section if you don't need them.

alias

jupyter notebook is quite long to type, so it helps to define an alias for it, such as note.

About config

The config layout seems to have changed considerably in Jupyter 4.0, so long-time users should check the migration guide: https://jupyter.readthedocs.org/en/latest/migrating.html

Creating a config file and setting a password

Reference URL

jupyter notebook --generate-config
#>>> Writing default config to: ~/.jupyter/jupyter_notebook_config.py
python -c "from notebook.auth import passwd;print(passwd())" 
#>>> Enter password:
#>>> Verify password:
#>>> 'sha1:........'

Copy the password hash that starts with sha1:....
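To give a feel for what the sha1:... string contains, here is a rough sketch of a salted hash in the same algorithm:salt:digest shape. As far as I know, the classic notebook.auth.passwd works along these lines, but this is an imitation for illustration and not the library's actual code. make_hash and check_hash are hypothetical names; for a real config, use the output of passwd().

```python
import hashlib
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Return 'algorithm:salt:digest' for a passphrase, using a random hex salt."""
    salt = secrets.token_hex(salt_len // 2)  # yields salt_len hex characters
    digest = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(hashed, passphrase):
    """Return True if passphrase matches a string produced by make_hash."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        # Not in the expected three-part form.
        return False
    expected = hashlib.new(algorithm, (passphrase + salt).encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    return secrets.compare_digest(expected, digest)


if __name__ == "__main__":
    hashed = make_hash("correct horse")
    print(hashed.split(":")[0], check_hash(hashed, "correct horse"))
```

The point of the salt is that the same password yields a different string on every run, so the value pasted into the config file does not reveal the password directly.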

vi ~/.jupyter/jupyter_notebook_config.py

Main changes

Find the parameters below, uncomment them as needed, and enter the values.

| Parameter | Default | Comment |
|---|---|---|
| c.NotebookApp.ip | 'localhost' | Change this to allow access from other client machines. '*' opens it to everyone. |
| c.NotebookApp.notebook_dir | null | Jupyter's working directory. It is a good idea to set it explicitly. |
| c.NotebookApp.open_browser | True | Whether to open a browser at startup. Set it to False on servers without X. |
| c.NotebookApp.port | 8888 | Specify a different port if 8888 is already in use. |
| c.NotebookApp.password | null | Entering the hash string you copied earlier enables password authentication. |
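Put together, the edited part of jupyter_notebook_config.py might look like the sketch below. The config file is ordinary Python in which Jupyter supplies get_config(). The try/except fallback is an assumption added here only so the fragment can also be run as a plain script; the directory path and port number are hypothetical values.

```python
# ~/.jupyter/jupyter_notebook_config.py (sketch)
try:
    c = get_config()  # provided by Jupyter when it loads this file
except NameError:
    # Fallback for running this sketch outside Jupyter (illustration only).
    from types import SimpleNamespace
    c = SimpleNamespace(NotebookApp=SimpleNamespace())

# '*' instead of 'localhost' would accept connections from any client.
c.NotebookApp.ip = "localhost"
# Hypothetical path; the directory should already exist.
c.NotebookApp.notebook_dir = "/home/me/notebooks"
# Do not try to open a browser, e.g. on a server without X.
c.NotebookApp.open_browser = False
# Any free port, if the default 8888 is taken.
c.NotebookApp.port = 8889
# Paste the hash copied from passwd() here.
c.NotebookApp.password = "sha1:........"
```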

There are also SSL settings, so check the documentation if you need to expose the notebook on the web.

Library to load when IPython launches

If you list the libraries you want loaded in a file under ~/.ipython/profile_default/startup, they are imported whenever a kernel starts. Because files in .ipy format can also contain magic commands, it is a good idea to include %matplotlib inline as well. Adding your preferred seaborn settings there also makes later work easier. If you put too much in, the kernel becomes slow to start, which is frustrating. (Pandas is relatively heavy.)

Example:

00_init.ipy


%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
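If you set up several machines, the startup file can also be generated by a small script. This is a minimal sketch using only the standard library; write_startup_file is a hypothetical helper, and the demo writes to a temporary directory so that it does not touch your real profile.

```python
import tempfile
from pathlib import Path

STARTUP_LINES = [
    "%matplotlib inline",
    "from matplotlib import pyplot as plt",
    "import seaborn as sns",
    "import numpy as np",
    "import pandas as pd",
]


def write_startup_file(startup_dir, name="00_init.ipy", lines=STARTUP_LINES):
    """Write the startup script into startup_dir and return its path."""
    startup_dir = Path(startup_dir).expanduser()
    startup_dir.mkdir(parents=True, exist_ok=True)
    path = startup_dir / name
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; the file is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_startup_file(tmp)
        print(path.read_text(encoding="utf-8"))
```

For real use, pass "~/.ipython/profile_default/startup" as startup_dir. Startup files run in lexicographic order, which is why the name begins with 00_.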

extension

Jupyter Notebook also has extensions. I will skip the details because they are covered in the following articles.

- Add an extension to build a more comfortable Jupyter environment
- [jupyter notebook extensions python-markdown(markdown + jinja2)](http://qiita.com/ksomemo/items/ba0f24daae2276ffd9b2)

RISE

There is a cool extension that lets you give presentations directly from a Jupyter Notebook.

git clone https://github.com/damianavila/RISE
cd RISE
python setup.py install

A slideshow button is added at the top right of the notebook page. If you select Slideshow from the Cell Toolbar button on the notebook page, you can specify which cells make up each slide.

Jupyter Content Management Extensions

(3/21 postscript) I had forgotten that IBM created a very useful extension. Once it is installed, you can run a full-text search across notebook files from within Jupyter (introduced on the IBM blog). It is available through pip, so it is easy to install.

pip install jupyter_cms
jupyter cms install --user -s 
jupyter cms activate
jupyter notebook

If you launch the notebook after jupyter cms activate, a search button is added to the tree screen. The search also covers code and comments in subdirectories, which makes old work easier to reuse. The query syntax is quite flexible as well (http://whoosh.readthedocs.org/en/latest/querylang.html). (I would be even happier if there were a preview...)

In addition, there are extensions that let you build dashboards with Jupyter Notebook (JupyterDay NYC slides, GitHub).

(3/21 postscript up to here)

How to use Jupyter Notebook

There is a good article on this: Beginning of Jupyter

(Added on April 13, 2016) To switch between environments, install jupyter_environment_kernels. I learned about it from a wonderful article here.
