[PYTHON] Managing Jupyter notebook (ipynb) differences in Git in an easy-to-read way with JupyterLab

TL;DR

When doing analysis in Jupyter Notebook, I wanted to version-control my notebooks with Git. Used as-is, however, the diffs were very hard to read because of the notebook metadata, so I used JupyterLab's jupyterlab-git and nbdime extensions to make the differences easier to see.

Build a JupyterLab environment

In this article, we will use the following environment.

- Use docker-compose (for installing docker-compose, refer to the official installation guide).
- Use kaggle-images as the base for the container image.

The JupyterLab extensions used for version control of Jupyter notebooks are:

- jupyterlab-git
- nbdime

Environment

Create the following two files.

Dockerfile


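FROM gcr.io/kaggle-images/python:v74

# git for version control, curl to fetch the Node.js setup script
RUN apt-get update && \
    apt-get install -y git curl

# Node.js is needed to build JupyterLab extensions
RUN curl -sL https://deb.nodesource.com/setup_12.x | bash - && \
    apt-get install -y nodejs

# Upgrade pip and JupyterLab, then install jupyterlab-git
RUN pip install -U pip jupyterlab && \
    pip install jupyterlab-git

# Rebuild JupyterLab so that the installed extensions are bundled
RUN jupyter lab build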

docker-compose.yml


version: "3"
services:
  jupyter:
    build: .
    volumes:
      - $PWD:/tmp/work
    working_dir: /tmp/work
    ports:
      - 8888:8888
    command: jupyter lab --ip=0.0.0.0 --allow-root --no-browser

Build Docker image

After creating the two files above, build the image from the same directory.

$ docker-compose build

Start container

Start the container after building.

$ docker-compose up

Once the container has started, open http://localhost:8888/ and enter the token to access JupyterLab. The token is printed in the startup log; for example, in http://acb729d0c5ce:8888/?token=45d10c660d2e85f0c8d59995a04667c154542ae79f27f65d the token is 45d10c660d2e85f0c8d59995a04667c154542ae79f27f65d.
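If you script the startup, the token can be pulled out of the logged URL with the Python standard library. This is a minimal sketch; extract_token is a hypothetical helper name, and it assumes the URL has the ?token=... form shown in the log.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_token(url: str) -> Optional[str]:
    """Return the value of the `token` query parameter, or None if it is absent."""
    # parse_qs maps each query key to a list of values
    values = parse_qs(urlparse(url).query).get("token")
    return values[0] if values else None


log_url = "http://acb729d0c5ce:8888/?token=45d10c660d2e85f0c8d59995a04667c154542ae79f27f65d"
print(extract_token(log_url))
```

With the token in hand, opening http://localhost:8888/?token=<token> logs you in without typing it into the form.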

Enable Extension Manager

After startup, enable the Extension Manager.

Then install the two extensions, jupyterlab-git and nbdime, from the Extension Manager.

Version control your Notebook with Git

Clone Git repository

Clone the repository you want to work in. If you already have notebooks locally, initialize a repository with git init instead.

Enter the repository URL.

Create a notebook (test.ipynb) and make the first commit.

$ git config --global user.email "you@example.com"
$ git config --global user.name "Your Name"
$ git add test.ipynb
$ git commit -m "first commit"
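An .ipynb file is just a JSON document, which is where the diff noise comes from. The sketch below writes a minimal nbformat 4 notebook with Python's json module so you can see the fields involved. The cell contents are hypothetical placeholders (data_dir is never defined because the cells are not executed), and a notebook saved by JupyterLab carries more metadata, such as kernel and language info, than this.

```python
import json


def code_cell(source, execution_count=None):
    """Build a minimal nbformat 4 code cell."""
    return {
        "cell_type": "code",
        "execution_count": execution_count,  # renumbered every time the cell is re-run
        "metadata": {},
        "outputs": [],  # rendered results (tables, plots) are stored here
        "source": source,
    }


notebook = {
    "cells": [
        code_cell(["import pandas as pd"], execution_count=1),
        code_cell(["df = pd.read_csv(data_dir + \"train.csv\")"], execution_count=2),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

with open("test.ipynb", "w", encoding="utf-8") as f:
    # nbformat writes notebooks with one-space indentation, as seen in the diff
    json.dump(notebook, f, indent=1, ensure_ascii=False)
    f.write("\n")
```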

Suppose you continue the analysis in the notebook after the first commit, for example by adding a cell with df.head() and re-running the notebook.

Difference display with git diff

First, check the change with the git diff command. As shown below, the output is dominated by notebook metadata such as execution_count, which makes it very hard to read.

# git diff
diff --git a/test.ipynb b/test.ipynb
index f6c1f17..5af6074 100644
--- a/test.ipynb
+++ b/test.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -21,7 +21,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -30,12 +30,164 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [],
    "source": [
     "df = pd.read_csv(data_dir + \"train.csv\")"
    ]
+  },
+  {
+   "cell_type": "code",
:
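Most of these hunks are execution_count changes caused by re-running cells. As a rough illustration of why a notebook-aware tool helps (this is not how nbdime works internally), the sketch below compares two notebooks only by their cell sources, ignoring execution counts, outputs, and metadata. source_lines and source_diff are hypothetical helper names, and the two notebooks are tiny in-memory stand-ins for the before and after versions of test.ipynb.

```python
import difflib


def source_lines(nb):
    """Flatten the source of every cell into lines, with a header line per cell."""
    lines = []
    for i, cell in enumerate(nb["cells"]):
        lines.append(f"# --- cell {i} ({cell['cell_type']}) ---")
        src = cell["source"]
        # nbformat allows source to be a single string or a list of strings
        text = src if isinstance(src, str) else "".join(src)
        lines.extend(text.splitlines())
    return lines


def source_diff(before, after):
    """Unified diff of cell sources only; execution counts and outputs are ignored."""
    return list(difflib.unified_diff(
        source_lines(before), source_lines(after),
        fromfile="a/test.ipynb", tofile="b/test.ipynb", lineterm="",
    ))


def cell(source, count):
    return {"cell_type": "code", "execution_count": count,
            "metadata": {}, "outputs": [], "source": source}


# Re-running changes every execution_count; only the df.head() cell is new.
before = {"cells": [cell("import pandas as pd", 1),
                    cell('df = pd.read_csv(data_dir + "train.csv")', 5)]}
after = {"cells": [cell("import pandas as pd", 6),
                   cell('df = pd.read_csv(data_dir + "train.csv")', 8),
                   cell("df.head()", 9)]}

print("\n".join(source_diff(before, after)))
```

Only the added df.head() cell appears as a change; the renumbered execution counts do not.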

Difference display in JupyterLab nbdime

Checking the same change with nbdime in JupyterLab gives a side-by-side view: the left (pink) side shows the notebook before the change, and the right (green) side shows it after the change.

I think the differences are much easier to see this way.
