TL;DR
I wanted to version-control my analysis work in Jupyter Notebook with Git, but a plain git diff was very hard to read because of the notebook's metadata. Using JupyterLab's jupyterlab-git and nbdime extensions made the differences much easier to see.
This article uses the following environment.
- Use docker-compose (see its installation instructions if it is not installed yet).
- Use kaggle-images as the base for the container image.
The JupyterLab extensions required for version control of Jupyter notebooks are jupyterlab-git and nbdime.
Create the following 2 files.
Dockerfile
FROM gcr.io/kaggle-images/python:v74
RUN apt-get update && \
    apt-get install -y git \
    curl
RUN curl -sL https://deb.nodesource.com/setup_12.x | bash - && \
    apt-get install -y nodejs
RUN pip install -U pip \
    jupyterlab && \
    pip install jupyterlab-git
RUN jupyter lab build
docker-compose.yml
version: "3"
services:
  jupyter:
    build: .
    volumes:
      - $PWD:/tmp/work
    working_dir: /tmp/work
    ports:
      - 8888:8888
    command: jupyter lab --ip=0.0.0.0 --allow-root --no-browser
After creating the above two files, build in the same directory.
$ docker-compose build
Start the container after building.
$ docker-compose up
After startup, open http://localhost:8888/ and enter the token to access JupyterLab.
The token is printed in the startup log. For example, in http://acb729d0c5ce:8888/?token=45d10c660d2e85f0c8d59995a04667c154542ae79f27f65d the token is 45d10c660d2e85f0c8d59995a04667c154542ae79f27f65d.
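If you would rather pull the token out of the startup log programmatically than copy it by hand, the following is a minimal sketch using only the Python standard library. The function name extract_token is a hypothetical helper introduced for illustration, and the URL is the example from the startup log in this article.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(url: str) -> str:
    """Return the value of the `token` query parameter in a Jupyter startup URL."""
    query = parse_qs(urlparse(url).query)
    tokens = query.get("token", [])
    if not tokens:
        raise ValueError("no token parameter found in URL")
    return tokens[0]


# Example URL copied from the startup log shown in this article.
startup_url = (
    "http://acb729d0c5ce:8888/"
    "?token=45d10c660d2e85f0c8d59995a04667c154542ae79f27f65d"
)
token = extract_token(startup_url)
print(token)
```

The hostname in the logged URL is the container ID, so from the host machine you would still open http://localhost:8888/ and paste the extracted token there.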
After startup, enable the Extension Manager.
The two extensions should be listed as installed.
Clone the repository you need. If you already have notebooks and other files locally, run git init instead.
Enter the repository URL.
Create a notebook (test.ipynb) and make the first commit.
$ git config --global user.email "you@example.com"
$ git config --global user.name "Your Name"
$ git add test.ipynb
$ git commit -m "first commit"
Suppose you continue the analysis in the notebook after the first commit, for example by adding the code df.head().
First, if you check with the git diff command, changes to the notebook's metadata (such as execution_count) are displayed as shown below, which makes the output very hard to read.
# git diff
diff --git a/test.ipynb b/test.ipynb
index f6c1f17..5af6074 100644
--- a/test.ipynb
+++ b/test.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
@@ -21,7 +21,7 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -30,12 +30,164 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(data_dir + \"train.csv\")"
]
+ },
+ {
+ "cell_type": "code",
:
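To see why the plain-text diff is so noisy, here is a minimal sketch using only the Python standard library. It builds two tiny notebook-like JSON documents that differ only in execution_count, then compares them once as raw text and once by cell source only. The helper names make_notebook and cell_sources and the single sample cell are hypothetical. This is not how nbdime is implemented, only an illustration of what a content-aware comparison ignores.

```python
import difflib
import json


def make_notebook(execution_count: int) -> dict:
    """Build a minimal notebook-like dict with one code cell (illustrative only)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [],
                "source": ['df = pd.read_csv(data_dir + "train.csv")'],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def cell_sources(notebook: dict) -> list:
    """Return only the source text of each cell, ignoring counters and metadata."""
    return ["".join(cell["source"]) for cell in notebook["cells"]]


before = make_notebook(execution_count=5)
after = make_notebook(execution_count=8)

# Raw text comparison, similar in spirit to what `git diff` sees.
before_text = json.dumps(before, indent=1).splitlines()
after_text = json.dumps(after, indent=1).splitlines()
raw_diff = [
    line
    for line in difflib.unified_diff(before_text, after_text, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(raw_diff))

# Comparing only the code shows whether anything meaningful changed.
print(cell_sources(before) == cell_sources(after))
```

Re-running a cell is enough to change the text that Git compares, even though the code itself is untouched, which is why a notebook-aware diff is easier to read.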
If you check the diff with nbdime in JupyterLab instead, the pink pane on the left shows the notebook before the change, and the green pane on the right shows it after the change.
I think this makes the difference much easier to read.