[PYTHON] I tried papermill

Introduction

"Platform that can deploy machine learning as an actual service only with Jupyter", and jupyter as the foundation I was very impressed with how it was possible to incorporate the notebook directly. In the article, a library called "papermill" that can execute jupyter notebook from the outside appeared, so I would like to use this.

Reference

Environment

Procedure

Installation

pip install papermill

Run

The folder structure is as follows. input.ipynb is the notebook to be executed, and the code that executes it goes in main.py.

work/
 ├ main.py
 └ input.ipynb

The contents of input.ipynb are very simple, as follows.

image.png

Here, the first cell is tagged with parameters. Select View > Cell Toolbar > Tags from the menu to display a text box at the upper right of each cell. Enter parameters there and click Add tag to add the tag. papermill looks for the cell tagged parameters in the notebook and overrides the variables defined in that cell.
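For reference, the tag is simply stored in the cell's metadata inside the .ipynb file, which is plain JSON. The sketch below builds a small three-cell notebook with only the standard library. The cell contents and default values are my own assumptions (the original notebook is only shown as a screenshot), the python3 kernel name is also an assumption, and make_notebook and write_notebook are hypothetical helper names.

```python
import json
from pathlib import Path


def code_cell(source, tags=None):
    """Build one code cell; tags live under metadata["tags"]."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": tags} if tags else {},
        "outputs": [],
        "source": source,
    }


def make_notebook():
    """Return a notebook dict whose first cell carries the 'parameters' tag."""
    return {
        "cells": [
            # Assumed default values; papermill overrides them at run time.
            code_cell("alpha = 0.1\nratio = 0.5", tags=["parameters"]),
            code_cell("result = alpha * ratio"),
            code_cell("print(alpha, ratio, result)"),
        ],
        # Assumes a kernel named "python3" is installed.
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_notebook(path="input.ipynb"):
    """Serialize the notebook dict to disk as JSON."""
    Path(path).write_text(json.dumps(make_notebook(), indent=1), encoding="utf-8")
```

Calling write_notebook() should produce a file that Jupyter opens with the tag already set on the first cell, so the menu steps above are not needed for a generated notebook.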

To run it from the Python API, write the following. The executed notebook is saved as ./output.ipynb.

main.py


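import papermill as pm

# Execute input.ipynb with the given parameters and save the result as output.ipynb.
pm.execute_notebook(
    './input.ipynb',   # notebook to run
    './output.ipynb',  # executed copy, including cell outputs
    parameters=dict(alpha=0.6, ratio=0.1),  # overrides for the "parameters" cell
)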

Run it with python main.py.

$ python main.py 
Executing: 100%|████████████████████████████████| 3/3 [00:01<00:00,  1.80cell/s]
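Since execute_notebook is an ordinary function, it can also be called in a loop to run the same notebook with several parameter sets. The following is only a sketch: plan_runs and run_all are hypothetical helpers, and the output file naming is my own assumption.

```python
from itertools import product


def plan_runs(alphas, ratios, input_path="./input.ipynb"):
    """Return one (input, output, parameters) tuple per parameter combination."""
    runs = []
    for alpha, ratio in product(alphas, ratios):
        # Assumed naming scheme: encode the parameters in the output file name.
        output_path = f"./output_alpha{alpha}_ratio{ratio}.ipynb"
        runs.append((input_path, output_path, dict(alpha=alpha, ratio=ratio)))
    return runs


def run_all(runs):
    """Execute every planned run with papermill, one after another."""
    # Imported here so that planning works even where papermill is not installed.
    import papermill as pm

    for input_path, output_path, parameters in runs:
        pm.execute_notebook(input_path, output_path, parameters=parameters)
```

For example, run_all(plan_runs(alphas=[0.2, 0.6], ratios=[0.1])) would execute the notebook twice and leave two output notebooks side by side.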

When I open ./output.ipynb, it looks like this. A cell tagged injected-parameters has been added, and it overrides the parameters.

image.png
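The same check can be done without opening Jupyter, because the output notebook is a JSON file. Below is a minimal sketch using only the standard library. cells_with_tag and cell_source are hypothetical helpers, and I am assuming the tag name is injected-parameters, as seen in the output notebook.

```python
import json


def cells_with_tag(notebook, tag):
    """Return the cells of a notebook dict that carry the given tag."""
    return [
        cell
        for cell in notebook.get("cells", [])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


def cell_source(cell):
    """Return a cell's source as one string (the JSON may store a list of lines)."""
    source = cell["source"]
    return source if isinstance(source, str) else "".join(source)


def load_notebook(path):
    """Read an .ipynb file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, printing cell_source(c) for each c in cells_with_tag(load_notebook("output.ipynb"), "injected-parameters") should show the values that were passed in.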

To run it from the CLI, use the following. papermill automatically infers whether each value is a boolean or a number.

$ papermill ./input.ipynb ./output.ipynb  -p alpha 0.6 -p ratio 0.1
Input Notebook:  ./input.ipynb
Output Notebook: ./output.ipynb
Executing: 100%|████████████████████████████████| 3/3 [00:01<00:00,  2.67cell/s]
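As a rough illustration of what this inference amounts to, the sketch below converts a command-line string into a Python value. infer_type is a hypothetical helper of my own and not papermill's actual implementation, so the real rules may differ in edge cases. If a value should stay a string, the papermill CLI help also lists a -r option for raw string parameters.

```python
def infer_type(value):
    """Guess a Python value from a command-line string (illustrative only)."""
    # Literal names are mapped to their Python constants first.
    literals = {"True": True, "False": False, "None": None}
    if value in literals:
        return literals[value]
    # Try the narrower numeric type first, then the wider one.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    # Anything else is kept as a plain string.
    return value
```

Under these assumptions, "0.6" becomes a float, "3" becomes an int, and "hello" is left as a string.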

The parameters can also be specified in a YAML file.

work/
 ├ main.py
 ├ input.ipynb
 └ parameters.yaml

In the CLI do the following:

papermill ./input.ipynb ./output.ipynb -f ./parameters.yaml
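The contents of parameters.yaml are not shown above, so here is a sketch that generates one, assuming it holds the same two parameters as before. to_yaml is a hypothetical helper that only handles a flat mapping of scalars. It relies on the fact that JSON scalars are also valid YAML scalars.

```python
import json
from pathlib import Path


def to_yaml(parameters):
    """Render a flat dict of scalars as 'key: value' YAML lines."""
    # json.dumps gives YAML-compatible scalars: 0.6, true, "text", null.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in parameters.items())


def write_parameters(parameters, path="parameters.yaml"):
    """Write the parameters to a YAML file for use with papermill -f."""
    Path(path).write_text(to_yaml(parameters), encoding="utf-8")
```

With write_parameters(dict(alpha=0.6, ratio=0.1)), the file would contain the two lines alpha: 0.6 and ratio: 0.1, where the keys match the variable names in the parameters cell.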

You can also save the output to cloud storage. In that case, you need to install the optional dependencies as well.

pip install papermill[all]

Replace ./output.ipynb with the cloud destination. Below is an example for AWS S3. It works as long as credentials have already been configured with the AWS CLI.

papermill ./input.ipynb s3://xxxxxxxxxx/output.ipynb -f ./parameters.yaml

Bonus

If the output destination is the same as the input, the input notebook is overwritten.

papermill ./input.ipynb ./input.ipynb -f ./parameters.yaml

Even if this is repeated several times, only the injected-parameters cell is replaced, so the parameters are updated correctly each time.

In conclusion

It would be interesting to build a management screen with Flask and manage training runs from it.
