[PYTHON] Use the Kaggle API inside a Docker container

Things to do

- Start a container from the official Kaggle image (kaggle/python)
- Download the CSV files with the kaggle command
- Download a public notebook with the kaggle command

Summary first

  1. Download kaggle.json (kaggle.com > My Account > Create New API Token button)
  2. $ docker run -it --rm --mount type=bind,src=`pwd`,dst=/root/dev kaggle/python
  3. $ cd /root/dev
  4. $ pip install kaggle
  5. $ mkdir ~/.kaggle
  6. $ cp /root/dev/kaggle.json ~/.kaggle
  7. $ chmod 600 ~/.kaggle/kaggle.json
  8. $ kaggle competitions download -c titanic -p input/titanic
  9. $ unzip input/titanic/titanic.zip -d input/titanic
  10. $ kaggle kernels pull arthurtok/introduction-to-ensembling-stacking-in-python -p ./working

The dataset is downloaded from the Titanic competition. If these steps work for you, you can stop reading here.

Environment

- macOS 10.14.6 (Mojave)
- Docker 19.03.4

Why use Docker?

- The notebook environment is unstable
- I want to write code in VS Code
- I don't want to pollute the global environment
- pip and conda conflict with each other
- It is reliable because there is an official Kaggle image

What is the Kaggle API?

A tool that lets you perform operations on the Kaggle site from the command line.

For example?

- Download datasets
- Submit to competitions
- List the available competitions
- Download leaderboards

and so on.

For more information, see the official repository.

Command list


kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}
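
As a rough illustration of how the groups and actions above combine into a command line, here is a minimal Python sketch that assembles the arguments and rejects combinations that are not in the list. The COMMANDS table and the helper name build_kaggle_command are made up for this article; they are not part of the kaggle package, and nothing is executed.

```python
# Hypothetical helper that mirrors the command list above.
# It is not part of the kaggle package and does not run anything.
COMMANDS = {
    "competitions": {"list", "files", "download", "submit", "submissions", "leaderboard"},
    "datasets": {"list", "files", "download", "create", "version", "init"},
    "kernels": {"list", "init", "push", "pull", "output", "status"},
    "config": {"view", "set", "unset"},
}


def build_kaggle_command(group, action, *args):
    """Return an argument list such as ['kaggle', 'competitions', 'list']."""
    if group not in COMMANDS:
        raise ValueError(f"unknown command group: {group}")
    if action not in COMMANDS[group]:
        raise ValueError(f"unknown action for {group}: {action}")
    return ["kaggle", group, action, *args]
```

For example, build_kaggle_command("competitions", "download", "-c", "titanic") gives the pieces of the download command used later in this article.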

We will build the following directory layout:

kaggle_titanic
├── input
│    └── titanic  <- goal: download the CSV files here
└── working  <- goal: download the ipynb file here
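
The directories can be created by hand, or with a few lines of Python. This is only a minimal sketch: the function name make_layout is made up for this example, and only the directory names come from the tree above.

```python
from pathlib import Path


def make_layout(root="kaggle_titanic"):
    """Create input/titanic and working under root; safe to call repeatedly."""
    root = Path(root)
    for sub in ("input/titanic", "working"):
        # parents=True also creates 'input'; exist_ok=True tolerates reruns.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```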

Get Kaggle API Token

Open My Account from the upper right of the page.

The Create New API Token button is in the middle of the page.

kaggle.json will be downloaded; save it in the kaggle_titanic directory. Its contents look like this:

kaggle.json


{"username":"anata_no_namae","key":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
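
Before mounting the file into the container, it can be worth checking that it parses and contains both fields. A minimal sketch, assuming only the two keys shown above; load_kaggle_token is a hypothetical helper name, not something the kaggle package provides.

```python
import json
from pathlib import Path


def load_kaggle_token(path="kaggle.json"):
    """Read kaggle.json and return (username, key); raise if a field is missing."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in ("username", "key") if not data.get(name)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return data["username"], data["key"]
```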

By the way

You can also invalidate the token from this page. Use it in situations like "I accidentally pushed kaggle.json to GitHub!"

Container startup

Use the official Kaggle image, kaggle/python. Run the following command in the kaggle_titanic directory.

docker run -it --rm --mount type=bind,src=`pwd`,dst=/root/dev kaggle/python
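
In this command the backquoted pwd expands to the current directory on the host, which is then bind-mounted at /root/dev inside the container. The Python sketch below builds the same argument list to make that explicit. The name docker_run_args is invented for this example, and nothing is executed unless you pass the list to subprocess yourself.

```python
from pathlib import Path


def docker_run_args(host_dir=None, container_dir="/root/dev", image="kaggle/python"):
    """Build the argument list for the docker run command shown above."""
    # Path.cwd() plays the role of `pwd` in the shell version.
    host_dir = Path(host_dir) if host_dir is not None else Path.cwd()
    mount = f"type=bind,src={host_dir.resolve()},dst={container_dir}"
    return ["docker", "run", "-it", "--rm", "--mount", mount, image]
```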

The options are explained here, if you are interested -> [[Explanation with image] Create Anaconda environment with Docker and use VS Code in container](https://qiita.com/komiya_____/items/96c14485eb035701e218#%E3%82%B3%E3%83%B3%E3%83%86%E3%83%8A%E8%B5%B7%E5%8B%95)

From here on, the commands are run in the shell inside the container.

Move to the mounted directory

cd /root/dev

If the contents match the host directory, the mount is working.

ls

input  kaggle.json  working

Install kaggle package

pip install kaggle

Check the version to see if it works

kaggle -v

Traceback (most recent call last):
  File "/opt/conda/bin/kaggle", line 7, in <module>
    from kaggle.cli import main
  File "/opt/conda/lib/python3.6/site-packages/kaggle/__init__.py", line 23, in <module>
    api.authenticate()
  File "/opt/conda/lib/python3.6/site-packages/kaggle/api/kaggle_api_extended.py", line 149, in authenticate
    self.config_file, self.config_dir))
OSError: Could not find kaggle.json. Make sure it's located in /root/.kaggle. Or use the environment method.

We get an error. It says that kaggle.json cannot be found.
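
The last line of the error also mentions an "environment method". The kaggle package can take credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables instead of ~/.kaggle/kaggle.json. The sketch below loads the mounted kaggle.json into those variables for the current Python process. It is an alternative to the file-based setup that this article follows, the helper name export_kaggle_env is hypothetical, and the default path assumes the mount point used here.

```python
import json
import os


def export_kaggle_env(path="/root/dev/kaggle.json", environ=os.environ):
    """Copy username/key from kaggle.json into KAGGLE_USERNAME / KAGGLE_KEY."""
    with open(path, encoding="utf-8") as f:
        token = json.load(f)
    environ["KAGGLE_USERNAME"] = token["username"]
    environ["KAGGLE_KEY"] = token["key"]
    return environ
```

This only affects the current process and the processes it starts, so it suits a notebook cell that runs before the API is used.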

Place kaggle.json

Create the ~/.kaggle/ directory and copy kaggle.json into it. Note: the ~/.kaggle/ directory may already have been created when kaggle -v was run, but this is not a problem.

mkdir ~/.kaggle # <- you may be told the directory already exists; that is fine
cp /root/dev/kaggle.json ~/.kaggle

Try again:

kaggle -v

Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /root/.kaggle/kaggle.json'
Kaggle API 1.5.6

It works now, but there is still a warning: other users can read the API key, so change the permissions.

Change permissions

chmod 600 ~/.kaggle/kaggle.json 

Check once more:

kaggle -v
Kaggle API 1.5.6

Phew. The warning is gone.
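
The warning is about the file's permission bits. The same idea can be checked from Python with the standard library. This is a sketch, not the kaggle package's own check: it simply tests whether any group or other bit is set, and the function names are made up for this example.

```python
import os
import stat


def is_private(path):
    """True if neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


def make_private(path):
    """Equivalent of chmod 600: read and write for the owner only."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```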

It's annoying to do it every time

You can run this whole sequence in a single notebook cell. Another option is to save it in a file and reuse it.

kaggle_settings.ipynb


!pip install kaggle
!mkdir ~/.kaggle
!cp /root/dev/kaggle.json ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json
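
If you prefer plain Python over shell escapes, the copy and permission steps can be sketched with the standard library. Unlike mkdir, this version does not complain when ~/.kaggle already exists, so it can be rerun safely. The name install_kaggle_json is hypothetical, the default source path assumes the mount point used in this article, and the home parameter exists only so the sketch can be tried in a scratch directory. pip install kaggle still has to be run separately.

```python
import shutil
from pathlib import Path


def install_kaggle_json(src="/root/dev/kaggle.json", home=None):
    """Copy kaggle.json to ~/.kaggle/ and restrict it to the owner (chmod 600)."""
    config_dir = (Path(home) if home else Path.home()) / ".kaggle"
    config_dir.mkdir(exist_ok=True)  # no error if the directory already exists
    dst = config_dir / "kaggle.json"
    shutil.copyfile(src, dst)
    dst.chmod(0o600)
    return dst
```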

Let's download the data

Specify the competition name with -c and the destination path with -p.

kaggle competitions download -c titanic -p input/titanic
ls input/titanic

titanic.zip

titanic.zip has been downloaded.

For the competition name, use the notation that appears in the competition's URL.

For example, for the Severstal: Steel Defect Detection competition, it is the name shown at the end of the competition page's URL.
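
To make the rule concrete, here is a small sketch that pulls the competition name out of a competition URL. It assumes URLs of the form https://www.kaggle.com/c/<name> or https://www.kaggle.com/competitions/<name>; competition_slug is a hypothetical helper written for this article.

```python
from urllib.parse import urlparse


def competition_slug(url):
    """Return the name to pass to `kaggle competitions download -c`."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expect ['c', '<name>', ...] or ['competitions', '<name>', ...].
    if len(parts) >= 2 and parts[0] in ("c", "competitions"):
        return parts[1]
    raise ValueError(f"not a competition URL: {url}")
```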

Unzip

Unzip the downloaded titanic.zip

unzip input/titanic/titanic.zip -d input/titanic
ls input/titanic

gender_submission.csv  test.csv  titanic.zip  train.csv

There are three new files: gender_submission.csv, test.csv, and train.csv. The dataset download is complete.
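
The same extraction can also be done from Python with the standard zipfile module, for example from a notebook cell. A minimal sketch; extract_archive is a name chosen for this example.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir (like unzip -d) and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())
```

With the paths used in this article, the call would be extract_archive("input/titanic/titanic.zip", "input/titanic").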

Next, try pulling the notebook

The notebook is specified as user-name/notebook-name, as it appears in the notebook's URL.

kaggle kernels pull arthurtok/introduction-to-ensembling-stacking-in-python -p ./working
ls ./working

introduction-to-ensembling-stacking-in-python.ipynb

The notebook was pulled correctly, so the goal is reached.
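
A quick way to confirm that the pulled file is a usable notebook is to open it as JSON and count its cells. This is a minimal sketch that assumes the nbformat 4 layout, where the cells sit in a top-level "cells" list; count_cells is a hypothetical helper name.

```python
import json
from collections import Counter


def count_cells(ipynb_path):
    """Count cells by type in a notebook file (assumes the nbformat 4 layout)."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))
```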

It may be convenient to use VS Code as well

It is nice to be able to write code with IntelliSense and your own key bindings.

[Explanation with image] Create Anaconda environment with Docker and use VS Code in container

[Explanation with image] Convert VS Code to Jupyter

For those who want to use Jupyter

Add the -p 8888:8888 option when starting the container (it maps the host port to the container port).

docker run -p 8888:8888 -it --rm --mount type=bind,src=`pwd`,dst=/root/dev kaggle/python

Then launch Jupyter like this. The options are explained [here](https://qiita.com/komiya_____/items/96c14485eb035701e218#dockerfile).

jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root --NotebookApp.token=''

Open localhost:8888 in your browser and you are done.

The end

Thank you for reading to the end
