[PYTHON] [MS Azure] Slack notification of competition information using Azure Functions and Kaggle API

Overview

When doing Kaggle, you may want to catch discussions and Notebook information as quickly as possible. I think that some people are using Microsft Azure as a competition and analysis environment, so this time I will summarize how to notify Slack of Kaggle competition information using Azure Functions and Kaggle API which are FaaS of Microsft Azure. I did. kaggle0.png

Premise

--Since the focus is on how to connect and integrate Azure Functions, Kaggle API, and Slack, the details of setting and development of each service are omitted in some cases. --It is assumed that you already have an account for Kaggle, Slack, and Microsoft Azure.

procedure

  1. Slack Add the ʻIncoming Webhook` app in Slack and set the channel you want to send notifications to. (Detailed steps are omitted.) Once set, remember the URL for the webhook. Notify this URL in Azure Functions. image.png

  2. Kaggle(API) The Kaggle API has detailed specifications on the Kaggle site, but it is usually installed with pip → issued from the command line, so with Azure Functions It's a little difficult to use quickly. Therefore, this time (though it seems that it is not officially released), we will respond by hitting the Web API that the Kaggle API tool is hitting internally directly from Azure Functions.

If you read Source, the access layer of the API called in the tool is created by the client automatic generation function of OpenAPI (Swagger). You can check the specifications of Web API by referring to KaggleSwagger.yaml directly under the project with Swagger Editor. kaggleapiswagger.png

When hitting the command line based API, kaggle.json is issued from the Kaggle site and placed, but the Web API encodes the key in this and puts" Basic *** in the HTTP ʻAuthenticationheader. It is used by embedding it in the format of "* ***". I won't go into details, but the implementation around that is inkaggle / configuration.py`.

From the API, this time I will use the API (kernels / list) to get the list of notebooks released as a sample. Search condition parameters are given in the query string. Check the Swagger specifications for details. image.png

  1. Azure

Creating a new Azure Functions

Create a new Azure Functions from the Azure console. This article omits detailed authority settings, so please be careful when using it in a company or for actual services.

Set the function name etc. according to the input contents of the tab. The function name is "kagglenotify". Select Python 3.7 for the language, ʻEast Asia for the region, and Consumption (serverless) for the plan type. Also, if you select Python as the language, the OS will be Linux` by default? Other detailed settings are omitted.

If you press "Create" after the confirmation screen, Azure Functions as an Azure object will be created. Next, I will write the processing of the function concretely.

program

Select the created kagglenotify function and select "Function" from the left menu. With the Linux pay-as-you-go plan, it seems that you can't write code directly on the console at the moment, so it's a bit annoying, but this time I developed it locally.

For local development, VS Code + Azure plugin seems to be a standard, so [Official procedure](https://docs.microsoft.com/ja-jp/azure/azure-functions/functions-create-first-function-vs- Install the required tools on your PC according to code? pivots = programming-language-python).

Continue to follow the Official Procedures locally Create new projects and functions. I made it "kaggle-notify-function". If you select Time Triger as the trigger, the template source code to which the template of the time activation function is applied will be generated.

We will write the process in the main function of __init__.py. This time, using the urllib.request package, (1) acquired the kernel information of the competition that hit the Kaggle API, and (2) implemented the processing of Slack notification. Please note that the Slack webhook URL and Kaggle API access token in the sample code are your own. In the sample, List of notebooks of the competition called OSIC Pulmonary Fibrosis Progression should be created and then 20 new ones should be acquired. And inform Slack of the creator and title of the Notebook.

__init__.py


import datetime
import logging
import json
import urllib.request
import urllib.parse
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:

    slack_webhook_url = '★★ Slack webhook URL ★★'
    kaggle_base_url = 'https://www.kaggle.com/api/v1'
    kaggle_kernellist_path = '/kernels/list'
    kaggle_basic_auth_token = 'Basic ★★ Token for your Kaggle API access ★★'

    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    kernellist_params = {
        'page':1,
        'pageSize':20,
        'search':'',
        'group':'everyone',
        'user': '',
        'language':'all',
        'kernelType':'all',
        'outputType':'all',
        'sortBy':'Recently Created',
        'dataset':'',
        'competition':'osic-pulmonary-fibrosis-progression',
        'parentKernel':''

    }
    request = urllib.request.Request(
        url = kaggle_base_url + kaggle_kernellist_path + '?' + urllib.parse.urlencode(kernellist_params), 
        method='GET',
        headers={ 
            'Accept': 'application/json',
            'Authorization': kaggle_basic_auth_token
         } 
    )
    text = "\n"
    with urllib.request.urlopen(request) as response:
        response_body = response.read().decode("utf-8")
        response_body_json = json.loads(response_body)
        for i in range(len(response_body_json)):
            text += response_body_json[i]["author"] + "  " + response_body_json[i]["title"] + "\n"

    request = urllib.request.Request(
        url=slack_webhook_url, 
        data=json.dumps({
            'text':text
        }).encode("utf-8"), 
        method='POST',
        headers={ 'Content-Type': 'application/json; charset=utf-8' } 
    )
    urllib.request.urlopen(request)

The timer settings are described in function.json. The following is the setting to start the function every hour at 0 minutes.

function.json


{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 */1 * * *"
    }
  ]
}

Deploy the last function you created from the Azure plugin to Azure Functions. For the deployment destination, specify the "kaggle notify" created earlier.

When the deployment is completed, you can check "kaggle-notify-function" from the Azure console. It runs at 0 minutes every hour, but if you want to move it immediately and check it, you can run the test from the code and test menu. kaggle6.png

result

If all goes well, the Slack channel will be notified of the Notebook information retrieved by the Kaggle API. image.png

Summary and impressions

It is a sample that simply acquires the latest 20 cases every time, but by saving the acquisition result in Cosmos DB etc. and extracting a new difference every time, a mechanism to notify Slack only when there is an update can be built. I think it is possible. Also, this time I tried Azure Functions for the first time. I didn't make an exhaustive comparison, but I didn't feel that it was too difficult to develop compared to AWS Lambda, which I usually use. If you usually use Azure as a cloud environment, I thought that it would be perfectly possible to use Azure Functions for such a light mechanism.

Recommended Posts

[MS Azure] Slack notification of competition information using Azure Functions and Kaggle API
Slack notification of weather information on OpenWhisk
Get all songs of Arashi's song information using Spotify API and verify the index
[Azure] Try using Azure Functions
Web scraping of comedy program information and notification on LINE
Get data using Ministry of Internal Affairs and Communications API
Streamline information gathering with the Twitter API and Slack bots
Automation of a research on geographical information such as store network using Python and Web API
[Notes / Updated from time to time] This and that of Azure Functions
[Python] LINE notification of the latest information using Twitter automatic search
Collect product information and process data using Rakuten product search API [Python]
[Ruby on Rails] Display and pinning of GoolgeMAP using Google API