[PYTHON] The story of making soracom_exporter (I tried to monitor SORACOM Air with Prometheus)

TL;DR

- You can now monitor the session status of SORACOM Air SIMs with Prometheus by calling the SORACOM API.
- The API results are exposed to Prometheus through the text collector of node_exporter.
- This is less a story about SORACOM and more a story about building a Prometheus exporter with the node_exporter text collector.

Introduction

If you operate IoT devices with SORACOM Air SIMs, you naturally need to monitor them too. Depending on the implementation, the OS or the application inside the device sends health information. But if the network is down, that information cannot be sent, so you also need to monitor whether the network itself is alive and able to communicate.

So when something goes wrong, it helps a lot to be able to isolate the cause layer by layer, using the session status of the Air SIM. For example, if the application on the device cannot communicate but the session status is Online, the problem is probably in the OS or the application inside the device. If it is Offline, the problem is more likely on the outside: the device has lost power, the radio conditions are poor, or the antenna is broken.

That is why, with Prometheus as the monitoring environment, I wanted to pull Air information from the SORACOM API into Prometheus, then send alerts to Slack or visualize it with Grafana.
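The layer-by-layer reasoning above can be written down as a tiny lookup. This is only a sketch of the rule of thumb in this post, not part of soracom_exporter; the function name and the returned strings are made up for illustration.

```python
def guess_fault_layer(app_reachable: bool, session_online: bool) -> str:
    """Guess where to look first, following the rule of thumb in the text."""
    if app_reachable:
        return "no fault detected"
    if session_online:
        # The SIM session is up, so the network path exists;
        # suspect the OS or the application on the device.
        return "inside the device (OS or application)"
    # No session at all: suspect power, radio conditions, or the antenna.
    return "outside the device (power, radio, antenna)"
```

In practice both inputs would come from monitoring: the first from the device's own health reports, the second from the session status metric described in this post.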

Implementation

- soracom_exporter.py (written in Python, described later)
  - Calls the SORACOM API
  - Saves the metrics as text under /hoge/node_exporter/text_collector
    - The Prometheus Python client provides a function for this, so the whole step can be delegated to it.
  - Runs as a resident process and updates the metrics above every minute
- node_exporter (installed beforehand)
  - Enable loading of the text collector directory above with a startup option (described later)
  - The metrics under text_collector are returned to Prometheus along with the OS metrics.
- Prometheus (installed beforehand)
  - Add node_exporter to a scrape job
  - Scraping node_exporter then yields the metrics generated by soracom_exporter along with the OS metrics.

For example, the file layout looks like this.


/hoge
|-- prometheus
|   |-- prometheus (binary)
|   |-- prometheus.yml
|   |-- (various)
|-- node_exporter
|   |-- node_exporter (binary)
|   |-- text_collector
|       |-- soracom_exporter_session_status.prom (updated every minute)
|-- soracom_exporter
|   |-- soracom_exporter.py
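
To make the hand-off through text_collector concrete, here is a minimal standard-library sketch of what producing such a file amounts to. It is not how soracom_exporter.py does it (the script relies on write_to_textfile from the Prometheus Python client); the function names are hypothetical, and label values are assumed to contain no quotes, backslashes, or newlines, which a real client would escape. The file is written under a temporary name and then renamed so that node_exporter is unlikely to read a half-written file.

```python
import os


def render_session_status(sims):
    """Render (imsi, name, online) tuples in the Prometheus text format."""
    lines = [
        "# HELP soracom_session_status SORACOM session status",
        "# TYPE soracom_session_status gauge",
    ]
    for imsi, name, online in sims:
        value = 1.0 if online else 0.0
        # Assumes label values need no escaping (see the note above)
        lines.append(
            'soracom_session_status{imsi="%s",name="%s"} %s' % (imsi, name, value)
        )
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temporary file next to the target, then rename over it."""
    tmp_path = "%s.%d.tmp" % (path, os.getpid())
    with open(tmp_path, "w") as f:
        f.write(text)
    # Atomic on POSIX because both paths are in the same directory
    os.replace(tmp_path, path)
```

The temporary name deliberately does not end in .prom, since the text collector only reads files with that extension.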

soracom_exporter.py

- Details are explained in the comments.
- Run it as a resident process with supervisord or similar.
- The overall flow is in `export_session_status_metrics`.

soracom_exporter.py


import json
import logging
import time
import requests
logging.basicConfig(level=logging.DEBUG, format="%(asctime)-15s %(message)s")
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

# To run a job on a timer inside a resident process
# pip install schedule
# cf. https://schedule.readthedocs.io/en/stable/
import schedule

# Prometheus Python Client
# cf. https://github.com/prometheus/client_python
from prometheus_client import CollectorRegistry, Gauge, write_to_textfile # pip install prometheus_client


class SORACOMExporter():
    def __init__(self):
        # config for api
        self.SORACOM_API_KEY_ID = "keyId-xxx"       # Replace with your own
        self.SORACOM_API_KEY_SECRET = "secret-xxx"  # Replace with your own
        self.SORACOM_URL_AUTH = "https://api.soracom.io/v1/auth"
        self.SORACOM_URL_SUBSCRIBERS = "https://api.soracom.io/v1/subscribers?limit=1000"

    def export_session_status_metrics(self):
        try:
            # Generate a token from the API key ID/secret
            # (this really should be handled more properly...)
            self._get_soracom_api_token()

            # Get the list of Air SIMs from the API and parse it
            # cf. https://dev.soracom.io/jp/docs/api/#!/Subscriber/listSubscribers
            self.subscribers = self._get_subscribers()

            # Turn it into Prometheus-style metrics and write them to a file
            registry = CollectorRegistry()
            self._build_soracom_session_status_metrics(registry, self.subscribers)
            self._write_metrics(registry)
        except Exception:
            # Keep the resident process alive: skip this cycle and try again on the next run
            logging.exception("failed to export session status metrics")

    def _build_soracom_session_status_metrics(self, registry, subscribers):
        # Define the metric's name, labels, and value structure here
        soracom_session_status_gauge = Gauge(
            "soracom_session_status",  # metrics name
            "SORACOM session status",  # metrics description
            ["imsi", "name"],  # labels
            registry=registry
        )

        # Fill in the data obtained from the API
        for subscriber in subscribers:
            metrics_value = 1.0 if subscriber["session_status"] else 0.0  # 1.0 for Online, 0.0 for Offline
            soracom_session_status_gauge.labels(
                subscriber["imsi"],
                subscriber["name"]
            ).set(metrics_value)

    def _write_metrics(self, registry):
        # This just uses what the Prometheus Python client provides, as described in its README.
        # cf. https://github.com/prometheus/client_python
        text_collector_output_path = "/hoge/node_exporter/text_collector/soracom_exporter_session_status.prom"
        write_to_textfile(text_collector_output_path, registry)
        logging.info("text metrics was written!:%s" % text_collector_output_path)

    def _get_subscribers(self):
        subscribers_json = self._get_soracom_api_json(self.SORACOM_URL_SUBSCRIBERS)

        # Parse the subscribers JSON to extract each subscriber's imsi / tags.name / sessionStatus
        subscribers = []
        for subscriber_json in subscribers_json:
            tags = subscriber_json.get("tags") or {}
            session_status = subscriber_json.get("sessionStatus") or {}
            subscribers.append({
                "imsi": subscriber_json["imsi"],
                "name": tags.get("name", ""),
                # A SIM with no sessionStatus is treated as Offline
                "session_status": session_status.get("online", False)
            })

        return subscribers

    def _get_api_headers(self):
        api_headers = {
            "X-Soracom-API-Key": self.auth_api_key,
            "X-Soracom-Token": self.auth_token,
            "Accept": "application/json",
        }
        return api_headers

    def _get_soracom_api_token(self):
        auth_headers = {"Content-Type": "application/json"}
        auth_payload = {"authKeyId": self.SORACOM_API_KEY_ID, "authKey": self.SORACOM_API_KEY_SECRET}
        try:
            auth_response = requests.post(
                self.SORACOM_URL_AUTH,
                headers=auth_headers,
                data=json.dumps(auth_payload),
                verify=True,
                timeout=60
            )
            auth_response.raise_for_status()
        except requests.exceptions.RequestException as err:
            # Without a successful response there is no token to read, so log and re-raise
            logging.warning(err)
            raise
        auth_json = auth_response.json()
        self.auth_token = auth_json["token"]
        self.auth_api_key = auth_json["apiKey"]

    def _get_soracom_api_json(self, soracom_api_url):
        try:
            soracom_response = requests.get(
                soracom_api_url,
                headers=self._get_api_headers(),
                verify=True,
                timeout=60
            )
            soracom_response.raise_for_status()
        except requests.exceptions.RequestException as err:
            # Without a successful response there is no JSON to return, so log and re-raise
            logging.warning(err)
            raise
        return soracom_response.json()


if __name__ == "__main__":
    se = SORACOMExporter()
    schedule.every(1).minutes.do(se.export_session_status_metrics)  # Run every minute
    # To collect other metrics, define export_hoge_metrics and schedule it at a suitable interval
    while True:
        schedule.run_pending()
        time.sleep(1)

The output file looks like this

$ cat soracom_exporter_session_status.prom
# HELP soracom_session_status SORACOM session status
# TYPE soracom_session_status gauge
soracom_session_status{imsi="00000000000",name="For company verification"} 1.0
soracom_session_status{imsi="11111111111",name="For home verification"} 0.0
...

node_exporter boot options

- This is also run as a resident process with supervisord or similar.

node_exporter -web.listen-address ":9100" -collector.textfile.directory /hoge/node_exporter/text_collector/
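# Note: this is the flag syntax of an older node_exporter version; newer releases use double-dash flags (--web.listen-address, --collector.textfile.directory)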
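
To check that node_exporter is really serving the contents of the file, one option is to fetch its /metrics page and pick out the soracom_ lines. The sketch below assumes node_exporter listens on localhost:9100 as in the command above; the function names are made up for this example.

```python
import urllib.request


def soracom_lines(metrics_text):
    """Return only the sample lines whose metric name starts with soracom_."""
    return [
        line
        for line in metrics_text.splitlines()
        if line.startswith("soracom_")
    ]


def fetch_metrics(url="http://localhost:9100/metrics"):
    """Fetch the exposition text from node_exporter (the URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

Running `soracom_lines(fetch_metrics())` on the monitoring host should then list one line per SIM, provided the text collector directory is configured correctly.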

Impressions

Next steps

- What else could be done from here?
  - Notification to Slack
  - Visualization with Grafana
- What other endpoints could be monitored?

Implementation

- Why implement this with the node_exporter text collector instead of a Prometheus custom exporter?
  - The metrics can be prepared asynchronously
    - The SORACOM API does not have many endpoints that return data in bulk, so with many SIMs you end up wanting to make 1 + N API calls. To reply immediately when Prometheus scrapes, all of those calls would have to happen at once inside the scrape, so I wanted to do the work asynchronously instead.
  - The frequency of calling the SORACOM API is easy to adjust
    - I want the session status every minute, but for something like per-SIM traffic, [GET /stats/air/subscribers/{imsi}](https://dev.soracom.io/jp/docs/api/#!/Stats/getAirStats) only updates its information once every 5 minutes, so calling it every minute would be pointless. I therefore wanted an implementation where the interval can be adjusted. A custom exporter could do the same by going through the Pushgateway or by caching in memory.
  - Easy to implement at small scale
    - Text collector for small, quick jobs; custom exporter for more complicated things
- (Digression) Is the name soracom_exporter okay?
  - The AWS CloudWatch exporter is one example of a Prometheus exporter-like tool that calls an external API. But it imports rather than exports, so is "exporter" really the right name? A mystery. node_exporter sits on the node being monitored, so it makes sense for that one to be an exporter...
  - On top of that, I am not sure what to call something that outputs text for the node_exporter text collector.
  - Googling soracom_exporter turned up nothing yet, so I went ahead and used the name even though the tool is not that comprehensive.
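
The point about adjusting the call frequency per endpoint can be sketched as follows. The script in this post uses the schedule library for this; the sketch swaps in a hand-rolled interval check so that it runs with the standard library alone. The class and function names are hypothetical.

```python
import time


class IntervalJob:
    """Run a callable no more often than every `interval` seconds."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self.next_run = None  # None means "run at the first opportunity"

    def run_if_due(self, now):
        """Run the job if its interval has elapsed; return whether it ran."""
        if self.next_run is not None and now < self.next_run:
            return False
        self.func()
        self.next_run = now + self.interval
        return True


def run_forever(jobs, sleep_seconds=1):
    """Resident loop: check every job once per tick."""
    while True:
        now = time.monotonic()
        for job in jobs:
            job.run_if_due(now)
        time.sleep(sleep_seconds)
```

For example, `run_forever([IntervalJob(60, export_session_status), IntervalJob(300, export_traffic_stats)])` would refresh the session status every minute and the per-SIM traffic every 5 minutes, where both callables are placeholders for functions that write their own .prom file.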

end
