TL;DR

- You can now monitor Air SIM session status with Prometheus by hitting the SORACOM API.
- The API results are picked up by Prometheus via node_exporter's textfile collector.
- This is less a SORACOM story than a story about a Prometheus exporter built on the node_exporter textfile collector...
Of course, if you operate some IoT device on a SORACOM Air SIM, you need to monitor it too. Depending on the implementation, health information is sent from the OS or application inside the device. But if the network is down, that information never arrives, so you also need to monitor from the viewpoint of whether the network itself is alive and reachable.

So when there is a problem, it is very helpful to know whether the cause shows up at the Air SIM session layer. For example, if application traffic from the device is failing but the session status is Online, the problem is probably inside the OS or application on the device. If it is Offline, the problem is more likely on the outside: the device lost power, radio conditions are bad, or the antenna is broken.

That's why, with Prometheus as the monitoring environment, I wanted to pull Air information from the SORACOM API into Prometheus, send alerts to Slack, and visualize it with Grafana.
- soracom_exporter.py (written in Python, described later)
  - Hits the SORACOM API
  - Saves the metrics as text under /hoge/node_exporter/text_collector
    - The Prometheus Python client provides this functionality out of the box, so the whole step can be delegated to it
  - Updates the metrics above every minute when started as a resident process
- node_exporter (installed in advance)
  - Enable loading of the text collector above in the startup options (described later)
  - The metrics under text_collector are then returned to Prometheus along with the OS metrics
- Prometheus (installed in advance)
  - Add node_exporter to a scrape job
  - Scraping node_exporter then yields the soracom_exporter-generated metrics along with the OS metrics
So the file layout looks, for example, like this:
/hoge
|-- prometheus
|   |-- prometheus (the binary itself)
|   |-- prometheus.yml
|   |-- (various)
|-- node_exporter
|   |-- node_exporter (the binary itself)
|   |-- text_collector
|       |-- soracom_exporter_session_status.prom (updated every run)
|-- soracom_exporter
    |-- soracom_exporter.py
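The Prometheus side of the setup above boils down to a single scrape job. A minimal prometheus.yml sketch, assuming node_exporter listens on its default port 9100 on the same host:

```yaml
# prometheus.yml -- scrape node_exporter; the textfile-collector metrics
# (soracom_session_status) ride along with the OS metrics
scrape_configs:
  - job_name: "node"
    scrape_interval: 60s
    static_configs:
      - targets: ["localhost:9100"]  # node_exporter default port (assumption)
```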
soracom_exporter.py
- Details are explained in the comments
- Start it as a resident process with supervisord etc.
- The general flow is in `export_session_status_metrics`
soracom_exporter.py
import json
import logging
import time

import requests

logging.basicConfig(level=logging.DEBUG, format="%(asctime)-15s %(message)s")
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

# to run as a resident process on a timer
# pip install schedule
# cf. https://schedule.readthedocs.io/en/stable/
import schedule

# Prometheus Python client
# cf. https://github.com/prometheus/client_python
from prometheus_client import CollectorRegistry, Gauge, write_to_textfile  # pip install prometheus_client


class SORACOMExporter():
    def __init__(self):
        # config for api
        self.SORACOM_API_KEY_ID = "keyId-xxx"  # change to your own
        self.SORACOM_API_KEY_SECRET = "secret-xxx"  # change to your own
        self.SORACOM_URL_AUTH = "https://api.soracom.io/v1/auth"
        self.SORACOM_URL_SUBSCRIBERS = "https://api.soracom.io/v1/subscribers?limit=1000"

    def export_session_status_metrics(self):
        # generate a token from the API key id/secret
        # (you should really cache and reuse it properly...)
        self._get_soracom_api_token()

        # get the Air SIM list from the API and parse it
        # cf. https://dev.soracom.io/jp/docs/api/#!/Subscriber/listSubscribers
        self.subscribers = self._get_subscribers()

        # shape it into Prometheus-style metrics and write them to a file
        registry = CollectorRegistry()
        self._build_soracom_session_status_metrics(registry, self.subscribers)
        self._write_metrics(registry)

    def _build_soracom_session_status_metrics(self, registry, subscribers):
        # define the metric structure here: name, labels, value
        soracom_session_status_gauge = Gauge(
            "soracom_session_status",  # metrics name
            "SORACOM session status",  # metrics description
            ["imsi", "name"],  # labels
            registry=registry
        )
        # fill in the data fetched from the API
        for subscriber in subscribers:
            metrics_value = 1.0 if subscriber["session_status"] else 0.0  # 1.0 for Online, 0.0 for Offline
            soracom_session_status_gauge.labels(
                subscriber["imsi"],
                subscriber["name"]
            ).set(metrics_value)

    def _write_metrics(self, registry):
        # just using what the Prometheus python client's README provides
        # cf. https://github.com/prometheus/client_python
        text_collector_output_path = "/hoge/node_exporter/text_collector/soracom_exporter_session_status.prom"
        write_to_textfile(text_collector_output_path, registry)
        logging.info("text metrics was written!: %s" % text_collector_output_path)

    def _get_subscribers(self):
        subscribers_json = self._get_soracom_api_json(self.SORACOM_URL_SUBSCRIBERS)
        # parse subscribers json to extract every subscriber's imsi / tags.name / sessionStatus
        subscribers = []
        for subscriber_json in subscribers_json:
            subscribers.append({
                "imsi": subscriber_json["imsi"],
                "name": subscriber_json["tags"]["name"] if "name" in subscriber_json["tags"] else "",
                "session_status": subscriber_json["sessionStatus"]["online"] if subscriber_json[
                    "sessionStatus"] else False
            })
        return subscribers

    def _get_api_headers(self):
        api_headers = {
            "X-Soracom-API-Key": self.auth_api_key,
            "X-Soracom-Token": self.auth_token,
            "Accept": "application/json",
        }
        return api_headers

    def _get_soracom_api_token(self):
        try:
            auth_headers = {"Content-Type": "application/json"}
            auth_payload = {"authKeyId": self.SORACOM_API_KEY_ID, "authKey": self.SORACOM_API_KEY_SECRET}
            auth_response = requests.post(
                self.SORACOM_URL_AUTH,
                headers=auth_headers,
                data=json.dumps(auth_payload),
                verify=True,
                timeout=60
            )
            auth_response.raise_for_status()
        except requests.exceptions.RequestException as err:
            # note: errors are only logged here; if the request failed, the lines below will raise
            logging.warning(err)
        self.auth_token = auth_response.json()["token"]
        self.auth_api_key = auth_response.json()["apiKey"]

    def _get_soracom_api_json(self, soracom_api_url):
        try:
            soracom_response = requests.get(
                soracom_api_url,
                headers=self._get_api_headers(),
                verify=True,
                timeout=60
            )
            soracom_response.raise_for_status()
        except requests.exceptions.RequestException as err:
            logging.warning(err)
        return soracom_response.json()


if __name__ == "__main__":
    se = SORACOMExporter()
    schedule.every(1).minutes.do(se.export_session_status_metrics)  # run every minute
    # to collect other metrics, define export_hoge_metrics and run it at an appropriate interval
    while True:
        schedule.run_pending()
        time.sleep(1)
The output file looks like this:
$ cat soracom_exporter_session_status.prom
# HELP soracom_session_status SORACOM session status
# TYPE soracom_session_status gauge
soracom_session_status{imsi="00000000000",name="For company verification"} 1.0
soracom_session_status{imsi="11111111111",name="For home verification"} 0.0
...
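As an aside, each sample line of this text exposition format is just `name{labels} value`. A quick stdlib-Python sketch of reading one back, illustrative only (it handles the simple label values this exporter writes, not the format's full escaping rules):

```python
import re

def parse_prom_line(line):
    """Parse one sample line of the Prometheus text exposition format.

    Returns (metric_name, labels_dict, value). Illustrative only: it
    covers the simple label values written above, not full escaping.
    """
    m = re.match(r'(\w+)\{(.*)\}\s+(\S+)$', line)
    name, label_str, value = m.group(1), m.group(2), float(m.group(3))
    labels = dict(re.findall(r'(\w+)="([^"]*)"', label_str))
    return name, labels, value

name, labels, value = parse_prom_line(
    'soracom_session_status{imsi="00000000000",name="For company verification"} 1.0')
print(name, labels["imsi"], value)  # soracom_session_status 00000000000 1.0
```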
- node_exporter is also kept running as a resident process with supervisord etc.
node_exporter -web.listen-address ":9100" -collector.textfile.directory /hoge/node_exporter/text_collector/
#Please note that the version may be old
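For the "resident process with supervisord" part, a minimal program section could look like the following sketch (paths and log locations are assumptions, not from the original setup):

```ini
; /etc/supervisor/conf.d/soracom_exporter.conf (path is an assumption)
[program:soracom_exporter]
command=/usr/bin/python /hoge/soracom_exporter/soracom_exporter.py
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/soracom_exporter.log
```

An equivalent `[program:node_exporter]` section with the startup command above keeps node_exporter alive the same way.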
- What can be done from here?
  - Notification to Slack
  - Visualization with Grafana
  - What other endpoints can be monitored?
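For the Slack notification path, once the metric is in Prometheus, an alerting rule like the following sketch (rule and alert names are made up for illustration) would fire when a SIM stays offline, and Alertmanager can route it to Slack:

```yaml
# alert when a SIM has been offline for 5 minutes (names are illustrative)
groups:
  - name: soracom
    rules:
      - alert: SoracomSimOffline
        expr: soracom_session_status == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "SIM {{ $labels.name }} (imsi {{ $labels.imsi }}) is offline"
```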
- Why implement a node_exporter text collector instead of a Prometheus custom exporter?
  - Metrics can be prepared asynchronously
    - The SORACOM API doesn't offer many bulk endpoints, so with many SIMs you end up making 1+N API calls. To answer a Prometheus scrape immediately, you would have to fire all of those API calls at once, so I wanted to prepare the metrics asynchronously instead.
  - Easy to adjust how often the SORACOM API is hit
    - I want the session status every minute, but, for example, per-SIM traffic from [GET /stats/air/subscribers/{imsi}](https://dev.soracom.io/jp/docs/api/#!/Stats/getAirStats) is only updated once every 5 minutes, so hitting it every minute would be pointless. I wanted an implementation where the interval can be tuned per metric. With a push gateway or in-memory caching, a custom exporter could do this too.
  - Easy to implement on a small scale
    - Text collector for small, quick jobs; custom exporter for complicated things
- (Digression) Is the name soracom_exporter okay?
  - The AWS CloudWatch exporter is one of the Prometheus exporter-like tools that hits an external API. It imports rather than exports, though, so is "exporter" really the right word? A mystery. Since node_exporter sits on the monitored node, that one at least really is an exporter...
  - I'm also not sure what to call a program that outputs text for node_exporter's text collector.
  - Googling soracom_exporter turned up nothing yet, so even though it's not exhaustive, I felt like claiming the name.
end