The Linux Watchdog driver API

https://www.kernel.org/doc/html/latest/watchdog/watchdog-api.html


Last reviewed: 10/05/2007

Copyright 2002 Christer Weingel

Some parts of this document are copied verbatim from the sbc60xxwdt driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction

A Watchdog Timer (WDT) is a hardware circuit that can reset the computer system in case of a software fault. You probably knew that already.

Usually a userspace daemon will notify the kernel watchdog driver via the /dev/watchdog special device file that userspace is still alive, at regular intervals. When such a notification occurs, the driver will usually tell the hardware watchdog that everything is in order, and that the watchdog should wait for yet another little while to reset the system. If userspace fails (RAM error, kernel bug, whatever), the notifications cease to occur, and the hardware watchdog will reset the system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad-hoc construction and different drivers implement different, and sometimes incompatible, parts of it. This file is an attempt to document the existing usage and allow future driver writers to use it as a reference.

The simplest API

All drivers support the basic mode of operation, where the watchdog activates as soon as /dev/watchdog is opened and reboots the system unless it is pinged within a certain time; this time is called the timeout or margin. The simplest way to ping the watchdog is to write some data to the device. So a very simple watchdog daemon would look like this source file: see samples/watchdog/watchdog-simple.c
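
The samples file is not reproduced here. As a rough sketch of the same idea, assuming a ping interval comfortably shorter than the driver's timeout, such a daemon might look like this in C (wd_ping_write() and wd_simple_daemon() are hypothetical names, not kernel APIs):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Ping once by writing a single byte. Any data counts as a ping; this
 * sketch writes '\0' so it can never be mistaken for the Magic Close 'V'. */
int wd_ping_write(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Opening the device arms the watchdog; then ping every interval_s
 * seconds. Only returns if open() or a write fails. */
int wd_simple_daemon(const char *path, unsigned int interval_s)
{
    int fd = open(path, O_WRONLY);

    if (fd == -1) {
        perror(path);
        return -1;
    }
    while (wd_ping_write(fd) == 0)
        sleep(interval_s);
    perror("watchdog write");
    /* Closed without 'V': a Magic Close driver leaves the watchdog armed. */
    close(fd);
    return -1;
}
```

A main() would call wd_simple_daemon("/dev/watchdog", 10). If a write ever fails, the function closes the device without the magic character; depending on the driver, that either disables the watchdog or, with Magic Close, leaves it armed so the system reboots after the timeout.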

A more advanced daemon could, for example, check that an HTTP server is still responding before doing the write call to ping the watchdog.
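
A sketch of that gating logic, assuming the daemon supplies its own health check. The names wd_health_check, wd_flag_check() and wd_ping_if_healthy() are hypothetical, and wd_flag_check() merely stands in for a real probe such as an HTTP request:

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical signature for a daemon-supplied health check, e.g. one that
 * sends a request to the local HTTP server and reports whether it answered. */
typedef bool (*wd_health_check)(void *ctx);

/* Trivial stand-in check: ctx points at a bool flag set elsewhere. */
bool wd_flag_check(void *ctx)
{
    return *(const bool *)ctx;
}

/* Ping only if the monitored service looks healthy. Returns 1 if a ping
 * was written, 0 if it was withheld, -1 on write error. Withholding pings
 * lets the timeout expire so the hardware resets the system. */
int wd_ping_if_healthy(int fd, wd_health_check check, void *ctx)
{
    if (!check(ctx))
        return 0;
    return write(fd, "\0", 1) == 1 ? 1 : -1;
}
```

The design choice is simply to skip the ping: nothing has to be reported to the driver, because the missing ping is itself the signal.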

When the device is closed, the watchdog is disabled, unless the “Magic Close” feature is supported (see below). This is not always such a good idea, since if there is a bug in the watchdog daemon and it crashes, the system will not reboot. Because of this, some of the drivers support the configuration option “Disable watchdog shutdown on close”, CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when compiling the kernel, there is no way of disabling the watchdog once it has been started. So, if the watchdog daemon crashes, the system will reboot after the timeout has passed. Watchdog devices also usually support the nowayout module parameter so that this option can be controlled at runtime.

Magic Close feature

If a driver supports “Magic Close”, the driver will not disable the watchdog unless a specific magic character ‘V’ has been sent to /dev/watchdog just before closing the file. If the userspace daemon closes the file without sending this special character, the driver will assume that the daemon (and userspace in general) died, and will stop pinging the watchdog without disabling it first. This will then cause a reboot if the watchdog is not re-opened in sufficient time.
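
A minimal sketch of a deliberate shutdown under Magic Close, assuming the daemon holds the open descriptor in fd (wd_magic_close() is a hypothetical helper name):

```c
#include <unistd.h>

/* Close the watchdog device on purpose. On drivers with Magic Close,
 * writing 'V' just before close() tells the driver the shutdown is
 * intentional, so it disables the watchdog instead of leaving it armed.
 * Returns 0 on success, -1 on error. */
int wd_magic_close(int fd)
{
    if (write(fd, "V", 1) != 1) {
        close(fd); /* still release the descriptor */
        return -1;
    }
    return close(fd);
}
```

When CONFIG_WATCHDOG_NOWAYOUT or the nowayout module parameter is in effect, the 'V' does not help: as described earlier, nothing can disable a watchdog once it has been started.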

The ioctl API

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl, KEEPALIVE. This ioctl does exactly the same thing as a write to the watchdog device, so the main loop in the above program could be replaced with:

while (1) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);
        sleep(10);
}

The argument to the ioctl is ignored.
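
A slightly more defensive ping, sketched under two assumptions: the WDIOC_* constants come from the kernel's <linux/watchdog.h> UAPI header, and a driver without an ioctl interface rejects the call with ENOTTY. In that case the sketch falls back to the write-based ping that every driver supports (wd_keepalive() is a hypothetical name):

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Ping with WDIOC_KEEPALIVE (the argument is ignored). If the device
 * has no ioctl support (ENOTTY), fall back to writing one byte.
 * Returns 0 on success, -1 on any other failure. */
int wd_keepalive(int fd)
{
    if (ioctl(fd, WDIOC_KEEPALIVE, 0) == 0)
        return 0;
    if (errno != ENOTTY)
        return -1;
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}
```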

Setting and getting the timeout

For some drivers it is possible to modify the watchdog timeout on the fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT flag set in their options field. The argument is an integer representing the timeout in seconds. The driver returns the real timeout used in the same variable, and this timeout might differ from the requested one due to hardware limitations:

int timeout = 45;
ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
printf("The timeout was set to %d seconds\n", timeout);

This example might actually print “The timeout was set to 60 seconds” if the device has a granularity of minutes for its timeout.
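
To make the arithmetic concrete, here is a purely illustrative model of such a device, assuming it rounds requests up to the next whole minute; minute_granular_timeout() is hypothetical and does not correspond to any real driver:

```c
#include <limits.h>

/* Illustration only: the timeout a hypothetical driver with whole-minute
 * hardware granularity might report back from WDIOC_SETTIMEOUT. It rounds
 * the request up so the system never resets earlier than asked. Real
 * drivers apply their own rounding and limits. Returns -1 for bad input. */
int minute_granular_timeout(int requested_s)
{
    if (requested_s <= 0 || requested_s > INT_MAX - 59)
        return -1;
    return (requested_s + 59) / 60 * 60;
}
```

With this model, minute_granular_timeout(45) returns 60, matching the output described above, while a request of 61 seconds would come back as 120.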

Starting with the Linux 2.4.18 kernel, it is possible to query the current timeout using the GETTIMEOUT ioctl:

ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
printf("The timeout is %d seconds\n", timeout);
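
Since the driver may adjust the requested value, a daemon can set the timeout and then read it back. A sketch, assuming <linux/watchdog.h> provides the constants; wd_set_timeout() is a hypothetical helper, and it keeps the value returned by SETTIMEOUT if the driver does not implement GETTIMEOUT:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Request a timeout and store the value the driver actually uses in
 * *granted. Returns 0 on success, -1 if SETTIMEOUT fails (for example
 * when the driver lacks WDIOF_SETTIMEOUT); *granted is then untouched. */
int wd_set_timeout(int fd, int requested, int *granted)
{
    int t = requested;
    int readback;

    if (ioctl(fd, WDIOC_SETTIMEOUT, &t) == -1)
        return -1;
    /* SETTIMEOUT already wrote the real value into t; re-read it as a
     * cross-check when GETTIMEOUT is available. */
    if (ioctl(fd, WDIOC_GETTIMEOUT, &readback) == 0)
        t = readback;
    *granted = t;
    return 0;
}
```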

Pretimeouts

Some watchdog timers can be set to have a trigger go off before the actual time they will reset the system. This can be done with an NMI, interrupt, or other mechanism. This allows Linux to record useful information (like panic information and kernel coredumps) before it resets:

int pretimeout = 10;
ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

Note that the pretimeout is the number of seconds before the time when the timeout will go off. It is not the number of seconds until the pretimeout. So, for instance, if you set the timeout to 60 seconds and the pretimeout to 10 seconds, the pretimeout will go off in 50 seconds. Setting a pretimeout to zero disables it.
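
The timing rule above is a single subtraction. A small sketch, treating a zero, negative, or too-large pretimeout as "never fires" (that validity rule is this sketch's assumption; pretimeout_fires_after() is a hypothetical name):

```c
/* Seconds after the last ping at which the pretimeout fires. The
 * pretimeout counts back from the reset, so it fires at
 * timeout - pretimeout. Returns -1 if the pretimeout is disabled (0),
 * negative, or not shorter than the timeout. */
int pretimeout_fires_after(int timeout_s, int pretimeout_s)
{
    if (pretimeout_s <= 0 || pretimeout_s >= timeout_s)
        return -1;
    return timeout_s - pretimeout_s;
}
```

pretimeout_fires_after(60, 10) returns 50, the example from the text.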

There is also a get function for getting the pretimeout:

int pretimeout;
ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout);
printf("The pretimeout is %d seconds\n", pretimeout);

Not all watchdog drivers will support a pretimeout.
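
Because support varies, a daemon can check the WDIOF_PRETIMEOUT bit (listed under Environmental monitoring below) before trying. A sketch, assuming <linux/watchdog.h> and its struct watchdog_info; wd_try_set_pretimeout() is a hypothetical helper:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Set a pretimeout only if the driver advertises WDIOF_PRETIMEOUT.
 * Returns the pretimeout the driver reports back, or -1 if pretimeouts
 * are unsupported or any ioctl fails. */
int wd_try_set_pretimeout(int fd, int seconds)
{
    struct watchdog_info ident;
    int pt = seconds;

    if (ioctl(fd, WDIOC_GETSUPPORT, &ident) == -1)
        return -1;
    if (!(ident.options & WDIOF_PRETIMEOUT))
        return -1;
    if (ioctl(fd, WDIOC_SETPRETIMEOUT, &pt) == -1)
        return -1;
    /* Read back the value the driver actually stored. */
    if (ioctl(fd, WDIOC_GETPRETIMEOUT, &pt) == -1)
        return -1;
    return pt;
}
```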

Get the number of seconds before reboot

Some watchdog drivers have the ability to report the remaining time before the system will reboot. WDIOC_GETTIMELEFT is the ioctl that returns the number of seconds before reboot:

int timeleft;
ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
printf("The system will reboot in %d seconds\n", timeleft);
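
One possible use of the remaining time, sketched here as a hypothetical policy rather than a recommendation: ping only when fewer than margin_s seconds remain. wd_ping_if_low() is an invented name, and the constants are assumed to come from <linux/watchdog.h>:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Read the remaining time and ping only when fewer than margin_s
 * seconds are left. Returns the time left as read before any ping,
 * or -1 if either ioctl fails. */
int wd_ping_if_low(int fd, int margin_s)
{
    int left;

    if (ioctl(fd, WDIOC_GETTIMELEFT, &left) == -1)
        return -1;
    if (left < margin_s && ioctl(fd, WDIOC_KEEPALIVE, 0) == -1)
        return -1;
    return left;
}
```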

Environmental monitoring

All watchdog drivers are required to return more information about the system: some do temperature, fan and power level monitoring, and some can tell you the reason for the last reboot of the system. The GETSUPPORT ioctl is available to ask what the device can do:

struct watchdog_info ident;
ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

identity          a string identifying the watchdog driver
firmware_version  the firmware version of the card, if available
options           flags describing what the device supports
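
A sketch that fetches and prints these fields, assuming struct watchdog_info from <linux/watchdog.h>, where identity is a fixed 32-byte array; wd_print_info() is hypothetical. The precision in the format string keeps printf from reading past the array if the identity string fills it completely:

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Query WDIOC_GETSUPPORT and print identity, firmware_version and
 * options to out. Returns 0 on success, -1 if the ioctl fails. */
int wd_print_info(int fd, FILE *out)
{
    struct watchdog_info ident = {0};

    if (ioctl(fd, WDIOC_GETSUPPORT, &ident) == -1)
        return -1;
    fprintf(out, "identity:         %.*s\n",
            (int)sizeof ident.identity, (const char *)ident.identity);
    fprintf(out, "firmware_version: %u\n", (unsigned)ident.firmware_version);
    fprintf(out, "options:          0x%08x\n", (unsigned)ident.options);
    return 0;
}
```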

The options field can have the following bits set, and describes what kind of information the GETSTATUS and GETBOOTSTATUS ioctls can return. [FIXME – Is this correct?]

WDIOF_OVERHEAT: Reset due to CPU overheat. The machine was last rebooted by the watchdog because the thermal limit was exceeded.

WDIOF_FANFAULT: Fan failed. A system fan monitored by the watchdog card has failed.

WDIOF_EXTERN1: External relay 1. External monitoring relay/source 1 was triggered. Controllers intended for real-world applications include external monitoring pins that will trigger a reset.

WDIOF_EXTERN2: External relay 2. External monitoring relay/source 2 was triggered.

WDIOF_POWERUNDER: Power bad/power fault. The machine is showing an undervoltage status.

WDIOF_CARDRESET: Card previously reset the CPU. The last reboot was caused by the watchdog card.

WDIOF_POWEROVER: Power over voltage. The machine is showing an overvoltage status. Note that if one level is under and one is over, both bits will be set; this may seem odd but makes sense.

WDIOF_KEEPALIVEPING: Keep alive ping reply. The watchdog saw a keepalive ping since it was last queried.

WDIOF_SETTIMEOUT: Can set/get the timeout. The driver supports the SETTIMEOUT and GETTIMEOUT ioctls.

WDIOF_PRETIMEOUT: Pretimeout (in seconds), get/set. The watchdog can do pretimeouts.

For those drivers that return any bits set in the options field, the GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current status, and the status at the last reboot, respectively:

int flags;
ioctl(fd, WDIOC_GETSTATUS, &flags);

or

ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only support the GETBOOTSTATUS call.
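
To log the flags in readable form, a daemon could map the status bits listed above back to names. A sketch, assuming the WDIOF_* constants from <linux/watchdog.h>; wd_flags_to_string() is hypothetical, and it deliberately skips WDIOF_SETTIMEOUT and WDIOF_PRETIMEOUT because those describe capabilities rather than status:

```c
#include <stdio.h>
#include <string.h>
#include <linux/watchdog.h>

/* Render GETSTATUS/GETBOOTSTATUS flags as space-separated names in buf
 * (size len). Names that do not fit are truncated. Returns buf. */
const char *wd_flags_to_string(int flags, char *buf, size_t len)
{
    static const struct { int bit; const char *name; } names[] = {
        { WDIOF_OVERHEAT, "OVERHEAT" },
        { WDIOF_FANFAULT, "FANFAULT" },
        { WDIOF_EXTERN1, "EXTERN1" },
        { WDIOF_EXTERN2, "EXTERN2" },
        { WDIOF_POWERUNDER, "POWERUNDER" },
        { WDIOF_CARDRESET, "CARDRESET" },
        { WDIOF_POWEROVER, "POWEROVER" },
        { WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
    };

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (!(flags & names[i].bit))
            continue;
        size_t used = strlen(buf); /* always < len, so room for '\0' */
        snprintf(buf + used, len - used, "%s%s",
                 used ? " " : "", names[i].name);
    }
    return buf;
}
```

For example, a GETBOOTSTATUS value of WDIOF_CARDRESET | WDIOF_OVERHEAT is rendered as "OVERHEAT CARDRESET".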

Some drivers can measure the temperature using the GETTEMP ioctl. The returned value is the temperature in degrees Fahrenheit:

int temperature;
ioctl(fd, WDIOC_GETTEMP, &temperature);
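
If a program prefers Celsius, the conversion is a one-liner. A sketch, assuming the constants from <linux/watchdog.h>; both function names are hypothetical, and integer division truncates toward zero:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Convert whole degrees Fahrenheit to whole degrees Celsius. */
int fahrenheit_to_celsius(int f)
{
    return (f - 32) * 5 / 9;
}

/* Read the card temperature with WDIOC_GETTEMP (Fahrenheit) and store
 * it in *celsius. Returns 0 on success, -1 if the ioctl fails. */
int wd_read_temp_c(int fd, int *celsius)
{
    int f;

    if (ioctl(fd, WDIOC_GETTEMP, &f) == -1)
        return -1;
    *celsius = fahrenheit_to_celsius(f);
    return 0;
}
```

fahrenheit_to_celsius(212) returns 100.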

Finally, the SETOPTIONS ioctl can be used to control some aspects of the card's operation:

int options = 0;
ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

WDIOS_DISABLECARD  Turn off the watchdog timer
WDIOS_ENABLECARD   Turn on the watchdog timer
WDIOS_TEMPPANIC    Kernel panic on temperature trip

[FIXME – better explanations]
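
As one concrete use of the first two options, a sketch that stops or restarts the timer, assuming the constants from <linux/watchdog.h> (wd_set_enabled() is a hypothetical name). When nowayout is in effect, the driver can be expected to refuse to disable the card, so callers should check the return value:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Turn the watchdog timer off (enable == 0) or on (enable != 0) via
 * WDIOC_SETOPTIONS, whose argument points at an int of WDIOS_* bits.
 * Returns 0 on success, -1 if the driver rejects the request. */
int wd_set_enabled(int fd, int enable)
{
    int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

    return ioctl(fd, WDIOC_SETOPTIONS, &options) == -1 ? -1 : 0;
}
```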


This document is originally part of the Linux kernel source code, so it is treated as licensed under GPLv2 (my understanding is that it should be).

https://www.kernel.org/doc/html/latest/index.html

Licensing documentation

The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text.

https://www.kernel.org/doc/html/latest/process/license-rules.html#kernel-licensing
