[1] Watchdog timers are widely used in computers to facilitate automatic correction of temporary hardware faults, and to prevent errant or malevolent software from disrupting system operation.
During normal operation, the computer regularly restarts the watchdog timer to prevent it from elapsing, or "timing out".
If, due to a hardware fault or program error, the computer fails to restart the watchdog, the timer will elapse and generate a timeout signal.
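The kick-or-elapse behaviour described above can be illustrated with a small model. This is a minimal sketch rather than any particular device's logic: the class name `WatchdogTimer`, its methods, and the use of explicit `now` timestamps in seconds are all assumptions made for the example.

```python
class WatchdogTimer:
    """Toy model of a watchdog: it times out unless restarted before its deadline."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.deadline = now + timeout_s  # the timer starts counting immediately

    def kick(self, now):
        """Restart the timer; a healthy program calls this regularly."""
        self.deadline = now + self.timeout_s

    def timed_out(self, now):
        """Return True once the timer has elapsed without being restarted."""
        return now >= self.deadline


wdt = WatchdogTimer(timeout_s=1.0)
wdt.kick(now=0.8)                  # a regular kick moves the deadline to 1.8
before = wdt.timed_out(now=1.5)    # still inside the restarted interval
after = wdt.timed_out(now=2.0)     # no further kick arrived, so the timer elapsed
```

Passing the time in explicitly keeps the model deterministic; a real timer would read a hardware counter instead.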
For example, remote embedded systems such as space probes are not physically accessible to human operators; these could become permanently disabled if they were unable to autonomously recover from faults.
Alternatively, some tightly coupled[b] watchdog timers are kicked by executing a special machine language instruction.
In computers that are running operating systems, electronic watchdog restarts are usually invoked through a device driver.
[5] The device driver, which serves to abstract the watchdog hardware from user space programs, may also be used to configure the time-out period and start and stop the timer.
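On Linux, for instance, the driver is typically exposed as a character device, and a user space program can kick the timer simply by writing to it. The following is a hedged sketch under that assumption: the default path `/dev/watchdog` and the "magic close" character `V` follow the Linux watchdog driver convention, the helper names are invented for this example, and time-out configuration (done through driver-specific `ioctl` calls) is deliberately left out.

```python
import os


def open_watchdog(path="/dev/watchdog"):
    """Open the watchdog device; on typical Linux drivers this starts the timer."""
    return os.open(path, os.O_WRONLY)


def kick(fd):
    """Restart the countdown; the driver treats any write as a kick."""
    os.write(fd, b"\0")


def stop_and_close(fd):
    """Write the 'magic close' character before closing to request a stop.

    Assumption: the driver supports magic close and was not configured to
    keep the timer running regardless (the "nowayout" option).
    """
    os.write(fd, b"V")
    os.close(fd)
```

A monitoring program would call `open_watchdog()` once, call `kick()` from its main loop at intervals shorter than the time-out period, and call `stop_and_close()` only on an orderly shutdown. Running this against a real device usually requires root privileges, and a missed kick will reset the machine.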
When activated, the fail-safe circuitry forces all control outputs to safe states (e.g., turns off motors, heaters, and high voltages) to prevent injuries and equipment damage while the fault persists.
In such cases, a second timer—which is started when the first timer elapses—is typically used to reset the computer later, after allowing sufficient time for data recording to complete.
If the computer fails to kick Stage1 (e.g., due to a hardware fault or programming error), Stage1 will eventually time out.
This event will start the Stage2 timer and, simultaneously, notify the computer (by means of a non-maskable interrupt) that a reset is imminent.
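The two-stage sequence can be sketched as a tick-driven simulation. The class `TwoStageWatchdog`, the tick counts, and the event strings are illustrative assumptions. The sketch also assumes that Stage2, once started, runs to completion and cannot be cancelled by a late kick.

```python
class TwoStageWatchdog:
    """Simulated two-stage watchdog driven by calls to tick()."""

    def __init__(self, stage1_ticks, stage2_ticks):
        self.stage1_ticks = stage1_ticks
        self.stage2_ticks = stage2_ticks
        self.stage1 = stage1_ticks  # ticks remaining in Stage1
        self.stage2 = None          # None means Stage2 has not been started
        self.events = []            # records "NMI" and "RESET" in order

    def kick(self):
        """Restart Stage1; ignored once Stage2 is running (an assumption)."""
        if self.stage2 is None:
            self.stage1 = self.stage1_ticks

    def tick(self):
        """Advance the simulation by one clock tick."""
        if self.stage2 is None:
            self.stage1 -= 1
            if self.stage1 == 0:
                self.events.append("NMI")        # warn that a reset is imminent
                self.stage2 = self.stage2_ticks  # Stage1 timeout starts Stage2
        elif self.stage2 > 0:
            self.stage2 -= 1
            if self.stage2 == 0:
                self.events.append("RESET")      # Stage2 timeout resets the computer


wdt = TwoStageWatchdog(stage1_ticks=3, stage2_ticks=2)
wdt.tick()
wdt.tick()
wdt.kick()            # a kick in time restarts Stage1
for _ in range(5):    # then the program hangs and stops kicking
    wdt.tick()
```

In this run, Stage1 elapses three ticks after the last kick and Stage2 elapses two ticks later, so `wdt.events` records the interrupt followed by the reset.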
A watchdog timer provides automatic detection of catastrophic malfunctions that prevent the computer from kicking it.
To detect less severe faults, the daemon[8] can perform tests that cover various aspects of the system condition, including resource availability (e.g., memory, file handles, CPU time), evidence of expected process activity (e.g., system daemons running, specific files being present or updated), overheating, and network activity.
[9] Upon discovery of a failed test, the computer may attempt to perform a sequence of corrective actions under software control, culminating with a software-initiated reboot.
A hardware WDT remains essential as insurance, however, for the case in which a fault causes the daemon itself to malfunction and thus become unable to invoke a reboot.
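One way such a daemon might combine its tests with the hardware timer is sketched below. The function `run_health_cycle` and its parameters are hypothetical; the check, repair, kick, and reboot callables stand in for whatever a real daemon would use.

```python
def run_health_cycle(checks, repairs, kick, reboot):
    """Run one monitoring cycle and return True if the system looks healthy.

    checks:  dict mapping a test name to a callable returning True when healthy
    repairs: dict mapping a test name to a callable that attempts a fix
    kick:    callable that restarts the hardware watchdog
    reboot:  callable that requests a software-initiated reboot
    """
    for name, check in checks.items():
        if check():
            continue
        repair = repairs.get(name)
        if repair is not None:
            repair()       # corrective action under software control
        if not check():
            reboot()       # last resort once repair has failed
            return False   # the watchdog is deliberately not kicked
    kick()                 # kick only after every test has passed
    return True
```

Because the kick happens only at the end of a fully successful cycle, a daemon that hangs, crashes, or fails to reboot the system simply stops kicking, and the hardware WDT eventually forces the reset.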
Some electronic WDTs (e.g., Analog Devices MAX6324) use linear timing circuits that operate without a digital clock signal.
For example, Texas Instruments' TMS470 microcontroller has an analog WDT that employs an external capacitor and resistor to program the watchdog interval.
For example, in the analog watchdog circuit shown to the right, electric current i gradually charges capacitor C, causing voltage VC to ramp up (rise at a constant rate).
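Because the ramp is linear, the watchdog interval follows directly from the component values. The sketch below assumes that the capacitor starts at 0 V after each kick and that a timeout is signalled when VC reaches a threshold voltage; the function name, parameter names, and example values are illustrative, not taken from any datasheet.

```python
def ramp_timeout_s(current_a, capacitance_f, threshold_v):
    """Time in seconds for V_C to ramp from 0 V to threshold_v.

    With a constant charging current, dV_C/dt = i / C, so t = C * V_th / i.
    """
    if current_a <= 0:
        raise ValueError("charging current must be positive")
    return capacitance_f * threshold_v / current_a


# 1 uA charging 1 uF ramps at 1 V/s, so a 2.5 V threshold is reached in about 2.5 s.
timeout = ramp_timeout_s(current_a=1e-6, capacitance_f=1e-6, threshold_v=2.5)
```

A larger capacitor or a smaller charging current lengthens the interval proportionally.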
Examples of these include "Softdog", a virtual device driver which emulates an electronic WDT and conforms to the Linux watchdog API,[11] and MathWorks' Software Watchdog Timer, a retriggerable one-shot timer which can be instantiated by dragging its GUI representation onto a block diagram.
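The idea of a retriggerable one-shot timer implemented purely in software can be sketched with a standard-library thread timer. This is an illustration only and is unrelated to how Softdog or the MathWorks block are actually implemented; the class name `SoftwareWatchdog` and its interface are assumptions.

```python
import threading


class SoftwareWatchdog:
    """Retriggerable one-shot: on_timeout runs unless kick() arrives in time."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer = None

    def kick(self):
        """Start the timer, or retrigger it by replacing the pending one-shot."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout_s, self._on_timeout)
            self._timer.daemon = True  # do not keep the process alive for the timer
            self._timer.start()

    def stop(self):
        """Cancel any pending timeout."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Such a timer depends on the process and its scheduler continuing to run, so it can catch a stalled task but not a fault that halts the whole system.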
For example, in bare metal applications (programs running without an OS), timing references are often limited to programmable interval timers (PIT).
The associated interrupt service routine (ISR) will then execute and take corrective action via programmed I/O, system calls, or other software-controlled operations.
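The arrangement can be modelled in a few lines, assuming the PIT is programmed to raise an interrupt when its count reaches zero and that the main program kicks the watchdog by reloading the count. `SimulatedPIT` and its methods are hypothetical names; on real hardware the countdown is done by the timer peripheral and the ISR is entered through the interrupt controller.

```python
class SimulatedPIT:
    """Model of an interval timer used as a software watchdog."""

    def __init__(self, reload_count, isr):
        self.reload_count = reload_count
        self.count = reload_count
        self.isr = isr  # routine invoked when the timer elapses

    def kick(self):
        """Main program reloads the count to show it is still running."""
        self.count = self.reload_count

    def clock(self):
        """Advance one timer clock; raise the interrupt when the count hits zero."""
        if self.count > 0:
            self.count -= 1
            if self.count == 0:
                self.isr()  # the ISR takes the corrective action


actions = []
pit = SimulatedPIT(reload_count=4, isr=lambda: actions.append("restart main loop"))
for step in range(10):
    pit.clock()
    if step < 5:
        pit.kick()  # healthy for the first five steps, then the program hangs
```

In this run the last kick happens at step 4, and the count reaches zero four clocks later, at which point the ISR runs once.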
wdctl, a program that shows watchdog status