In network management, fault management is the set of functions that detect, isolate, and correct malfunctions in a telecommunications network and compensate for environmental changes. These functions include maintaining and examining error logs, accepting and acting on error detection notifications, tracing and identifying faults, carrying out sequences of diagnostic tests, correcting faults, reporting error conditions, and localizing and tracing faults by examining and manipulating database information.
The latest draft of the syslog protocol under development within the IETF includes a mapping between these two sets of severities.
Some notification systems also have escalation rules that notify a chain of individuals based on their availability and the severity of the alarm.
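An escalation rule of this kind can be sketched in a few lines of Python. This is only an illustrative sketch, not the behavior of any particular notification product: the severity names, the `Contact` fields, and the `escalation_targets` function are all hypothetical, and the example assumes each person in the chain has an availability flag and a minimum severity at which they should be notified.

```python
from dataclasses import dataclass

# Hypothetical severity ranking for illustration; higher means more urgent.
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


@dataclass
class Contact:
    name: str
    available: bool     # assumed flag, e.g. on shift and reachable
    min_severity: str   # lowest alarm severity this person is notified for


def escalation_targets(chain, severity):
    """Return the names to notify, in chain order, for an alarm of `severity`."""
    rank = SEVERITY_RANK[severity]
    return [
        contact.name
        for contact in chain
        if contact.available and rank >= SEVERITY_RANK[contact.min_severity]
    ]


# Hypothetical escalation chain, ordered from first responder upward.
chain = [
    Contact("on-call engineer", True, "warning"),
    Contact("backup engineer", False, "minor"),
    Contact("team lead", True, "major"),
    Contact("operations manager", True, "critical"),
]

print(escalation_targets(chain, "minor"))     # ['on-call engineer']
print(escalation_targets(chain, "critical"))  # ['on-call engineer', 'team lead', 'operations manager']
```

In this sketch a minor alarm reaches only the on-call engineer, while a critical alarm reaches everyone in the chain who is currently available. A real system would typically also add acknowledgement timeouts before moving to the next person.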
However, if the monitored device fails completely or locks up, it cannot raise an alarm, and the problem goes undetected.
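One way to narrow this gap, shown here only as a hedged sketch, is to treat prolonged silence itself as a fault: the monitoring system records when each device was last heard from and flags any device that has been quiet for longer than a timeout. The function name `silent_devices`, the device names, and the 60-second timeout are assumptions made for illustration.

```python
def silent_devices(last_seen, now, timeout):
    """Return, sorted by name, the devices silent for more than `timeout` seconds.

    `last_seen` maps a device name to the time (in seconds) of its last message.
    """
    return sorted(
        device
        for device, timestamp in last_seen.items()
        if now - timestamp > timeout
    )


# Hypothetical timestamps, in seconds since an arbitrary start.
last_seen = {"router-1": 100.0, "switch-7": 40.0}

# At t=120 with a 60-second timeout, switch-7 has been silent for 80 seconds.
print(silent_devices(last_seen, now=120.0, timeout=60.0))  # ['switch-7']
```

The check relies on every healthy device sending some message at least once per timeout interval, which is itself an assumption about how the network is configured.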
Fault management includes any tools or procedures for testing, diagnosing, or repairing the network when a failure occurs.