Heartbeat (computing)

[1][2] The heartbeat mechanism is a common technique in mission-critical systems for providing high availability and fault tolerance of network services. It detects network or system failures of the nodes or daemons that belong to a network cluster administered by a master server. When a node fails, the system automatically adapts and rebalances, using the remaining redundant nodes in the cluster to take over the failed node's load so that service remains continuous.

A heartbeat protocol is generally used to negotiate and monitor the availability of a resource, such as a floating IP address. The procedure involves sending network packets to all the nodes in the cluster to verify their reachability.

For this reason, it is often desirable to have a heartbeat running over more than one transport; for instance, an Ethernet segment using UDP/IP, and a serial link.
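As a rough illustration of running a heartbeat over more than one transport, the sketch below treats a peer as reachable if a heartbeat arrived recently on at least one link. The function names, the transport labels ("udp", "serial"), and the timeout value are assumptions made for this example only; they do not come from any particular heartbeat implementation.

```python
def stale_transports(last_seen_by_transport, now, timeout):
    """Return the transports on which no heartbeat arrived within `timeout` seconds.

    `last_seen_by_transport` maps a hypothetical transport label to the time
    (in seconds) at which the last heartbeat was received on that transport.
    """
    return sorted(
        name for name, seen in last_seen_by_transport.items()
        if now - seen > timeout
    )


def peer_reachable(last_seen_by_transport, now, timeout):
    """Assume the peer is reachable if any single transport is still fresh."""
    stale = stale_transports(last_seen_by_transport, now, timeout)
    return len(stale) < len(last_seen_by_transport)


if __name__ == "__main__":
    # The Ethernet/UDP path has gone quiet, but the serial link still delivers.
    last_seen = {"udp": 10.0, "serial": 14.5}
    print(peer_reachable(last_seen, now=15.0, timeout=2.0))
    print(stale_transports(last_seen, now=15.0, timeout=2.0))
```

With this rule, the loss of one link alone does not mark the peer as failed; only silence on every transport does.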

[6] Since CMs have transactions across the cluster, the most common pattern is to send heartbeat messages to all the nodes and "await" responses in a non-blocking fashion.
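The send-to-all, await-without-blocking pattern can be sketched with Python's asyncio. This is a minimal simulation, not a real network client: the probe function, the per-node reply delays, and the timeout are hypothetical stand-ins for sending a packet and waiting for the reply.

```python
import asyncio


async def probe(node, reply_delays, timeout):
    """Hypothetical probe: the node 'replies' after reply_delays[node] seconds."""
    try:
        # asyncio.sleep stands in for the network round trip.
        await asyncio.wait_for(asyncio.sleep(reply_delays[node]), timeout)
        return node, True
    except asyncio.TimeoutError:
        return node, False


async def heartbeat_round(reply_delays, timeout):
    """Probe every node concurrently; a slow node does not delay the others."""
    results = await asyncio.gather(
        *(probe(node, reply_delays, timeout) for node in reply_delays)
    )
    return dict(results)


def run_round(reply_delays, timeout=0.1):
    """Synchronous convenience wrapper around one heartbeat round."""
    return asyncio.run(heartbeat_round(reply_delays, timeout))


if __name__ == "__main__":
    # Node "c" is simulated as unresponsive within the timeout.
    print(run_round({"a": 0.0, "b": 0.01, "c": 1.0}))
```

Because all probes run concurrently, the whole round takes roughly one timeout period rather than one per node.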

[9] Every CM on the master server maintains a finite-state machine with three states for each node it administers: Down, Init, and Alive.
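A per-node state machine with these three states might look like the sketch below. The text names only the states, so the events ("heartbeat", "timeout") and the transition table are assumptions chosen for illustration: a first heartbeat moves a node from Down to Init, a further heartbeat confirms it as Alive, and a timeout returns it to Down.

```python
from enum import Enum


class NodeState(Enum):
    DOWN = "Down"
    INIT = "Init"
    ALIVE = "Alive"


# Assumed transition table; the source does not specify the transitions.
TRANSITIONS = {
    (NodeState.DOWN, "heartbeat"): NodeState.INIT,
    (NodeState.INIT, "heartbeat"): NodeState.ALIVE,
    (NodeState.ALIVE, "heartbeat"): NodeState.ALIVE,
    (NodeState.DOWN, "timeout"): NodeState.DOWN,
    (NodeState.INIT, "timeout"): NodeState.DOWN,
    (NodeState.ALIVE, "timeout"): NodeState.DOWN,
}


def step(state, event):
    """Return the next state for one event ('heartbeat' or 'timeout')."""
    return TRANSITIONS[(state, event)]


def replay(events, start=NodeState.DOWN):
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    # One state machine per administered node, keyed by a hypothetical node name.
    nodes = {"node-1": NodeState.DOWN, "node-2": NodeState.DOWN}
    nodes["node-1"] = replay(["heartbeat", "heartbeat"], nodes["node-1"])
    nodes["node-2"] = replay(["heartbeat", "timeout"], nodes["node-2"])
    print({name: state.value for name, state in nodes.items()})
```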

[12] In this communications protocol, every node sends back a message at a given interval, say delta, in effect confirming that it is alive and has a heartbeat.

If delta is too small, the heartbeat traffic imposes too much overhead; if it is too large, performance degrades because everything waits for the next heartbeat signal.
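The trade-off in choosing delta can be made concrete with a small sketch. It assumes a simple timeout rule that is not taken from the text: a node is suspected once no heartbeat has been seen for more than `missed_allowed` intervals. The function names and the `missed_allowed` parameter are hypothetical.

```python
def is_suspected(last_seen, now, delta, missed_allowed=2):
    """Suspect a node once it has been silent for more than missed_allowed * delta."""
    return (now - last_seen) > delta * missed_allowed


def messages_per_second(num_nodes, delta):
    """Heartbeat traffic arriving at the monitor: one message per node per delta."""
    return num_nodes / delta


def silence_before_suspicion(delta, missed_allowed=2):
    """How long a failed node can go unnoticed after its last heartbeat."""
    return delta * missed_allowed


if __name__ == "__main__":
    for delta in (0.1, 1.0, 10.0):
        print(
            f"delta={delta}s: "
            f"{messages_per_second(100, delta)} msg/s, "
            f"failure noticed after about {silence_before_suspicion(delta)}s of silence"
        )
```

Under these assumptions, shrinking delta raises the message rate in inverse proportion, while growing it lengthens the time a failure goes unnoticed in direct proportion.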