High availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
For instance, in order to carry out their regular daily tasks, hospitals and data centers need their systems to be highly available.
High availability is a property of network resilience, the ability to "provide and maintain an acceptable level of service in the face of faults and challenges to normal operation."[3] Threats and challenges for services range from simple misconfiguration, through large-scale natural disasters, to targeted attacks.[6] Consequently, recent efforts focus on interpreting and improving network and computing resilience, with applications to critical infrastructures.
Examples of unscheduled downtime events include power outages, failed CPU or RAM components (or possibly other failed hardware components), an over-temperature related shutdown, logically or physically severed network connections, security breaches, or various application, middleware, and operating system failures.
Systems that exhibit truly continuous availability are comparatively rare and higher priced, and most have carefully implemented specialty designs that eliminate any single point of failure and allow online hardware, network, operating system, middleware, and application upgrades, patches, and replacements.
The following table shows the downtime allowed for a given availability percentage, presuming that the system is required to operate continuously.
A simple mnemonic rule states that 5 nines allows approximately 5 minutes of downtime per year.
For example, electricity that is delivered without interruptions (blackouts, brownouts or surges) 99.999% of the time would have 5 nines reliability, or class five.
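The mnemonic above follows directly from the arithmetic. A minimal sketch, assuming a non-leap 365-day year (525,600 minutes); the function name is illustrative, not part of any standard:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (assumes a non-leap year)

def annual_downtime_minutes(availability_percent: float) -> float:
    """Return how many minutes per year a system at this availability may be down."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * MINUTES_PER_YEAR

# Five nines: about 5.26 minutes per year, matching the
# "5 nines allows roughly 5 minutes" mnemonic.
print(round(annual_downtime_minutes(99.999), 2))  # 5.26

# Three nines: about 525.6 minutes, i.e. nearly 9 hours per year.
print(round(annual_downtime_minutes(99.9), 1))    # 525.6
```

Note that the same calculation against the unavailability (1 − A) is what makes the "unavailability" index discussed below easier to work with at high nine counts.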
In general, the number of nines is not often used by network engineers when modeling and measuring availability, because the figure is hard to apply directly in formulas.[citation needed] The use of "nines" has also been called into question, since it does not appropriately reflect that the impact of unavailability varies with its time of occurrence.[17] For large numbers of nines, an "unavailability" index (a measure of downtime rather than uptime) is easier to handle; this is why unavailability metrics, rather than availability metrics, are used for hard-disk and data-link bit error rates.
Similarly, unavailability of select application functions might go unnoticed by administrators yet be devastating to users – a true availability measure is holistic.
A service level agreement ("SLA") formalizes an organization's availability objectives and requirements.
High availability is one of the primary requirements of the control systems in unmanned vehicles and autonomous maritime vessels.
If the controlling system becomes unavailable, the Ground Combat Vehicle (GCV) or ASW Continuous Trail Unmanned Vessel (ACTUV) would be lost.[21] On the other hand, redundancy is used to create systems with high levels of availability (e.g. popular e-commerce websites).
Passive redundancy is used to achieve high availability by including enough excess capacity in the design to accommodate a performance decline.
Malfunction of single components is not considered to be a failure unless the resulting performance decline exceeds the specification limits for the entire system.
Active redundancy is used in complex systems to achieve high availability with no performance decline.
Zero downtime involves massive redundancy, which is needed for some types of aircraft and for most kinds of communications satellites.
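The active-redundancy idea above can be sketched as a dispatcher that routes requests to any healthy replica, so that a single replica failure is masked with no performance decline. This is a minimal illustration only; the replica names and health model are hypothetical, not drawn from any particular system:

```python
class Replica:
    """A hypothetical service replica with a simple health flag."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} served {request}"


def dispatch(replicas: list[Replica], request: str) -> str:
    """Try replicas in order; any healthy one masks the failure of the others."""
    for replica in replicas:
        if replica.healthy:
            return replica.handle(request)
    raise RuntimeError("total outage: no healthy replica remains")


pair = [Replica("primary"), Replica("standby")]
print(dispatch(pair, "req-1"))   # primary served req-1
pair[0].healthy = False          # fault the primary
print(dispatch(pair, "req-2"))   # standby served req-2
```

A real active-redundancy design would also need health checking, state replication, and a way to avoid split-brain, which this sketch deliberately omits.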
N-1 means the model is stressed by evaluating performance with all possible combinations where one component is faulted. N-2 means the model is stressed by evaluating performance with all possible combinations where two components are faulted simultaneously.
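The N-1 and N-2 stress tests amount to exhaustively enumerating fault combinations and checking that the surviving system still meets its specification. A sketch under assumed, hypothetical components (three power supplies, each contributing capacity toward a required total):

```python
from itertools import combinations

# Hypothetical capacity model: each component contributes some capacity,
# and the system meets spec while the surviving capacity covers REQUIRED.
CAPACITY = {"psu_a": 50, "psu_b": 50, "psu_c": 50}  # arbitrary example units
REQUIRED = 100

def survives(faulted: set[str]) -> bool:
    """True if the components outside `faulted` still cover the requirement."""
    remaining = sum(c for name, c in CAPACITY.items() if name not in faulted)
    return remaining >= REQUIRED

def passes_n_minus(k: int) -> bool:
    """N-k test: the system must meet spec for every combination of k faults."""
    return all(survives(set(combo)) for combo in combinations(CAPACITY, k))

print(passes_n_minus(1))  # True: losing any one PSU leaves 100 units
print(passes_n_minus(2))  # False: losing any two leaves only 50 units
```

The combinatorial enumeration is why N-2 (and higher) analyses grow expensive on large models: the number of cases is "N choose k".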
A survey among academic availability experts in 2010 ranked reasons for unavailability of enterprise IT systems.
All reasons refer to not following best practice in each of the following areas (in order of importance):[25] A book on the factors themselves was published in 2003.[26] In a 1998 report from IBM Global Services, unavailable systems were estimated to have cost American businesses $4.54 billion in 1996, due to lost productivity and revenues.