Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.
Bubble sort organizes the list in time proportional to the number of elements squared (O(n²), see Big O notation), but only requires a small amount of extra memory which is constant with respect to the length of the list (O(1)).
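A minimal bubble sort sketch in Python (illustrative only; the function name and the early-exit flag are not drawn from any particular reference implementation) makes this trade-off concrete: the nested loops give the quadratic running time, while the in-place swaps need only a constant amount of extra memory.

```python
def bubble_sort(items):
    """Sort a list in place.

    Time:  O(n^2) -- the nested loops compare up to n*(n-1)/2 pairs.
    Space: O(1)   -- only a few loop variables beyond the input list itself.
    """
    n = len(items)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:        # no swaps on this pass: already sorted
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]
```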
The importance of efficiency with respect to time was emphasized by Ada Lovelace in 1843 as applied to Charles Babbage's mechanical analytical engine: "In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selections amongst them for the purposes of a calculating engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation."
Nevertheless, Donald Knuth emphasized that efficiency is still an important consideration: "In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering".[2]

An algorithm is considered efficient if its resource consumption, also known as computational cost, is at or below some acceptable level.
Roughly speaking, 'acceptable' means: it will run in a reasonable amount of time or space on an available computer, typically as a function of the size of the input.
Many of these measures depend on the size of the input to the algorithm, i.e. the amount of data to be processed.
In practice, there are other factors which can affect the efficiency of an algorithm, such as requirements for accuracy and/or reliability.
As detailed below, the way in which an algorithm is implemented can also have a significant effect on actual efficiency, though many aspects of this relate to optimization issues.
In the theoretical analysis of algorithms, the normal practice is to estimate their complexity in the asymptotic sense.
Some examples of Big O notation applied to algorithms' asymptotic time complexity include constant time O(1) (e.g. indexing into an array), logarithmic time O(log n) (e.g. binary search of a sorted array), linear time O(n) (e.g. scanning an unsorted list), linearithmic time O(n log n) (e.g. merge sort), and quadratic time O(n²) (e.g. bubble sort).

For new versions of software, or to provide comparisons with competitive systems, benchmarks are sometimes used, which assist with gauging an algorithm's relative performance.
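As a concrete illustration of two of these classes, the following sketch (plain Python plus the standard bisect module; the helper functions are illustrative, not taken from any particular library) contrasts a linear O(n) scan with a logarithmic O(log n) binary search over the same sorted data.

```python
from bisect import bisect_left

def linear_search(items, target):
    """O(n): may examine every element before finding (or missing) the target."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """O(log n): halves the search interval each step; requires sorted input."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1

data = list(range(0, 1_000_000, 2))    # sorted even numbers
print(linear_search(data, 999_998))    # scans roughly 500,000 elements
print(binary_search(data, 999_998))    # about 20 comparisons
```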
If a new sort algorithm is produced, for example, it can be compared with its predecessors to ensure that it is at least as efficient as before with known data, taking into consideration any functional improvements.
Even creating "do it yourself" benchmarks can demonstrate the relative performance of different programming languages, using a variety of user-specified criteria.
This is quite simple, as a "Nine language performance roundup" by Christopher W. Cowell-Shah demonstrates by example.
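A do-it-yourself benchmark along these lines can be as small as the sketch below, which uses Python's standard timeit module; the two routines being compared are arbitrary stand-ins for an older and a newer sort implementation.

```python
import random
import timeit

def insertion_sort(items):
    """Simple quadratic sort standing in for a 'predecessor' implementation."""
    items = list(items)
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items

data = [random.random() for _ in range(2_000)]

# Time each candidate on identical input; repeating smooths out noise.
for label, stmt in [("insertion_sort", "insertion_sort(data)"),
                    ("built-in sorted", "sorted(data)")]:
    seconds = timeit.timeit(stmt, globals=globals(), number=5)
    print(f"{label:>15}: {seconds:.4f} s for 5 runs")
```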
There are other factors which may affect time or space issues, but which may be outside of a programmer's control; these include data alignment, data granularity, cache locality, cache coherency, garbage collection, instruction-level parallelism, multi-threading (at either a hardware or software level), simultaneous multitasking, and subroutine calls.
As parallel and distributed computing grow in importance in the late 2010s, more investments are being made into efficient high-level APIs for parallel and distributed computing systems such as CUDA, TensorFlow, Hadoop, OpenMP and MPI.
This is often the case in embedded systems with respect to floating-point arithmetic, where small, low-power microcontrollers often lack hardware support for floating-point arithmetic and thus require computationally expensive software routines to perform floating-point calculations.
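A common workaround on such hardware is fixed-point arithmetic, which replaces floating-point operations with plain integer ones. The sketch below is purely illustrative, assumes a Q16.16 representation (16 fractional bits), and is not tied to any particular microcontroller toolchain.

```python
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS          # Q16.16: store x as the integer round(x * 2**16)

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def to_float(f: int) -> float:
    return f / SCALE

def fixed_mul(a: int, b: int) -> int:
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS

a, b = to_fixed(3.25), to_fixed(0.5)
print(to_float(a + b))            # 3.75  (addition needs no adjustment)
print(to_float(fixed_mul(a, b)))  # 1.625 (computed with integer operations only)
```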
The two most common measures are time (how long the algorithm takes to complete) and space (how much working memory, typically RAM, it needs while running). For computers whose power is supplied by a battery (e.g. laptops and smartphones), or for very long/large calculations (e.g. supercomputers), other measures of interest include power or energy consumption, both direct (to operate the computer itself) and indirect (for cooling and other supporting infrastructure).

As of 2018, power consumption is growing as an important metric for computational tasks of all types and at all scales, ranging from embedded Internet of things devices to system-on-chip devices to server farms.
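As an illustration of the two most common measures, the sketch below times an arbitrary workload and records its peak Python-level memory allocation; time.perf_counter and tracemalloc are standard-library tools, and the workload itself is only an example.

```python
import time
import tracemalloc

def build_squares(n):
    """Arbitrary workload: roughly O(n) time and O(n) space."""
    return [i * i for i in range(n)]

tracemalloc.start()
start = time.perf_counter()

result = build_squares(1_000_000)

elapsed = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"time:  {elapsed:.3f} s")
print(f"space: {peak_bytes / 1_000_000:.1f} MB peak (Python allocations only)")
```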
More detailed estimates are needed to compare algorithm performance when the amount of data is small, although this is likely to be of less importance.
Run-based profiling can be very sensitive to hardware configuration and the possibility of other programs or tasks running at the same time in a multi-processing and multi-programming environment.
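The sketch below shows what such run-based profiling can look like, using Python's standard cProfile and pstats modules; the profiled function is an arbitrary example, and the numbers it reports will vary with the machine, its current load, and the interpreter version, for exactly the reasons noted above.

```python
import cProfile
import pstats

def slow_pairs(values):
    """Deliberately quadratic helper used only as profiling fodder."""
    return sum(1 for a in values for b in values if a < b)

def workload():
    data = list(range(1_500))
    return slow_pairs(data)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Report the functions that consumed the most cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```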
In the late 2010s, it is typical for personal computers to have between 4 and 32 GB of RAM, an increase of more than 300 million times over the working memory of early computers.
Because of this, cache replacement policies are extremely important to high-performance computing, as are cache-aware programming and data alignment.
To further complicate the issue, some systems have up to three levels of cache memory, with varying effective speeds.
An algorithm which will not fit completely in cache memory but which exhibits locality of reference may perform reasonably well.
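A minimal illustration of locality of reference, assuming nothing beyond the Python standard library: the same N x N data, stored in one flat row-major list, is summed first in row-major and then in column-major order. On typical hardware the sequential, row-major pass tends to be faster, although the effect is far more pronounced in languages with flat in-memory arrays such as C.

```python
import time

N = 2_000
# One flat list holding an N x N matrix in row-major order.
matrix = [1.0] * (N * N)

def sum_row_major():
    """Consecutive iterations touch adjacent entries: good locality."""
    total = 0.0
    for row in range(N):
        for col in range(N):
            total += matrix[row * N + col]
    return total

def sum_column_major():
    """Each access jumps N entries ahead: poor locality of reference."""
    total = 0.0
    for col in range(N):
        for row in range(N):
            total += matrix[row * N + col]
    return total

for fn in (sum_row_major, sum_column_major):
    start = time.perf_counter()
    fn()
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")
```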