Long-tail traffic

A long-tailed or heavy-tailed distribution is one that assigns relatively high probabilities to regions far from the mean or median.

The terms heavy-tailed, long-range dependent, and self-similar are distinct, although superpositions of samples from heavy-tailed distributions aggregate to form long-range dependent time series.

Mandelbrot established the use of heavy-tail distributions to model real-world fractal phenomena, e.g. stock markets, earthquakes, and the weather.[4]

Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/or Ethernet dynamics.

A time series is long-range dependent if its autocorrelation function decays hyperbolically rather than exponentially,

ρ(k) ~ k^(-α),

where ρ(k) is the autocorrelation function at a lag k, α is a parameter in the interval (0,1) and the ~ means asymptotically proportional to as k approaches infinity.
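
As a quick numerical illustration (an assumed sketch, not part of the original text), the difference between this power-law decay and an exponential decay shows up in the partial sums of the autocorrelations: for ρ(k) ~ k^(-α) with α in (0,1) the sums grow without bound, the defining signature of long-range dependence, while an exponentially decaying autocorrelation sums to a finite value. The exponent 0.4 and decay constant 0.1 below are arbitrary choices.

```python
import numpy as np

# Partial sums of the autocorrelation function: a hyperbolic decay k**(-alpha)
# with 0 < alpha < 1 never settles down (the sum diverges), whereas an
# exponential decay converges to a finite value almost immediately.
lags = np.arange(1, 100_001)
alpha = 0.4                                   # illustrative exponent in (0, 1)
power_law = lags.astype(float) ** (-alpha)    # rho(k) ~ k**(-alpha), long-range dependent
exponential = np.exp(-0.1 * lags)             # short-range dependent comparison

print("power-law partial sum:  ", np.cumsum(power_law)[-1])    # keeps growing as more lags are added
print("exponential partial sum:", np.cumsum(exponential)[-1])  # converges to roughly 9.5
```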

This variance-to-mean power law, under which the variance of aggregated traffic counts grows as a power of their mean, is an inherent feature of a family of statistical distributions called the Tweedie exponential dispersion models.

Much as the central limit theorem explains how certain types of random data converge towards the form of a normal distribution, a related result, the Tweedie convergence theorem, explains how other types of random data converge towards the form of these Tweedie distributions, and consequently express both the variance-to-mean power law and a power-law decay in their autocorrelation functions.
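
A rough sketch of how the variance-to-mean power law can be observed in practice, assuming a simple renewal-type arrival model with heavy-tailed (Pareto) inter-arrival times; all parameters below are illustrative and not taken from the article. Event counts are summed over bins of increasing size, and the slope of log(variance) against log(mean) across bin sizes estimates the power-law exponent p: a memoryless Poisson stream gives a slope near 1, while heavy-tailed arrivals give a slope noticeably above 1 (between 1 and 2).

```python
import numpy as np

rng = np.random.default_rng(1)

def pareto_renewal_counts(n_slots, tail_index=1.5):
    """Counts per time slot of a renewal process with heavy-tailed (Pareto) gaps."""
    gaps = 1.0 + rng.pareto(tail_index, size=n_slots)   # mean gap = 1 + 1/(tail_index - 1)
    times = np.cumsum(gaps)
    times = times[times < n_slots]
    return np.bincount(times.astype(int), minlength=n_slots).astype(float)

def variance_to_mean_slope(counts, bin_sizes):
    """Slope of log(variance) vs log(mean) of bin sums across expanding bins."""
    means, variances = [], []
    for b in bin_sizes:
        sums = counts[: counts.size // b * b].reshape(-1, b).sum(axis=1)
        means.append(sums.mean())
        variances.append(sums.var())
    return np.polyfit(np.log(means), np.log(variances), 1)[0]

n = 2 ** 18
bins = [2 ** k for k in range(0, 11)]
heavy = pareto_renewal_counts(n)
poisson = rng.poisson(heavy.mean(), size=n).astype(float)

print("heavy-tailed arrivals: p ~", round(variance_to_mean_slope(heavy, bins), 2))   # noticeably above 1
print("Poisson arrivals:      p ~", round(variance_to_mean_slope(poisson, bins), 2)) # close to 1
```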

The Hurst parameter H is a measure of the level of self-similarity of a time series that exhibits long-range dependence, to which the heavy-tail distribution can be applied.[9]
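
A minimal sketch of one common way to estimate H, the aggregated-variance method (an assumed illustration; the block sizes and data below are arbitrary): the series is averaged over non-overlapping blocks of size m, and the slope β of log(variance of the block means) against log(m) gives H = 1 + β/2. Independent data yields H ≈ 0.5, whereas long-range dependent traffic yields 0.5 < H < 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from Var(block means) ~ m**(2H - 2)."""
    log_m, log_var = [], []
    for m in block_sizes:
        block_means = series[: series.size // m * m].reshape(-1, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(block_means.var()))
    beta = np.polyfit(log_m, log_var, 1)[0]    # slope of the variance-time plot
    return 1.0 + beta / 2.0

blocks = [2 ** k for k in range(1, 11)]
independent = rng.normal(size=2 ** 18)         # short-range dependent reference data
print("H ~", round(hurst_aggregated_variance(independent, blocks), 2))  # about 0.5
```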

In the graph above left, the condition for the existence of a heavy-tail distribution, as previously presented, is not met by the curve labelled "Gamma-Exponential Tail".

Readers interested in a more rigorous mathematical treatment of the subject are referred to the external links section.

Of these causes, the heavy-tailed distribution of file sizes is currently the most popular explanation in the engineering literature and the one with the most empirical evidence, through observed file size distributions.[11]

At a critical packet creation rate, the flow in a network becomes congested and exhibits 1/f noise and long-tail traffic characteristics.[2]

It has long been realised that efficient and accurate modelling of various real-world phenomena needs to incorporate the fact that observations made on different scales each carry essential information.[15]

Classical models of time series such as Poisson and finite Markov processes rely heavily on the assumption of independence, or at least weak dependence.

Nonlinear methods are used for producing packet traffic models which can replicate both short-range and long-range dependent streams.

Several such models have been proposed. No unanimity exists about which of the competing models is appropriate,[4] but the Poisson Pareto Burst Process (PPBP), which is an M/G/∞ process, is perhaps the most successful model to date.
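
A minimal discrete-time sketch of a PPBP-style generator (an assumed illustration; the arrival rate, tail index, and per-burst rate below are arbitrary choices, not values from the literature): bursts start according to a Poisson process, each burst lasts a Pareto-distributed number of slots, and every active burst contributes a constant rate, so the traffic in a slot is the summed rate of the bursts active in that slot.

```python
import numpy as np

rng = np.random.default_rng(3)

def ppbp_trace(n_slots, burst_rate=0.1, tail_index=1.4, rate_per_burst=1.0):
    """Traffic per slot from Poisson burst arrivals with Pareto-distributed durations."""
    trace = np.zeros(n_slots)
    starts = rng.poisson(burst_rate, size=n_slots)        # bursts beginning in each slot
    for t in np.flatnonzero(starts):
        for _ in range(starts[t]):
            duration = 1 + int(rng.pareto(tail_index))    # heavy-tailed burst length
            trace[t:t + duration] += rate_per_burst       # burst stays active for `duration` slots
    return trace

traffic = ppbp_trace(100_000)
print("mean rate:", round(traffic.mean(), 3), " peak rate:", traffic.max())
```

A tail index between 1 and 2 gives burst lengths with a finite mean but infinite variance, which is what makes the aggregate trace long-range dependent rather than smoothing out as bursts are superposed.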

The extent to which heavy-tailedness degrades network performance is determined by how well congestion control is able to shape source traffic into an on-average constant output stream while conserving information.

Traffic self-similarity negatively affects primary performance measures such as queue size and packet-loss rate.[18]

Additionally, aggregating streams of long-tail traffic typically intensifies the self-similarity ("burstiness") rather than smoothing it, compounding the problem.

In the modern network environment, with multimedia and other QoS-sensitive traffic streams comprising a growing fraction of network traffic, second-order performance measures in the form of “jitter”, such as delay variation and packet-loss variation, are important for provisioning user-specified QoS.

For network queues with long-range dependent inputs, the sharp increase in queuing delay at fairly low levels of utilisation and the slow decay of queue lengths imply that an incremental improvement in loss performance requires a significant increase in buffer size.

When traffic is self-similar, we find that queuing delay grows proportionally to the buffer capacity present in the system.

To achieve a constant level of throughput or packet loss as self-similarity is increased, extremely large buffer capacity is needed.
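
The following discrete-time sketch (assumed parameters throughout; it is not a model from the article) illustrates the point: a finite FIFO queue serving one unit of work per slot is offered the same 80% mean load either as Poisson arrivals or as heavy-tailed (Pareto) batch arrivals, and the fraction of work lost is compared as the buffer grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def loss_fraction(arrivals, service_per_slot, buffer_size):
    """Fraction of offered work dropped by a finite FIFO queue, simulated slot by slot."""
    queue, dropped = 0.0, 0.0
    for work in arrivals:
        queue += work
        if queue > buffer_size:                 # work that does not fit in the buffer is lost
            dropped += queue - buffer_size
            queue = buffer_size
        queue = max(0.0, queue - service_per_slot)
    return dropped / arrivals.sum()

n = 500_000
load = 0.8                                       # mean offered load per slot (80% utilisation)
poisson = rng.poisson(load, size=n).astype(float)
heavy = 1.0 + rng.pareto(1.3, size=n)            # heavy-tailed batch sizes
heavy *= load / heavy.mean()                     # rescale to the same mean load

for buffer_size in (10, 100, 1000):
    print(f"buffer {buffer_size:4d}:  Poisson loss {loss_fraction(poisson, 1.0, buffer_size):.5f}"
          f"   heavy-tailed loss {loss_fraction(heavy, 1.0, buffer_size):.5f}")
```

In a typical run of this sketch, the Poisson loss falls off rapidly once the buffer exceeds a few tens of units, while the heavy-tailed loss improves only slowly as the buffer is increased, consistent with the behaviour described above.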

The short fixed-length cell used in ATM reduces the delay and most significantly the jitter for delay-sensitive services such as voice and video.[22]
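
A back-of-the-envelope calculation (assuming an OC-3 line rate of 155.52 Mbit/s purely for illustration) shows the scale of the effect: the worst case a delay-sensitive cell can wait behind a single unit already being transmitted is that unit's serialisation time.

```python
# Serialisation time for one unit on an assumed 155.52 Mbit/s link.
link_bps = 155.52e6
for name, size_bytes in (("53-byte ATM cell", 53), ("1500-byte packet", 1500)):
    microseconds = size_bytes * 8 / link_bps * 1e6
    print(f"{name:>18}: {microseconds:5.1f} microseconds")
```

A voice cell arriving just behind a maximum-size packet therefore waits tens of microseconds per hop, versus a few microseconds behind another cell, which is the jitter advantage of the short fixed cell.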

Workload pattern complexities (for example, bursty arrival patterns) can significantly affect resource demands, throughput, and the latency encountered by user requests, leading to higher average response times and higher response-time variance.

Without adaptive, optimal management and control of resources, SLAs based on response time cannot be met.

The capacity requirements on the site are increased while its ability to provide acceptable levels of performance and availability diminishes.[18]

With respect to SLAs, the same level of service under heavy-tailed request distributions requires a more powerful set of servers than under independent, light-tailed request traffic.[18]

Additional information on the effect of long-range dependence on network performance can be found in the external links section.

For short files, which constitute the bulk of connection requests in heavy-tailed file size distributions of web servers, elaborate feedback control may be bypassed in favour of lightweight mechanisms in the spirit of optimistic control, which can result in improved bandwidth utilisation.
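
A small numerical sketch of why this split makes sense (assumed Pareto file sizes with an illustrative tail index; none of these numbers come from the article): when sizes are heavy-tailed, the overwhelming majority of requests are for short files, yet a small fraction of very large files accounts for a disproportionate share of the transferred bytes.

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = 1.0 + rng.pareto(1.2, size=1_000_000)     # heavy-tailed file sizes (arbitrary units)
sizes.sort()

# Fraction of requests that are "short" (here: within 10x the median size).
small_share = (sizes <= np.median(sizes) * 10).mean()
# Share of total bytes carried by the largest 1% of files.
top1_bytes = sizes[-len(sizes) // 100 :].sum() / sizes.sum()

print(f"{small_share:.1%} of requests are within 10x the median size")
print(f"largest 1% of files carry {top1_bytes:.1%} of all bytes")
```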