The field was established and formalized by Claude Shannon in the 1940s,[1] though early contributions were made in the 1920s through the works of Harry Nyquist and Ralph Hartley.
Applications of fundamental topics of information theory include source coding/data compression (e.g. for ZIP files) and channel coding/error detection and correction (e.g. for DSL).
Its impact has been crucial to the success of the Voyager missions to deep space,[4] the invention of the compact disc, the feasibility of mobile phones and the development of the Internet and artificial intelligence.[5][6][3] The theory has also found applications in other areas, including statistical inference,[7] cryptography, neurobiology,[8] perception,[9] signal processing,[2] linguistics, the evolution[10] and function[11] of molecular codes (bioinformatics), thermal physics,[12] molecular dynamics,[13] black holes, quantum computing, information retrieval, intelligence gathering, plagiarism detection,[14] pattern recognition, anomaly detection,[15] the analysis of music,[16][17] art creation,[18] imaging system design,[19] the study of outer space,[20] the dimensionality of space,[21] and epistemology.[citation needed]
The landmark event establishing the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.[24][25][26]
Shannon outlined some of his initial ideas of information theory as early as 1939 in a letter to Vannevar Bush.[26] Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability.
Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German Second World War Enigma ciphers.[citation needed] Much of the mathematics behind information theory with events of different probabilities was developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs.[citation needed]
In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that "The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point." With it came the ideas of the information entropy and redundancy of a source, and its relevance through the source coding theorem; the mutual information and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem; the practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; and the bit, a new way of seeing the most fundamental unit of information.

Information theory is based on probability theory and statistics, where quantified information is usually described in terms of bits.
Intuitively, the entropy H(X) of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X when only its distribution is known.
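Written out explicitly (the standard definition, with logarithms taken base 2 so that entropy is measured in bits):

H(X) = -\sum_{x} p_X(x) \log_2 p_X(x)

For example, a fair coin flip has H(X) = 1 bit, while a coin that lands heads with probability 0.9 has H(X) \approx 0.47 bits, reflecting its lower uncertainty.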
Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X.
In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
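Concretely (a standard formulation of what the text describes, with Bob's prior written q(x) and the true distribution p(x)), the expected excess surprisal is the KL divergence

D_{\mathrm{KL}}(p \| q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)},

measured in bits when the logarithm is taken base 2. It is zero exactly when q = p: a correct prior produces no unnecessary surprise.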
A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints.
For the more general case of a process that is not necessarily stationary, the average rate is

r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n),

that is, the limit of the joint entropy per symbol.
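As an illustrative sketch (the two-state chain and its transition probabilities are invented for the example), the following Python snippet computes the entropy rate of a stationary Markov chain and shows the joint entropy per symbol approaching that rate as n grows:

import numpy as np

P = np.array([[0.9, 0.1],    # transition matrix: P[i, j] = Pr(next = j | current = i)
              [0.4, 0.6]])

# Stationary distribution pi: the eigenvector of P^T with eigenvalue 1,
# normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

def H(p):
    # Shannon entropy in bits of a probability vector (0 log 0 := 0).
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Entropy rate of a stationary Markov chain: sum_i pi_i * H(row i of P).
rate = sum(pi[i] * H(P[i]) for i in range(len(pi)))

# Chain rule for a stationary chain: H(X_1, ..., X_n) = H(pi) + (n - 1) * rate,
# so the joint entropy per symbol tends to the entropy rate.
for n in (1, 10, 100, 1000):
    print(n, (H(pi) + (n - 1) * rate) / n)
print("entropy rate:", rate)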
Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe.
Shannon also defined an important concept now called the unicity distance: based on the redundancy of the plaintext, it attempts to give the minimum amount of ciphertext necessary to ensure unique decipherability.
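A standard way to make this quantitative (the textbook formulation, not spelled out above) is

U = \frac{H(K)}{D},

where H(K) is the entropy of the key in bits and D is the redundancy of the plaintext in bits per character. For a simple substitution cipher on English text, H(K) = \log_2 26! \approx 88.4 bits and D \approx 3.2 bits per character, giving a unicity distance of roughly 28 characters of ciphertext.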
Information-theoretic security refers to methods such as the one-time pad that are not vulnerable to brute-force attacks.
In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications.
In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key.
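A minimal Python sketch of the one-time pad (an illustrative toy, not a production implementation): XOR-ing the message with a uniformly random, never-reused key of the same length makes every ciphertext equally likely regardless of the plaintext, which is exactly the zero-mutual-information condition described above.

import secrets

def otp(data: bytes, key: bytes) -> bytes:
    # One-time pad: XOR each byte with the corresponding key byte.
    # Encryption and decryption are the same operation.
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))  # uniform random key, used once
ciphertext = otp(message, key)

# Without the key, any 14-byte plaintext explains the ciphertext
# equally well; with the key, decryption is exact.
assert otp(ciphertext, key) == message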
However, as in any other cryptographic system, care must be taken to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.
Pseudorandom number generators are widely available in computer language libraries and application programs.
They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software.
The measure of sufficient randomness in extractors is min-entropy, a value related to Shannon entropy through Rényi entropy. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor, and so for cryptographic uses.
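To illustrate the distinction (a constructed example, not taken from the sources cited above): consider a random variable that takes one fixed value with probability 1/2 and is otherwise uniform over 2^{127} other values, each with probability 2^{-128}. Its Shannon entropy is (1/2)(1) + (2^{127})(2^{-128})(128) = 64.5 bits, but its min-entropy, -\log_2 of the probability of the most likely outcome, is just 1 bit; an adversary guesses its value correctly half the time, so it would be a poor source for a cryptographic key despite the high Shannon entropy.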
One early commercial application of information theory was in the field of seismic oil exploration.
Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal.
Information theory and digital signal processing offer a major improvement in resolution and image clarity over previous analog methods.[42] Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics.[46]
In this context, an information-theoretical measure is defined, such as functional clusters (Gerald Edelman and Giulio Tononi's functional clustering model and dynamic core hypothesis (DCH)[47]) or effective information (Tononi's integrated information theory (IIT) of consciousness[48][49][50]), on the basis of a reentrant process organization, i.e. the synchronization of neurophysiological activity between groups of neuronal populations; alternatively, the minimization of free energy is measured on the basis of statistical methods (Karl J. Friston's free energy principle (FEP), an information-theoretical measure which states that every adaptive change in a self-organized system leads to a minimization of free energy, and the Bayesian brain hypothesis[51][52][53][54][55]).
Information theory also has applications in the search for extraterrestrial intelligence,[56] black holes,[57] bioinformatics,[58] and gambling.