Distributed source coding

Distributed source coding (DSC) is an important problem in information theory and communication.

DSC problems concern the compression of multiple correlated information sources that do not communicate with each other.

One of the main properties of distributed source coding is that the computational burden in encoders is shifted to the joint decoder.[5]

Although the theorems on DSC were proposed in the 1970s, it was only about 30 years later that practical techniques began to be developed, based on the idea, proposed in 1974 by Aaron D. Wyner, that DSC is closely related to channel coding.[6]

The asymmetric DSC problem was addressed by S. S. Pradhan and K. Ramchandran in 1999; their work focused on statistically dependent binary and Gaussian sources and used scalar and trellis coset constructions to solve the problem.

This asymmetric DSC scheme can be easily extended to the case of more than two correlated information sources.

The information theoretical lossless compression bound on DSC (the Slepian–Wolf bound) was first proposed by David Slepian and Jack Keil Wolf in terms of the entropies of correlated information sources in 1973.[4]

Similar results were obtained in 1976 by Aaron D. Wyner and Jacob Ziv with regard to lossy coding of jointly Gaussian sources.

Given two statistically dependent i.i.d. finite-alphabet random sequences X and Y, the Slepian–Wolf theorem gives the following bounds on the lossless coding rates for distributed coding of the two sources:[3]

R_X ≥ H(X|Y),  R_Y ≥ H(Y|X),  R_X + R_Y ≥ H(X,Y).

If both the encoders and the decoders of the two sources are independent, the lowest rates we can achieve for lossless compression are H(X) and H(Y) for X and Y respectively, where H(X) and H(Y) are the entropies of X and Y.

However, with joint decoding, if a vanishing error probability for long sequences is accepted, the Slepian–Wolf theorem shows that a much better compression rate can be achieved.

As long as the total rate of X and Y is larger than their joint entropy H(X,Y) and none of the sources is encoded with a rate larger than its entropy, distributed coding can achieve an arbitrarily small error probability for long sequences.
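As a concrete illustration of these bounds (an added sketch, not part of the original text; the joint distribution and all names are chosen for illustration), the following Python snippet computes the relevant entropies for a small joint pmf and checks whether a rate pair lies in the Slepian–Wolf achievable region:

import math

def entropy(p):
    # Shannon entropy in bits of a probability vector.
    return -sum(q * math.log2(q) for q in p if q > 0)

def slepian_wolf_admissible(joint, r_x, r_y):
    # joint[x][y] is the joint pmf of (X, Y); returns True if the rate pair
    # (r_x, r_y) satisfies the three Slepian-Wolf inequalities.
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_x, h_y = entropy(p_x), entropy(p_y)
    h_xy = entropy([p for row in joint for p in row])
    return (r_x >= h_xy - h_y and          # R_X >= H(X|Y)
            r_y >= h_xy - h_x and          # R_Y >= H(Y|X)
            r_x + r_y >= h_xy)             # R_X + R_Y >= H(X,Y)

# Doubly symmetric binary example with P(X != Y) = 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
print(slepian_wolf_admissible(joint, 1.0, 0.5))   # True: H(Y|X) ~ 0.47, H(X,Y) ~ 1.47
print(slepian_wolf_admissible(joint, 0.5, 0.5))   # False: total rate is below H(X,Y)

For this example, each source alone would need 1 bit per symbol, but joint decoding admits rate pairs such as (1.0, 0.5), since H(X,Y) ≈ 1.47 bits.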

A special case of distributed coding is compression with decoder side information, where source Y is available at the decoder side but not accessible at the encoder side.

It was found that for Gaussian memoryless sources and mean-squared error distortion, the lower bound for the bit rate of X remains the same whether or not the side information is available at the encoder.
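For the jointly Gaussian case this bound has a well-known closed form, R(D) = max(0, ½·log2(σ²_{X|Y}/D)), where σ²_{X|Y} is the conditional variance of X given Y. The following sketch (function and parameter names are illustrative, not from the original text) evaluates it:

import math

def gaussian_wyner_ziv_rate(var_x, rho, distortion):
    # Rate in bits per sample for a Gaussian source X with variance var_x,
    # correlation coefficient rho with the side information Y, and a target
    # mean-squared error 'distortion'. The same formula applies whether or
    # not the encoder observes Y (no rate loss in the Gaussian case).
    cond_var = var_x * (1.0 - rho ** 2)    # Var(X | Y) for jointly Gaussian X, Y
    if distortion >= cond_var:
        return 0.0                         # the side information alone already meets the target
    return 0.5 * math.log2(cond_var / distortion)

print(round(gaussian_wyner_ziv_rate(1.0, 0.9, 0.05), 3))  # ~0.963 bits/sample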

As an example, let X and Y be two discrete, memoryless, uniformly distributed sources which generate sets of variables x and y of length 7 bits each, correlated so that the Hamming distance between x and y is at most one.

The basic framework of syndrome-based DSC is that, for each source, its input space is partitioned into several cosets according to the particular channel coding method used.[7][8]
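To make the coset idea concrete, the following sketch (an added illustration of the general syndrome approach, not necessarily the exact construction used in the cited works) applies the (7,4) Hamming code to the 7-bit example above: the encoder for X transmits only the 3-bit syndrome of x, i.e. the index of the coset containing x, and the joint decoder recovers x as the unique member of that coset within Hamming distance one of the side information y.

import numpy as np

# Parity-check matrix of the (7,4) Hamming code; its 3-bit syndromes index the cosets.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def syndrome(word):
    # 3-bit syndrome (coset index) of a 7-bit word.
    return H.dot(word) % 2

def encode_x(x):
    # Encoder for X: send 3 bits instead of 7.
    return syndrome(x)

def decode_x(s_x, y):
    # Joint decoder: recover x from its syndrome and the side information y.
    diff = (s_x + syndrome(y)) % 2          # syndrome of the error pattern e = x XOR y
    if not diff.any():
        return y.copy()                     # x and y are identical
    # e has weight one, so it flips the bit whose column of H equals diff.
    pos = next(j for j in range(7) if np.array_equal(H[:, j], diff))
    x_hat = y.copy()
    x_hat[pos] ^= 1
    return x_hat

y = np.array([1, 0, 1, 1, 0, 0, 1])
x = y.copy(); x[4] ^= 1                     # x differs from y in one position
assert np.array_equal(decode_x(encode_x(x), y), x)

In this asymmetric arrangement Y is still conveyed at its full rate of 7 bits while X is compressed to 3 bits, i.e. a corner point of the achievable region.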

Pradhan and Ramchandran designed rules for constructing sub-codes for each source, and presented results on trellis-based coset constructions for DSC, based on convolutional codes and set-partitioning rules as in Trellis modulation, as well as on lattice-code-based DSC.

The encoders of these codes are usually simple and easy to implement, while the decoders have much higher computational complexity and achieve good performance by exploiting the source statistics.

Although most research has focused on DSC with two dependent sources, Slepian–Wolf coding has been extended to the case of more than two input sources, and methods for generating sub-codes from a single channel code were proposed by V. Stankovic, A. D. Liveris, and others.

In the symmetric case, the goal is an equal bitrate for the two sources: in the example above, 5 bits each, with separate encoders and a joint decoder.

The basic idea is similar, but in this case the coset partitioning must be done for both sources, so that for any pair of received syndromes (each corresponding to one coset), only one pair of input variables is possible given the correlation between the two sources.[23]
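The rate accounting behind the 5 + 5 split, assuming the 7-bit, Hamming-distance-at-most-one correlation model described earlier, is:

H(Y) = 7,  H(X|Y) ≤ log2 8 = 3,  H(X,Y) ≤ 7 + 3 = 10,

since, given y, the word x can only be y itself or one of its 7 single-bit modifications. The asymmetric corner point thus uses rates (3, 7), while the symmetric operating point splits the same total of 10 bits evenly into 5 bits per source.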

Unfortunately, the above approaches do not scale (in design or operational complexity requirements) to sensor networks of large sizes, a scenario where distributed compression is most helpful.

If there are N sources transmitting at R bits each (with some distributed coding scheme), the number of possible reconstructions scales as 2^(NR).

Recently, an approach using ideas borrowed from Fusion Coding of Correlated Sources has been proposed,[24] in which design and operational complexity are traded against decoder performance.

This has allowed distributed quantizer design for network sizes reaching 60 sources, with substantial gains over traditional approaches.

Let B be the set of all subsets of the NR bits, i.e. B = 2^({1, ..., NR}). The bit-subset selector is then defined as a mapping into B that specifies, for each reconstruction the decoder performs, the subset of received bits used.

Note that each choice of the bit-subset selector imposes a storage requirement (C) that is exponential in the cardinality of the set of chosen bits.

This allows a judicious choice of bits that minimizes the distortion, given the constraints on decoder storage.

The effective cost function that needs to be minimized is a weighted sum of the distortion and the decoder storage.
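One plausible formalization (the exact form of the weighting is an assumption, not stated in the text) is

J = D + λC,

where D is the expected reconstruction distortion and C is the decoder storage implied by the bit-subset selector; updating the encoders, the decoder, and the bit-subset selector one at a time, with the other components held fixed, can only decrease J.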

The system design is performed by iteratively (and incrementally) optimizing the encoders, decoder and bit-subset selector till convergence.

If all source tuples of interest have different syndromes, then they can be recovered losslessly.