The theory was originally proposed by Arthur P. Dempster[1] in the context of Kalman filters and was later elaborated, refined, and applied to knowledge representation in artificial intelligence and to decision making in finance and accounting by Liping Liu.
Logical knowledge is represented by linear equations or, geometrically, by a certainty hyperplane.
Probabilistic knowledge is represented by a normal distribution across all parallel focal elements.
In general, assume X is a vector of multiple normal variables with mean μ and covariance Σ. Its moment matrix stacks the mean vector on top of the covariance matrix:

M(X) = \begin{pmatrix} \mu \\ \Sigma \end{pmatrix}

When Σ is invertible, the moment matrix may be fully swept:

\vec{M}(X) = \begin{pmatrix} \mu\Sigma^{-1} \\ -\Sigma^{-1} \end{pmatrix}

By using the fully swept moment matrix, we represent a vacuous linear belief function as a zero matrix in the swept form, \vec{M}(X) = 0. One way to understand the representation is to imagine complete ignorance as the limiting case when the variance of X approaches ∞, where one can show that Σ−1 = 0 and hence \vec{M}(X) = 0.
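To make the sweeping operation concrete, here is a minimal numerical sketch in Python with numpy; the function names and the (n+1) × n array layout (mean row on top of the covariance) are illustrative choices, not part of the theory:

```python
import numpy as np

def moment_matrix(mu, sigma):
    """Stack the mean vector on top of the covariance matrix,
    giving the (n+1) x n moment matrix M(X)."""
    return np.vstack([np.atleast_1d(mu).astype(float),
                      np.atleast_2d(sigma).astype(float)])

def full_sweep(M):
    """Fully sweep a moment matrix: (mu, Sigma) -> (mu Sigma^-1, -Sigma^-1)."""
    mu, sigma = M[0], M[1:]
    prec = np.linalg.inv(sigma)
    return np.vstack([mu @ prec, -prec])

# As the variance grows, the fully swept matrix approaches the zero
# matrix, i.e. the representation of the vacuous linear belief function.
for var in (1.0, 1e3, 1e6):
    print(full_sweep(moment_matrix([5.0], [[var]])))
```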
However, this limiting argument does not distinguish ignorance from a normal distribution with infinite variance; for this reason, a better way is to understand the vacuous linear belief function as the neutral element for combination (see below).
Suppose X and Y are two vectors of normal variables with the joint moment matrix

M(X, Y) = \begin{pmatrix} \mu_X & \mu_Y \\ \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}

Then M(X, Y) may be partially swept; for example, sweeping it on X yields

\begin{pmatrix} \mu_X\Sigma_{XX}^{-1} & \mu_Y - \mu_X\Sigma_{XX}^{-1}\Sigma_{XY} \\ -\Sigma_{XX}^{-1} & \Sigma_{XX}^{-1}\Sigma_{XY} \\ \Sigma_{YX}\Sigma_{XX}^{-1} & \Sigma_{YY} - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY} \end{pmatrix}
A swept matrix obtained by partially sweeping on a subset of variables can equivalently be obtained by a sequence of partial sweepings on each individual variable in the subset, and the order of the sequence does not matter.
Thus, the elements corresponding to X in the above partial sweeping equation represent the marginal distribution of X in potential form.
These semantics render the partial sweeping operation a useful method for manipulating multivariate normal distributions.
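The following sketch implements the one-variable sweeping step and checks the order-independence claim numerically; it assumes the same illustrative (n+1) × n layout as above, with the mean row on top of the covariance:

```python
import numpy as np

def sweep_one(M, i):
    """Forward-sweep the moment matrix M on variable i: divide the pivot
    row and column by the pivot, apply a rank-one update elsewhere, and
    replace the pivot by -1/pivot."""
    M = M.astype(float)
    r, c = 1 + i, i                            # pivot row and column
    p = M[r, c]
    out = M - np.outer(M[:, c], M[r, :]) / p   # rank-one update
    out[:, c] = M[:, c] / p
    out[r, :] = M[r, :] / p
    out[r, c] = -1.0 / p
    return out

M = np.vstack([[1.0, 2.0, 3.0],        # means
               [4.0, 1.0, 0.5],        # covariance rows
               [1.0, 3.0, 0.2],
               [0.5, 0.2, 2.0]])
# Sweeping on variables 0 and 1 in either order gives the same matrix:
print(np.allclose(sweep_one(sweep_one(M, 0), 1),
                  sweep_one(sweep_one(M, 1), 0)))   # True
```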
The case of partial ignorance involves a mix of an ordinary normal distribution for Y and a vacuous belief function for X. This observation is interesting: it characterizes the difference between partial ignorance and linear equations in a single parameter, correlation.
Suppose X and Y are two vectors and Y = XA + b + E, where A and b are the appropriate coefficient matrices and E is independent white noise satisfying E ~ N(0, Σ). We represent the model as the following partially swept matrix:

\begin{pmatrix} 0 & b \\ 0 & A \\ A^{T} & \Sigma \end{pmatrix}

This linear regression model may be considered as the combination of two pieces of knowledge (see below): one is specified by the linear equation involving the three variables X, Y, and E, and the other is the simple normal distribution of E, i.e., E ~ N(0, Σ).
Note that, in this alternative interpretation, a linear regression model forms a basic building block for knowledge representation and is encoded as one moment matrix.
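As a concrete instance, the sketch below builds this partially swept matrix for a scalar regression y = a·x + b + e with e ~ N(0, s²). The swept x-blocks are zero, reflecting ignorance about x, while the y-blocks carry the regression coefficients and the residual variance; the numbers are chosen to match the audit example at the end of this section:

```python
import numpy as np

a, b, s2 = 8.0, 4.0, 16.0   # y = 8x + 4 + e,  e ~ N(0, 16)

# Partially swept moment matrix over (x, y), swept on x:
M_reg = np.array([
    [0.0, b],    # mean row: 0 for the swept x, intercept b for y
    [0.0, a],    # x row: -1/Var(x) -> 0 under ignorance, coefficient a
    [a,  s2],    # y row: coefficient (transposed) and residual variance
])
print(M_reg)
```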
From representing the six special cases, we see a clear advantage of the moment matrix representation, i.e., it allows a unified representation for seemingly diverse types of knowledge, including linear equations, joint and conditional distributions, and ignorance.
The unification is significant not only for knowledge representation in artificial intelligence but also for statistical analysis and engineering computation.
For example, the representation treats the typical logical and probabilistic components in statistics — observations, distributions, improper priors (for Bayesian statistics), and linear equation models — not as separate concepts, but as manifestations of a single concept.
There are two basic operations for making inferences in expert systems using linear belief functions: combination and marginalization.
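Marginalization is the simpler of the two in this representation: when the variables to be removed are unswept, one deletes their columns and the corresponding covariance rows. A brief sketch under the same illustrative layout as above; the assumption that the removed variable is unswept is mine (a swept variable would be reverse-swept first):

```python
import numpy as np

def marginalize(M, keep):
    """Project an unswept moment matrix onto the variables in `keep`,
    deleting the columns and covariance rows of the removed variables."""
    rows = [0] + [1 + i for i in keep]    # mean row + kept covariance rows
    return M[np.ix_(rows, keep)]

M = np.vstack([[1.0, 2.0],    # means of (x, y)
               [4.0, 1.0],    # covariance rows
               [1.0, 3.0]])
print(marginalize(M, [1]))    # marginal of y: mean 2, variance 3
```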
A reverse sweeping applied to the fully swept matrix \vec{M}(X) will recover the moment matrix M(X). Similarly, if a moment matrix is in a partially swept form, say swept on X, a partial reverse sweeping on X will recover the original matrix. Reverse sweepings are defined similarly to forward ones, except for a sign difference in some of the multiplications.
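Continuing the earlier sketch, the one-variable sweep can be extended with a reverse mode that flips the sign on the pivot row and column. This is an illustrative implementation consistent with the sign convention used above, not a quotation of Liu's formulas:

```python
import numpy as np

def sweep_one(M, i, reverse=False):
    """(Reverse-)sweep the moment matrix M on variable i. The reverse
    sweeping differs only by a sign on the pivot row and column."""
    M = M.astype(float)
    r, c = 1 + i, i
    p = M[r, c]
    s = -1.0 if reverse else 1.0
    out = M - np.outer(M[:, c], M[r, :]) / p
    out[:, c] = s * M[:, c] / p
    out[r, :] = s * M[r, :] / p
    out[r, c] = -1.0 / p
    return out

M = np.vstack([[1.0, 2.0],
               [4.0, 1.0],
               [1.0, 3.0]])
swept = sweep_one(M, 0)                        # forward sweeping on x
restored = sweep_one(swept, 0, reverse=True)   # partial reverse sweeping on x
print(np.allclose(restored, M))                # True: M(x, y) is recovered
```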
Liu later proves a claim by Arthur P. Dempster and re-expresses the combination formula as the sum of two fully swept matrices.
Here we use it to define the combination of two linear belief functions, which include normal distributions as a special case.
Also, note that a vacuous linear belief function (a zero swept matrix) is the neutral element for combination.
When a variable has zero variance, its moment matrix cannot be swept on that variable. In this case, we can take the variance to be an extremely small number, say ε, and perform the desired sweeping and combination. Since zero variance means complete certainty about a variable, the ε terms will vanish from the final result.
In general, to combine two linear belief functions, their moment matrices must be fully swept.
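The sketch below combines two normal opinions about the same variable by adding their fully swept matrices and reverse-sweeping the sum; it also checks that adding the zero (vacuous) matrix leaves a belief function unchanged. The helper names are illustrative:

```python
import numpy as np

def full_sweep(M):
    """(mu, Sigma) -> (mu Sigma^-1, -Sigma^-1)."""
    mu, sigma = M[0], M[1:]
    prec = np.linalg.inv(sigma)
    return np.vstack([mu @ prec, -prec])

def unsweep(S):
    """Fully reverse-sweep, recovering (mu, Sigma) from the swept form."""
    t, body = S[0], S[1:]
    sigma = np.linalg.inv(-body)
    return np.vstack([t @ sigma, sigma])

M1 = np.vstack([[0.0], [[4.0]]])     # first opinion:  x ~ N(0, 4)
M2 = np.vstack([[10.0], [[1.0]]])    # second opinion: x ~ N(10, 1)

# Combination = sum of the fully swept matrices:
print(unsweep(full_sweep(M1) + full_sweep(M2)))   # mean 8.0, variance 0.8

# A vacuous belief function (zero swept matrix) is the neutral element:
vacuous = np.zeros((2, 1))
print(unsweep(full_sweep(M1) + vacuous))          # still N(0, 4)
```

For ordinary normal distributions this reproduces the familiar precision-weighted pooling of means, which is what the combination rule reduces to in that special case.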
As we mentioned, the regression model may be considered as the combination of two pieces of knowledge: one is specified by the linear equation involving three variables X, Y, and E, and the other is a simple normal distribution of E, i.e., E ~ N(0, Σ).
If there is a direct observation on cash receipts, we can represent the evidence as an equation, say C = 50 (thousand dollars). If the auditor knows nothing about the beginning balance of accounts receivable, we can represent his or her ignorance by a vacuous linear belief function.
Finally, if historical data suggest that, given cash receipts C, the sales S are on average 8C + 4 with a standard deviation of 4 thousand dollars, we can represent the knowledge as the linear regression model S ~ N(4 + 8C, 16).
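Putting the pieces together, the following sketch combines the observation C = 50 (encoded with a tiny variance ε, per the ε-procedure above) with the regression S ~ N(4 + 8C, 16) and reads off the resulting belief about S; the sweep helper and the matrix layout are the same illustrative choices as before:

```python
import numpy as np

def sweep_one(M, i, reverse=False):
    """(Reverse-)sweep the moment matrix M on variable i (as sketched above)."""
    M = M.astype(float)
    r, c = 1 + i, i
    p = M[r, c]
    s = -1.0 if reverse else 1.0
    out = M - np.outer(M[:, c], M[r, :]) / p
    out[:, c] = s * M[:, c] / p
    out[r, :] = s * M[r, :] / p
    out[r, c] = -1.0 / p
    return out

eps = 1e-9   # near-zero variance standing in for certainty about C

# Evidence C = 50 with ignorance about S, in fully swept form: the S
# blocks are zero (vacuous), the C blocks are (50/eps, -1/eps).
obs = np.array([[50.0 / eps, 0.0],
                [-1.0 / eps, 0.0],
                [0.0,        0.0]])

# Regression S ~ N(4 + 8C, 16), already swept on C; sweep on S as well:
reg = sweep_one(np.array([[0.0,  4.0],
                          [0.0,  8.0],
                          [8.0, 16.0]]), 1)

# Combine by adding the fully swept matrices, then reverse-sweep both variables:
joint = sweep_one(sweep_one(obs + reg, 0, reverse=True), 1, reverse=True)
print(joint[0, 1], joint[2, 1])   # S has mean ~404 and variance ~16
```

As promised by the ε-procedure, the ε terms drop out of the answer: the combined belief function puts S at 8 × 50 + 4 = 404 thousand dollars with the regression's residual variance of 16.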