The values of X are irrelevant; what matters is the partition (denote it αX) of the sample space Ω into disjoint sets {X = xn}.
This fact amounts to the equality P ( Y = y ) = ∑x P ( Y = y | X = x ) P ( X = x ) for y = 0, 1, 2, 3, which is an instance of the law of total probability.
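As an illustration (not part of the formal development), the identity can be verified by exhaustive enumeration. The sketch below assumes a concrete version of the coin example: ten fair tosses, with X the total number of heads and Y the number of heads among the first three; these details are an assumption made only so that Y takes the values 0, 1, 2, 3.

```python
from fractions import Fraction
from itertools import product

# Hypothetical concretization of the coin example (an assumption):
# ten fair tosses, X = total number of heads, Y = heads among the first three.
outcomes = list(product((0, 1), repeat=10))
p = Fraction(1, 2 ** 10)          # each of the 1024 outcomes is equally likely

def prob(event):
    return sum(p for w in outcomes if event(w))

for y in range(4):                # y = 0, 1, 2, 3
    lhs = prob(lambda w: sum(w[:3]) == y)
    rhs = Fraction(0)
    for x in range(11):
        px = prob(lambda w: sum(w) == x)
        if px > 0:
            joint = prob(lambda w: sum(w[:3]) == y and sum(w) == x)
            rhs += (joint / px) * px   # P(Y = y | X = x) * P(X = x)
    assert lhs == rhs             # law of total probability, exact in rationals
```

Because `Fraction` arithmetic is exact, the two sides agree exactly, not merely up to rounding.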
The density of X may be calculated by integration; surprisingly, the result does not depend on x in (−1,1), which means that X is distributed uniformly on (−1,1).
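This Archimedes-type fact is easy to probe by simulation. The sketch below is only an illustration: it samples uniform points on the unit sphere by normalizing Gaussian vectors (a standard method, assumed here rather than taken from the text) and checks that the first coordinate behaves like a uniform variable on (−1, 1).

```python
import math
import random

random.seed(0)

def sphere_point():
    """Uniform point on the unit sphere via a normalized Gaussian vector."""
    while True:
        v = [random.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(c * c for c in v))
        if r > 1e-12:
            return [c / r for c in v]

n = 200_000
t = 0.3
hits = sum(1 for _ in range(n) if sphere_point()[0] <= t)
# If X is uniform on (-1, 1), then P(X <= t) = (t + 1) / 2 = 0.65.
print(hits / n)
```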
Accordingly, P ( Y ≤ 0.75 | X = 0.5 ) cannot be interpreted via empirical frequencies, since the exact value X = 0.5 has no chance of occurring at random, not even once during an infinite sequence of independent trials.
That is, it minimizes the mean square error E ( |Z| − f(X) )² on the class of all random variables of the form f(X).
This fact amounts to the equalities E ( |Z| ) = E ( E ( |Z| | X ) ) = ∫ E ( |Z| | X = x ) · ½ dx, the integral being over (−1,1) with ½ the density of X, the latter equality being the instance of the law of total probability mentioned above.
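The decomposition of E ( |Z| ) through the conditional means can be mirrored numerically. In the sketch below (an illustration only), E ( |Z| | X ) is approximated by averaging |Z| within bins of X; averaging the per-bin means against the empirical distribution of X recovers the overall mean exactly, which is the discrete shadow of the law of total expectation.

```python
import math
import random

random.seed(1)
n = 100_000
bins = 20
bin_sum = [0.0] * bins
bin_cnt = [0] * bins
total = 0.0

for _ in range(n):
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    x, z = v[0] / r, abs(v[2] / r)           # uniform point on the sphere
    i = min(int((x + 1.0) / 2.0 * bins), bins - 1)   # bin index for X in (-1, 1)
    bin_sum[i] += z
    bin_cnt[i] += 1
    total += z

e_abs_z = total / n
# Sum over bins of E(|Z| | bin) * P(bin); equal to the overall mean
# by construction, up to floating-point rounding.
mixed = sum((bin_sum[i] / bin_cnt[i]) * (bin_cnt[i] / n)
            for i in range(bins) if bin_cnt[i] > 0)
assert abs(e_abs_z - mixed) < 1e-6
```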
This successful geometric explanation may create the illusion that the following question is trivial.
It may seem evident that the conditional distribution must be uniform on the given circle (the intersection of the given sphere and the given plane).
Appeals to symmetry can be misleading if not formalized as invariance arguments.[4][5] Another example: a random rotation of the three-dimensional space, that is, a rotation by a random angle around a random axis. Geometric intuition suggests that the angle is independent of the axis and distributed uniformly; however, the angle is not distributed uniformly.
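That intuition can be tested by simulation. The sketch below is only an illustration and rests on an assumption: "random rotation" is read as a rotation drawn from the uniform (Haar) distribution on rotations, sampled via a uniform unit quaternion. Under that reading the angle has density (1 − cos θ)/π on [0, π], so it is far from uniform.

```python
import math
import random

random.seed(2)

def rotation_angle():
    """Angle of a Haar-uniform random rotation, via a uniform unit quaternion."""
    while True:
        q = [random.gauss(0.0, 1.0) for _ in range(4)]
        r = math.sqrt(sum(c * c for c in q))
        if r > 1e-12:
            w = abs(q[0]) / r              # |cos(theta / 2)|
            return 2.0 * math.acos(min(w, 1.0))

n = 100_000
frac = sum(1 for _ in range(n) if rotation_angle() <= math.pi / 2) / n
# A uniform angle would give 0.5; the Haar density (1 - cos t)/pi gives
# P(angle <= pi/2) = (pi/2 - 1)/pi, roughly 0.18.
print(frac)
```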
In the latter two examples the law of total probability is irrelevant, since only a single event (the condition) is given.
By contrast, if the given event is of zero probability then conditioning on it is ill-defined unless some additional input is provided.
Measure-theoretic conditioning (below) investigates Case (c), disclosing its relation to (b) in general and to (a) when applicable.
Another example: let X be a random variable distributed uniformly on (0,1), and B the event "X is a rational number"; what about P ( X = 1/n | B ) ?
That is, it minimizes the mean square error E ( I − g(X) )² on the class of all random variables of the form g(X).
In the case f = f1 the corresponding function g = g1 may be calculated explicitly.[details 1] Alternatively, the limiting procedure may be used, giving the same result.
It can be computed numerically, using finite-dimensional approximations to the infinite-dimensional Hilbert space.
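The finite-dimensional approximation can be illustrated with a simple stand-in. The sketch below does not use the article's f2 (which is defined elsewhere); it takes a hypothetical piecewise-linear f, lets Y be uniform on (0, 1) with X = f(Y), and approximates g(x) = P ( Y ≤ 1/3 | X = x ) by projecting the indicator onto the finite-dimensional subspace of functions that are constant on bins of X, i.e. by binned averaging.

```python
import random

random.seed(3)

def f(y):
    # Hypothetical stand-in (NOT the article's f2): piecewise linear,
    # not one-to-one, mapping (0, 1) onto (0, 1).
    return 3.0 * y if y < 1.0 / 3.0 else 1.5 * (1.0 - y)

n = 300_000
bins = 50
hit = [0] * bins        # counts of the event {Y <= 1/3} per bin of X
cnt = [0] * bins

for _ in range(n):
    y = random.random()
    x = f(y)
    i = min(int(x * bins), bins - 1)
    hit[i] += (y <= 1.0 / 3.0)
    cnt[i] += 1

# g_hat is the projection of the indicator onto bin-wise constant functions.
g_hat = [(hit[i] / cnt[i]) if cnt[i] else float("nan") for i in range(bins)]
print(g_hat[37])        # estimate of g(x) for x near 0.75
```

For this particular stand-in f, the exact conditional probability happens to be 1/3 for every x, so the binned estimate hovers near 1/3; with a rougher f the same projection still converges as the bins shrink.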
Once again, the expectation of the random variable P ( Y ≤ 1/3 | X ) = g2(X) is equal to the (unconditional) probability: E ( P ( Y ≤ 1/3 | X ) ) = P ( Y ≤ 1/3 ) = 1/3. However, the Hilbert space approach treats g2 as an equivalence class of functions rather than an individual function.
That is, μ is the (unconditional) distribution of X, while ν is one third of its conditional distribution given the event Y ≤ 1/3: ν(B) = (1/3) P ( X ∈ B | Y ≤ 1/3 ) = P ( X ∈ B, Y ≤ 1/3 ) for Borel sets B. Both approaches (via the Hilbert space, and via the Radon–Nikodym derivative) treat g as an equivalence class of functions; two functions g and g′ are treated as equivalent if g(X) = g′(X) almost surely.
Accordingly, the conditional probability P ( Y ≤ 1/3 | X ) is treated as an equivalence class of random variables; as usual, two random variables are treated as equivalent if they are equal almost surely.
In the case f = f1 the corresponding function h = h1 may be calculated explicitly.[details 2] Alternatively, the limiting procedure may be used, giving the same result.
Nevertheless it exists, and can be computed numerically in the same way as g2 above, as the orthogonal projection in the Hilbert space.
The law of total expectation E ( h(X) ) = E ( Y ) holds, since the projection cannot change the scalar product with the constant 1, which belongs to the subspace.
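The projection argument can be mirrored in finite dimensions: replacing the subspace of all functions of X by the subspace of functions constant on bins of X, the orthogonal projection of Y is the per-bin average, and its mean coincides with the mean of Y. The sketch below again uses a hypothetical stand-in f (the article's f1 is defined elsewhere).

```python
import random

random.seed(4)

def f(y):
    # Hypothetical stand-in, NOT the article's f1.
    return 3.0 * y if y < 1.0 / 3.0 else 1.5 * (1.0 - y)

n = 200_000
bins = 40
s = [0.0] * bins        # sum of Y per bin of X
c = [0] * bins
mean_y = 0.0

for _ in range(n):
    y = random.random()
    x = f(y)
    i = min(int(x * bins), bins - 1)
    s[i] += y
    c[i] += 1
    mean_y += y

mean_y /= n
# h_hat(X) is the projection of Y onto bin-wise constant functions; averaging
# the per-bin means against the bin frequencies preserves the mean of Y,
# because projecting cannot change the scalar product with the constant 1.
mean_h = sum((s[i] / c[i]) * (c[i] / n) for i in range(bins) if c[i] > 0)
assert abs(mean_h - mean_y) < 1e-6
```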
In the case f = f1 the conditional cumulative distribution function may be calculated explicitly, similarly to g1.
The limiting procedure gives a function that fails to be right-continuous, which cannot be correct, since a cumulative distribution function must be right-continuous!
For a given y, the conditional probability P ( Y ≤ y | X = x ) is well-defined (via the Hilbert space or the Radon–Nikodym derivative) as an equivalence class of functions (of x). Treated as a function of y for a given x, it is ill-defined unless some additional input is provided.
In the considered example this is the case; the correct result for x = 0.75 shows that the conditional distribution of Y given X = 0.75 consists of two atoms, at 0.25 and 0.5, of probabilities 1/3 and 2/3 respectively.
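The atomic structure is easy to see in simulation. The sketch below uses a hypothetical piecewise-linear stand-in f chosen so that f(0.25) = f(0.5) = 0.75 with slopes of ratio 2, which reproduces the two atoms and the weights 1/3 and 2/3 described in the text; the article's actual f1 is defined elsewhere, so this function is an assumption made for illustration.

```python
import random

random.seed(5)

def f(y):
    # Hypothetical stand-in chosen so that f(0.25) = f(0.5) = 0.75,
    # with slopes 3 and -1.5; weights of the atoms are inversely
    # proportional to |f'|, hence 1/3 and 2/3.
    return 3.0 * y if y < 1.0 / 3.0 else 1.5 * (1.0 - y)

n = 400_000
near_25 = near_50 = total = 0
for _ in range(n):
    y = random.random()          # Y uniform on (0, 1), X = f(Y)
    if abs(f(y) - 0.75) < 0.01:  # condition "X close to 0.75"
        total += 1
        if abs(y - 0.25) < 0.05:
            near_25 += 1
        elif abs(y - 0.5) < 0.05:
            near_50 += 1

print(near_25 / total, near_50 / total)   # weights of the two atoms
```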
In general, conditional distributions need not be atomic or absolutely continuous (nor mixtures of both types).