Hypergeometric distribution

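Also note that

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \frac{\binom{n}{k}\binom{N-n}{K-k}}{\binom{N}{K}}.$$

This identity can be shown by expressing the binomial coefficients in terms of factorials and rearranging the latter.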

Additionally, it follows from the symmetry of the problem, described in two different but interchangeable ways.
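The symmetry between the sample size n and the number of successes K can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names and the parameter values N = 50, K = 5, n = 10 are assumptions chosen for illustration. Exact fractions are used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction
from math import comb

def pmf(k, N, K, n):
    """P(X = k): k successes in n draws from N items, K of which are successes."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def pmf_swapped(k, N, K, n):
    """The same probability with the roles of K and n exchanged."""
    return Fraction(comb(n, k) * comb(N - n, K - k), comb(N, K))

# Illustrative (assumed) parameters, not taken from the text.
N, K, n = 50, 5, 10
symmetric = all(pmf(k, N, K, n) == pmf_swapped(k, N, K, n)
                for k in range(min(K, n) + 1))
print("identity holds for every k:", symmetric)
```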

K out of N neutral marbles are drawn from an urn without replacement and coloured green.

The binomial distribution does not accurately model n draws without replacement, since the probability of success on each trial is not the same: the size of the remaining population changes as we remove each marble.

Now, standing next to the urn, you close your eyes and draw n marbles without replacement.

Define X as a random variable whose outcome is k, the number of green marbles drawn in the experiment.

Standing next to the urn, you close your eyes and draw 10 marbles without replacement.

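To intuitively explain the given formula, consider the two symmetric problems represented by the identity

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \frac{\binom{n}{k}\binom{N-n}{K-k}}{\binom{N}{K}}.$$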

Back to the calculations, we use the formula above to calculate the probability of drawing exactly k green marbles:

$$P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}.$$

Intuitively, we would expect it to be even more unlikely that all 5 green marbles will be among the 10 drawn.
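As a sketch of this calculation, the snippet below evaluates the probability mass function for an urn with 5 green marbles and 10 draws. The total of N = 50 marbles is an assumption, since the urn size is not stated in this section, and `hypergeom_pmf` is an illustrative helper name.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for n draws without replacement from N marbles, K of them green."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# K = 5 green marbles and n = 10 draws come from the text.
# N = 50 (that is, 45 red marbles) is an ASSUMPTION for illustration.
N, K, n = 50, 5, 10

p4 = hypergeom_pmf(4, N, K, n)  # exactly 4 green among the 10 drawn
p5 = hypergeom_pmf(5, N, K, n)  # all 5 green among the 10 drawn
print(f"P(X = 4) = {p4:.6f}")
print(f"P(X = 5) = {p5:.6f}")
```

Under this assumed urn size, drawing all 5 green marbles is less likely than drawing exactly 4, in line with the intuition stated above.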

The probability of drawing any set of green and red marbles (the hypergeometric distribution) depends only on the numbers of green and red marbles, not on the order in which they appear; i.e., it is an exchangeable distribution.
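Exchangeability can be illustrated by computing the probability of specific ordered sequences of draws. This is a minimal sketch: the helper `sequence_probability` and the urn contents (5 green, 45 red) are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def sequence_probability(seq, green, red):
    """Probability of drawing exactly this ordered sequence without replacement.

    `seq` is a string of 'G' (green) and 'R' (red) characters.
    """
    p = Fraction(1)
    for colour in seq:
        total = green + red
        if colour == "G":
            p *= Fraction(green, total)
            green -= 1
        else:
            p *= Fraction(red, total)
            red -= 1
    return p

# Assumed urn: 5 green and 45 red marbles.
a = sequence_probability("GGRR", 5, 45)
b = sequence_probability("RGRG", 5, 45)
print(a == b)  # both sequences contain 2 green and 2 red marbles

# Multiplying by the number of orderings recovers P(X = 2) for n = 4 draws.
p_two_green = comb(4, 2) * a
```

Because every ordering of the same colour counts has the same probability, the hypergeometric probability is that common value times the number of orderings.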

[4] Note: In order to derive the previous bounds, one has to start by observing that

Then, it is proved from Hoeffding [3] that the results and bounds obtained via this process hold for X as well.

If n is larger than N/2, it can be useful to apply symmetry to "invert" the bounds, which gives the following: [4] [5]

The hypergeometric test uses the hypergeometric distribution to measure the statistical significance of having drawn a sample consisting of a specific number of k successes (out of n total draws) from a population of size N containing K successes.

In a test for over-representation of successes in the sample, the hypergeometric p-value is calculated as the probability of randomly drawing k or more successes from the population in n total draws.

The test is often used to identify which sub-populations are over- or under-represented in a sample.
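The tail probabilities used by the test can be sketched as follows. The function names are illustrative, and the under-representation p-value is taken here to be the probability of drawing k or fewer successes, which is the usual convention but is an assumption as far as this section is concerned.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for n draws without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def over_representation_p(k, N, K, n):
    """P(X >= k): chance of drawing k or more successes."""
    upper = min(n, K)  # X cannot exceed either the sample size or K
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))

def under_representation_p(k, N, K, n):
    """P(X <= k): chance of drawing k or fewer successes (assumed convention)."""
    lower = max(0, n - (N - K))  # X cannot be smaller than this
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))

# Illustrative (assumed) numbers: 4 successes in 10 draws,
# from a population of 50 containing 5 successes.
p_over = over_representation_p(4, N=50, K=5, n=10)
print(f"over-representation p-value: {p_over:.6f}")
```

A small p-value from `over_representation_p` suggests that the sample contains more successes than random sampling would typically produce.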

Suppose there are 5 black, 10 white, and 15 red marbles in an urn.

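If six marbles are chosen without replacement, the probability that exactly two of each color are chosen is

$$P(2\text{ black}, 2\text{ white}, 2\text{ red}) = \frac{\binom{5}{2}\binom{10}{2}\binom{15}{2}}{\binom{30}{6}} \approx 0.0796.$$

Election audits typically test a sample of machine-counted precincts to see if recounts by hand or machine match the original counts.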

The sampling rates are usually defined by law, not statistical design, so for a legally defined sample size n, what is the probability of missing a problem which is present in K precincts, such as a hack or bug?

Bugs are often obscure, and a hacker can minimize detection by affecting only a few precincts, which will still affect close elections, so a plausible scenario is for K to be on the order of 5% of N. Audits typically cover 1% to 10% of precincts (often 3%),[9][10][11] so they have a high chance of missing a problem.
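The chance that an audit sample misses every affected precinct is the hypergeometric probability of k = 0. The sketch below uses hypothetical helper names and assumes, for illustration, N = 100 precincts of which K = 5 are affected.

```python
from math import comb

def miss_probability(N, K, n):
    """P(k = 0): none of the K affected precincts appears in a sample of n."""
    return comb(N - K, n) / comb(N, n)

def min_sample_size(N, K, max_miss=0.05):
    """Smallest n whose miss probability is below max_miss (None if K == 0)."""
    for n in range(N + 1):
        if miss_probability(N, K, n) < max_miss:
            return n
    return None

# ASSUMED illustrative values: 100 precincts, 5 affected, 3% sample.
N, K = 100, 5
p_miss = miss_probability(N, K, 3)
n_needed = min_sample_size(N, K)
print(f"3% sample misses the problem with probability {p_miss:.3f}")
print(f"sample size needed for under 5% miss probability: {n_needed}")
```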

For example, if a problem is present in 5 of 100 precincts, a 3% sample has 86% probability that k = 0, so the problem would not be noticed, and only 14% probability of the problem appearing in the sample (positive k):

$$P(X=0) = \frac{\binom{5}{0}\binom{95}{3}}{\binom{100}{3}} \approx 0.856$$

The sample would need 45 precincts in order to have probability under 5% that k = 0 in the sample, and thus have probability over 95% of finding the problem:

$$P(X=0) = \frac{\binom{5}{0}\binom{95}{45}}{\binom{100}{45}} \approx 0.046$$

In hold'em poker, players make the best hand they can by combining the two cards in their hand with the 5 cards (community cards) eventually turned up on the table.

The player would like to know the probability of one of the next 2 cards to be shown being a club to complete the flush.

Strictly speaking, the approach to calculating success probabilities outlined here is accurate in a scenario where there is just one player at the table; in a multiplayer game this probability might be adjusted somewhat based on the betting play of the opponents.
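The flush calculation can be sketched under explicit assumptions that are not spelled out in this section: a standard 52-card deck, a player holding 2 clubs, and 3 community cards showing of which 2 are clubs. That leaves 47 unseen cards, 9 of them clubs, and 2 cards still to come. The helper name is illustrative.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) for n draws without replacement from N cards, K of them clubs."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# ASSUMPTIONS: 52-card deck, 5 cards seen (4 clubs, 1 non-club).
unseen = 52 - 5        # 47 cards not yet seen by the player
clubs_left = 13 - 4    # 9 clubs remain among them
to_come = 2            # community cards still to be shown

p_none = hypergeom_pmf(0, unseen, clubs_left, to_come)
p_one = hypergeom_pmf(1, unseen, clubs_left, to_come)
p_two = hypergeom_pmf(2, unseen, clubs_left, to_come)
p_flush = 1 - p_none   # at least one club completes the flush
print(f"P(exactly one club)  = {p_one:.4f}")
print(f"P(at least one club) = {p_flush:.4f}")
```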

Prior to each draw, a player selects a certain number of spots by marking a paper form supplied for this purpose.

For example, a player might play a 6-spot by marking 6 numbers, each from a range of 1 through 80 inclusive.

Generally speaking, the more hits (balls drawn that match player numbers selected) the greater the payoff.

For example, if a customer bets ("plays") $1 for a 6-spot (not an uncommon example) and hits 4 out of the 6, the casino would pay out $4.

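The probability of this event, assuming 20 of the 80 numbers are drawn, is:

$$P(X=4) = \frac{\binom{6}{4}\binom{74}{16}}{\binom{80}{20}} \approx 0.02853791.$$

Similarly, the chance for hitting 5 spots out of 6 selected is

$$P(X=5) = \frac{\binom{6}{5}\binom{74}{15}}{\binom{80}{20}} \approx 0.003095639.$$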

The only other nonzero payout might be $1 for hitting 3 numbers (i.e., you get your bet back), which has a probability near 0.129819548.

Taking the sum of products of payouts times corresponding probabilities we get an expected return of 0.70986492 or roughly 71% for a 6-spot, for a house advantage of 29%.
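The expected return can be sketched as a sum over a payout table. Several inputs here are assumptions rather than facts from this section: the draw size of 20 balls out of 80, and the payouts for 5 and 6 hits ($88 and $1500), which are illustrative values. Only the $1 and $4 payouts are stated in the text.

```python
from math import comb

def hit_probability(hits, spots=6, drawn=20, pool=80):
    """P(exactly `hits` of the player's `spots` are among the `drawn` balls).

    drawn=20 is an ASSUMPTION; the pool of 80 numbers comes from the text.
    """
    return comb(spots, hits) * comb(pool - spots, drawn - hits) / comb(pool, drawn)

# Payout per $1 bet, keyed by number of hits. The entries for 3 and 4 hits
# come from the text; the entries for 5 and 6 hits are ASSUMED values.
payouts = {3: 1, 4: 4, 5: 88, 6: 1500}

expected_return = sum(pay * hit_probability(h) for h, pay in payouts.items())
house_advantage = 1 - expected_return
print(f"P(3 hits)       = {hit_probability(3):.9f}")
print(f"expected return = {expected_return:.4f}")
print(f"house advantage = {house_advantage:.4f}")
```

With these assumed payouts the sum comes out at roughly 71%, consistent with the figure quoted above; a different payout table would change the result.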

This very poor return (for the player) is usually explained by the large overhead (floor space, equipment, personnel) required for the game.

[Image: Biologist and statistician Ronald Fisher]
[Figure: Samples used for election audits and resulting chance of missing a problem]