Group testing

Group testing has many applications, including statistics, biology, computer science, medicine, engineering and cyber security.

[8] The concept of group testing was first introduced by Robert Dorfman in 1943 in a short report[2] published in the Notes section of Annals of Mathematical Statistics.

Dorfman tabulated the optimum group sizes for this strategy against the prevalence rate of defectiveness in the population.

[2] Stephen Samuels [10] found a closed-form solution for the optimal group size as a function of the prevalence rate.

[11] This newer process starts by again performing individual testing on the positive groups, but stopping as soon as a defective is identified.

[12] They described five new procedures – in addition to generalisations for when the prevalence rate is unknown – and for the optimal one, they provided an explicit formula for the expected number of tests it would use.

[17] This was achieved by changing the binary search in the binary-splitting algorithm to a complex set of sub-algorithms with overlapping test groups.

[5] Aldridge, Baldassini and Johnson (2014) produced an extension of the COMP algorithm that added additional post-processing steps.

[f] A lower bound on the number of tests needed can be described using the notion of sample space, denoted

In fact, the information lower bound can be generalised to the case where there is a non-zero probability that the algorithm makes an error.

In this form, the theorem gives us an upper bound on the probability of success based on the number of tests.

In the second phase, often called the decoding step, the results of each group test are analysed to determine which items are likely to be defective.

This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test.

[3][16] Non-adaptive group-testing algorithms tend to assume that the number of defectives, or at least a good upper bound on them, is known.

In the noisy case, one relaxes the requirement in the original COMP algorithm that the set of locations of ones in any column of

[5] The definite defectives method (DD) is an extension of the COMP algorithm that attempts to remove any false positives.

Because of this PP is particularly effective for large sample sizes, since the number of tests grows only linearly with respect to

[24] The generality of the theory of group testing lends it to many diverse applications, including clone screening, locating electrical shorts;[3] high speed computer networks;[25] medical examination, quantity searching, statistics;[18] machine learning, DNA sequencing;[26] cryptography;[27][28] and data forensics.

[30] A prominent problem with multiaccess channels is how to assign transmission times to the users so that their messages do not collide.

In the context of group testing, this problem is usually tackled by dividing time into 'epochs' in the following way.

Machine learning is a field of computer science that has many software applications such as DNA classification, fraud detection and targeted advertising.

One of the main subfields of machine learning is the 'learning by examples' problem, where the task is to approximate some unknown function when given its value at a number of specific points.

[35] During a pandemic such as the COVID-19 outbreak in 2020, virus detection assays are sometimes run using nonadaptive group testing designs.

[37][38][39] One example was provided by the Origami Assays project which released open source group testing designs to run on a laboratory standard 96 well plate.

[40] In a laboratory setting, one challenge of group testing is the construction of the mixtures can be time-consuming and difficult to do accurately by hand.

Data forensics is a field dedicated to finding methods for compiling digital evidence of a crime.

This is a function that takes the data, and through a difficult-to-reverse procedure, produces a unique number called a hash.

[29] One way to get around this limitation is to store more hashes – now of subsets of the data structure – to narrow down where the attack has occurred.

This indicates that at least one edited datum (which is taken as defectiveness in this model) is contained in the group that generated the current hash.

[29] In fact, the amount of hashes needed is so low that they, along with the testing matrix they refer to, can even be stored within the organisational structure of the data itself.

An illustration of the lightbulb problem, where one is searching for a broken bulb among six lightbulbs. Here, the first three are connected to a power supply, and they light up (A). This indicates that the broken bulb must be one of the last three (B). If instead the bulbs did not light up, one could be sure that the broken bulb was among the first three. Continuing this procedure can locate the broken bulb in no more than three tests, compared to a maximum of six tests if the bulbs are checked individually.
A diagram showing a group testing matrix along with associated vectors, x and y.
A typical group testing setup. A non-adaptive algorithm first chooses the matrix , and is then given the vector y . The problem is then to find an estimate for x .
An illustration of the generalised binary-splitting algorithm where there are 8 defectives and 135 total items. Here, , and the first test gives a negative result so every item is declared non-defective. Hence there are 119 remaining items, so . This second group gives a positive result, so a binary search is used to find a defective. Once that is done, the whole process is repeated, calculating a new using only those items whose defectiveness has not been determined.
An illustration of the COMP algorithm. COMP identifies item a as being defective and item b as being non-defective. However, it incorrectly labels c as a defective, since it is “hidden” by defective items in every test in which it appears.
Design of a group with samples (blue) from a set of total samples using the Polynomial Pools Algorithm
An illustration of a multiaccess channel showing a successful message and a message collision
Origami Assay paper template for group testing design