Group testing

Group testing has many applications, including statistics, biology, computer science, medicine, engineering and cyber security.

[8] The concept of group testing was first introduced by Robert Dorfman in 1943 in a short report[2] published in the Notes section of Annals of Mathematical Statistics.

Dorfman tabulated the optimum group sizes for this strategy against the prevalence rate of defectiveness in the population.

[2] Stephen Samuels [10] found a closed-form solution for the optimal group size as a function of the prevalence rate.

[11] This newer process starts by again performing individual testing on the positive groups, but stopping as soon as a defective is identified.

[12] They described five new procedures – in addition to generalisations for when the prevalence rate is unknown – and for the optimal one, they provided an explicit formula for the expected number of tests it would use.

[17] This was achieved by changing the binary search in the binary-splitting algorithm to a complex set of sub-algorithms with overlapping test groups.

[5] Aldridge, Baldassini and Johnson (2014) produced an extension of the COMP algorithm that added additional post-processing steps.

[f] A lower bound on the number of tests needed can be described using the notion of sample space, denoted

In fact, the information lower bound can be generalised to the case where there is a non-zero probability that the algorithm makes an error.

In this form, the theorem gives us an upper bound on the probability of success based on the number of tests.

In the second phase, often called the decoding step, the results of each group test are analysed to determine which items are likely to be defective.

This means there is a simple procedure for finding the defectives: just remove every item that appears in a negative test.

[3][16] Non-adaptive group-testing algorithms tend to assume that the number of defectives, or at least a good upper bound on them, is known.

In the noisy case, one relaxes the requirement in the original COMP algorithm that the set of locations of ones in any column of

[5] The definite defectives method (DD) is an extension of the COMP algorithm that attempts to remove any false positives.

Because of this PP is particularly effective for large sample sizes, since the number of tests grows only linearly with respect to

[24] The generality of the theory of group testing lends it to many diverse applications, including clone screening, locating electrical shorts;[3] high speed computer networks;[25] medical examination, quantity searching, statistics;[18] machine learning, DNA sequencing;[26] cryptography;[27][28] and data forensics.

[30] A prominent problem with multiaccess channels is how to assign transmission times to the users so that their messages do not collide.

In the context of group testing, this problem is usually tackled by dividing time into 'epochs' in the following way.

Machine learning is a field of computer science that has many software applications such as DNA classification, fraud detection and targeted advertising.

One of the main subfields of machine learning is the 'learning by examples' problem, where the task is to approximate some unknown function when given its value at a number of specific points.

[35] During a pandemic such as the COVID-19 outbreak in 2020, virus detection assays are sometimes run using nonadaptive group testing designs.

[37][38][39] One example was provided by the Origami Assays project which released open source group testing designs to run on a laboratory standard 96 well plate.

[40] In a laboratory setting, one challenge of group testing is the construction of the mixtures can be time-consuming and difficult to do accurately by hand.

Data forensics is a field dedicated to finding methods for compiling digital evidence of a crime.

This is a function that takes the data, and through a difficult-to-reverse procedure, produces a unique number called a hash.

[29] One way to get around this limitation is to store more hashes – now of subsets of the data structure – to narrow down where the attack has occurred.

This indicates that at least one edited datum (which is taken as defectiveness in this model) is contained in the group that generated the current hash.

[29] In fact, the amount of hashes needed is so low that they, along with the testing matrix they refer to, can even be stored within the organisational structure of the data itself.