Literature-based discovery

He used this to propose fish oil as a treatment for Raynaud syndrome due to their shared relationship with blood viscosity.

Although the ABC paradigm is widely used, critics have argued that much of science is not captured by simple assertions and is instead built from analogies and images at a higher level of abstraction.

In closed discovery, both A and C are given, and the approach seeks the Bs that can link the two, thus testing a hypothesis about A and C.[1] A number of systems to perform literature-based discovery have been developed over the years, extending the original idea of Don Swanson, and the evaluation of the quality of such systems is an active area of research.
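The closed-discovery setting can be illustrated with a minimal sketch. It assumes that each article has already been reduced to a set of concepts and that simple co-occurrence in an article counts as a link. The corpus `ARTICLES` and the function names are hypothetical and chosen for illustration; they are not taken from any published LBD system.

```python
# Hypothetical toy corpus: each entry is the set of concepts found in one article.
# The entries are invented for illustration and are not real bibliographic data.
ARTICLES = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "omega-3"},
    {"Raynaud syndrome", "blood viscosity"},
    {"Raynaud syndrome", "platelet aggregation"},
    {"Raynaud syndrome", "cold exposure"},
]


def cooccurring_concepts(articles, concept):
    """Return every concept that shares at least one article with `concept`."""
    neighbours = set()
    for concepts in articles:
        if concept in concepts:
            neighbours |= concepts - {concept}
    return neighbours


def closed_discovery(articles, a, c):
    """Return the candidate B concepts linked to both A and C, sorted by name."""
    linked_to_a = cooccurring_concepts(articles, a)
    linked_to_c = cooccurring_concepts(articles, c)
    # A and C themselves are not bridges, so they are excluded from the result.
    return sorted((linked_to_a & linked_to_c) - {a, c})


if __name__ == "__main__":
    print(closed_discovery(ARTICLES, "fish oil", "Raynaud syndrome"))
```

On this toy corpus the call returns the two concepts that appear with both the A term and the C term. Real systems typically replace raw co-occurrence with extracted relations or statistical association scores, and rank the resulting Bs rather than returning them all.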

[16] One well-known system within the field is called Arrowsmith and is tailored to find connections between two disjoint sets of articles, an approach labeled "two-node" search.

[26] Besides extracting information from the body of scientific articles, LBD systems often employ structured knowledge from biocurated biological resources, like the Online Mendelian Inheritance in Man (OMIM).

[27] These are the published LBD systems, ordered by date of publication:[29]

A common task in literature-based discovery is assigning words/concepts to different semantic types.

A high-precision approach is to have experts generate the gold standard,[52] but this is time-consuming and expensive, and it tends to produce low recall rates.

[1] The advantage of time-slicing over the replication of previous discoveries is that it allows evaluation on a large number of test instances.
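A minimal sketch of a time-sliced evaluation follows. It assumes the literature has been reduced to dated concept co-occurrences: pairs first seen before a cutoff year form the training data, and pairs first seen at or after the cutoff serve as the gold standard. The data in `TOY`, the years, and the simple ABC predictor are all hypothetical and only illustrate the procedure.

```python
from collections import defaultdict


def time_slice_evaluate(cooccurrences, cutoff_year, predict):
    """Return (precision, recall) of `predict` under time-slicing.

    cooccurrences: iterable of (year, concept_x, concept_y) tuples.
    predict: function mapping the set of training pairs to a set of predicted pairs.
    """
    first_seen = {}
    for year, x, y in cooccurrences:
        pair = frozenset((x, y))
        first_seen[pair] = min(year, first_seen.get(pair, year))

    # Split pairs by the year in which they first appear.
    train = {p for p, y in first_seen.items() if y < cutoff_year}
    gold = {p for p, y in first_seen.items() if y >= cutoff_year}

    # Only pairs that were unknown before the cutoff count as predictions.
    predicted = predict(train) - train
    hits = predicted & gold
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


def abc_predict(train):
    """Predict A-C for every pair of concepts that share a neighbour B."""
    neighbours = defaultdict(set)
    for pair in train:
        if len(pair) != 2:  # skip degenerate self-pairs
            continue
        x, y = tuple(pair)
        neighbours[x].add(y)
        neighbours[y].add(x)
    predicted = set()
    for linked in neighbours.values():
        for a in linked:
            for c in linked:
                if a != c:
                    predicted.add(frozenset((a, c)))
    return predicted


# Hypothetical dated co-occurrences; the years are invented for illustration.
TOY = [
    (1980, "fish oil", "blood viscosity"),
    (1982, "blood viscosity", "Raynaud syndrome"),
    (1983, "fish oil", "omega-3"),
    (1990, "fish oil", "Raynaud syndrome"),
    (1991, "omega-3", "cold exposure"),
]

if __name__ == "__main__":
    print(time_slice_evaluate(TOY, 1986, abc_predict))
```

Because every pair that first appears after the cutoff becomes a test instance, the gold standard grows with the size of the corpus, rather than being limited to a handful of historically documented discoveries.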

[58] The language in scientific articles often includes ambiguities, and an important step for coherent parsing of the literature is extracting the sense of each term in the context in which it is used, a task called word-sense disambiguation (WSD).

[59] For example, gene symbols such as CT (PCYT1A) and MR (NR3C2) can be confused with the acronyms for computed tomography and magnetic resonance, requiring sophisticated disambiguation systems.
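The ambiguity described above can be illustrated with a deliberately simple sketch that picks a sense by counting overlapping context words. The cue-word lists in `SENSE_CONTEXTS` are hypothetical and hand-written for this example; production systems rely on much richer resources and models, so this is a toy illustration rather than a description of any published method.

```python
import re

# Hypothetical cue words per sense, invented for illustration only.
SENSE_CONTEXTS = {
    "CT": {
        "gene PCYT1A": {"gene", "expression", "enzyme", "protein", "mutation"},
        "computed tomography": {"scan", "imaging", "patient", "radiology", "contrast"},
    },
    "MR": {
        "gene NR3C2": {"gene", "receptor", "expression", "protein", "mutation"},
        "magnetic resonance": {"scan", "imaging", "patient", "radiology", "signal"},
    },
}


def disambiguate(term, sentence):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None when no cue word matches or when two senses tie.
    """
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    scores = {
        sense: len(cues & words)
        for sense, cues in SENSE_CONTEXTS[term].items()
    }
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    print(disambiguate("CT", "The patient underwent a CT scan with contrast"))
    print(disambiguate("MR", "MR expression was reduced by the mutation"))
```

Returning `None` on ties or empty overlap keeps the sketch from guessing; an LBD pipeline could then leave the mention unresolved instead of linking it to the wrong concept.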

[71] In the context of systems vaccinology, it was used to identify proteins that are related to interferon gamma and play a role in the response to vaccines.

[72] LBD has been explored as a tool to identify diagnostic and prognostic biomarkers for diseases, e.g. for the risk of type 2 diabetes.

[73] Besides providing scientific hypotheses about the world, LBD has also been used to improve data analysis, via the automatic identification of possible confounding factors using the medical literature.

An example diagram of Swanson linking, using the ABC paradigm
The Anni 2.0 literature-based discovery system, employing a workflow similar to other LBD systems.[28]
Gene name normalization, an important step in LBD when dealing with genes[57]