Chemical graph generator

[1] Molecular structures are graphs with chemical constraints such as valences, bond multiplicity and fragments.
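
As a minimal sketch of what "a graph with chemical constraints" can mean in practice, the snippet below stores a molecule as a list of atoms plus a dictionary of bond orders and checks it against valences and bond multiplicity. The `VALENCE` table and the function name `satisfies_valences` are illustrative assumptions, not taken from any particular generator.

```python
# Assumed standard valences for a few common elements (illustrative only).
VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}

def satisfies_valences(atoms, bonds):
    """Return True if every atom uses exactly its standard valence.

    atoms: list of element symbols, indexed by position.
    bonds: dict mapping an index pair (i, j) to a bond order (1, 2 or 3).
    """
    used = [0] * len(atoms)
    for (i, j), order in bonds.items():
        if not 1 <= order <= 3:          # bond multiplicity constraint
            return False
        used[i] += order
        used[j] += order
    return all(used[k] == VALENCE[a] for k, a in enumerate(atoms))

# Formaldehyde (CH2O): one C=O double bond and two C-H single bonds.
atoms = ["C", "O", "H", "H"]
bonds = {(0, 1): 2, (0, 2): 1, (0, 3): 1}
formaldehyde_ok = satisfies_valences(atoms, bonds)

# The same atoms with a C-O single bond leave C and O unsaturated.
unsaturated_ok = satisfies_valences(atoms, {(0, 1): 1, (0, 2): 1, (0, 3): 1})
```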

[3] DENDRAL was developed as part of the Mariner program launched by NASA to search for life on Mars.

In the orderly generation method, specific order-check functions are performed on graph representatives, such as vectors.

For example, MOLGEN[9] performs a descending order check while filling rows of adjacency matrices.
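
The idea behind such an order check can be illustrated with a deliberately naive sketch: an adjacency matrix is accepted only if its row-by-row vector is lexicographically maximal among all relabellings of the same graph. This brute-force version tests every permutation and is an assumption made for illustration; it is not MOLGEN's actual procedure, which performs its check incrementally while rows are filled.

```python
from itertools import permutations

def matrix_vector(matrix, perm):
    """Read the matrix row by row after relabelling atoms with perm."""
    n = len(matrix)
    return tuple(matrix[perm[i]][perm[j]] for i in range(n) for j in range(n))

def is_canonical(matrix):
    """True if no relabelling yields a lexicographically larger vector."""
    n = len(matrix)
    own = matrix_vector(matrix, tuple(range(n)))
    return all(matrix_vector(matrix, p) <= own for p in permutations(range(n)))

# Two labellings of the same three-atom chain.
centre_first = [[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]]
centre_middle = [[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]]

centre_first_canonical = is_canonical(centre_first)
centre_middle_canonical = is_canonical(centre_middle)
```

Only one labelling per isomorphism class passes the test, so duplicates can be rejected without storing previously generated structures.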

One of the earliest attempts was made by Hidetsugu Abe in 1975 using a pattern recognition-based structure generator.

Hidetsugu Abe and co-workers published the first paper on CHEMICS,[11] a CASE tool comprising several structure generation methods.

Substantial contributions were made by Craig Shelley and Morton Munk, who published a large number of CASE papers in this field.

Based on the molecular formula, the generator forms bonds between pairs of atoms, and all the extensions are checked against the given constraints.

LUCY is an open-source structure elucidation method based on the HMBC data of unknown molecules.[16] It involves an exhaustive two-step structure generation process: first, all combinations of interpretations of the HMBC signals are implemented in a connectivity matrix; this matrix is then completed by a deterministic generator that fills in the missing bond information.

This platform can generate structures for molecules of arbitrary size; however, molecular formulas with more than 30 heavy atoms are too time-consuming for practical applications.

[17] To overcome the limitations of the exhaustive approach, SENECA was developed as a stochastic method to find optimal solutions.

LSD is an open-source structure generator released under the General Public License (GPL).

A well-known commercial CASE system, StrucEluc,[20] also features an NMR-based generator.

[22] The decomposition of the molecular formula into fragments, components and segments was performed as an application of integer partitioning.
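
Integer partitioning itself is easy to sketch. The generator below lists every way of writing n as a sum of non-increasing positive integers; reading n as, for example, the number of atoms to distribute among fragments is an illustrative assumption, and the function name `partitions` is hypothetical rather than part of the cited generator.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as tuples of non-increasing parts."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        # Later parts may not exceed the part just chosen.
        for rest in partitions(n - first, first):
            yield (first,) + rest

four_atoms = list(partitions(4))
# [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
```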

The software, MOLSIG,[24] was integrated into this stochastic generator for canonical labelling and duplicate checks.

OMG generates structures based on the canonical augmentation method from Brendan McKay's NAUTY package.

[28] Although NAUTY is an efficient tool for graph canonical labelling, OMG is approximately 2000 times slower than MOLGEN.

For example, MOLGEN-MS[33] allows users to input mass spectrometry data of an unknown molecule.

In this type of assembly method, building blocks such as ring systems and atom fragments are used in structure generation.

To reduce the number of duplicates, Brendan McKay's canonical path augmentation method is used.

To overcome the combinatorial explosion during generation, the applicability domain and ring systems are detected based on inverse QSPR/QSAR analysis.

[35] The applicability domain, or target area, is described based on given biological as well as pharmaceutical activity information from QSPR/QSAR.

For example, a well-known tool called RetroPath[37] is used for molecular structure enumeration and virtual screening based on the given reaction rules.

[38] Its core algorithm is a breadth-first method, generating structures by applying reaction rules to each source compound.
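
The breadth-first scheme can be sketched with toy data. In the example below compounds are plain strings and a reaction rule is a hypothetical (pattern, replacement) pair applied at every matching position; real tools operate on molecular graphs and encoded reaction rules, so this only illustrates the traversal order and is not RetroPath's implementation.

```python
from collections import deque

def apply_rule(compound, rule):
    """Return all products of applying one toy rewrite rule to a string."""
    pattern, replacement = rule
    products = []
    start = compound.find(pattern)
    while start != -1:
        products.append(
            compound[:start] + replacement + compound[start + len(pattern):])
        start = compound.find(pattern, start + 1)
    return products

def enumerate_products(sources, rules, max_depth):
    """Breadth-first enumeration of compounds reachable from the sources."""
    seen = set(sources)
    queue = deque((s, 0) for s in sources)
    order = []
    while queue:
        compound, depth = queue.popleft()
        order.append(compound)
        if depth == max_depth:
            continue                      # do not expand beyond the limit
        for rule in rules:
            for product in apply_rule(compound, rule):
                if product not in seen:   # skip duplicates
                    seen.add(product)
                    queue.append((product, depth + 1))
    return order

# Toy rules: swap an "O" token for "N", or grow a "C" into "CC".
toy_rules = [("O", "N"), ("C", "CC")]
reachable = enumerate_products(["CO"], toy_rules, max_depth=2)
```

Because the queue is processed level by level, all compounds one rule application away from a source are visited before any compound two applications away.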

Structure generation and enumeration are performed based on Brendan McKay's canonical augmentation method.

RetroPath 2.0 provides a variety of workflows such as isomer transformation, enumeration, QSAR and metabolomics.

Unlike assembly methods, the generation tree starts with the hypergraph, and the structures decrease in size at each step.

The earliest reduction-based structure generator is COCOA,[41] an exhaustive and recursive bond-removal method.
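
A recursive bond-removal search can be sketched as follows. The starting structure is assumed to be a complete graph with single bonds only, the only constraint is a target degree per atom, and connectivity is not checked; the function name `reduce_bonds` is hypothetical, and the code illustrates the general reduction idea rather than COCOA's actual algorithm.

```python
from itertools import combinations

def reduce_bonds(n_atoms, target_degree):
    """Remove bonds from a complete graph until every degree hits its target."""
    all_bonds = list(combinations(range(n_atoms), 2))
    degree = [n_atoms - 1] * n_atoms      # every atom bonded to every other
    kept = set(all_bonds)
    results = []

    def recurse(k):
        if all(d == t for d, t in zip(degree, target_degree)):
            results.append(frozenset(kept))
            return                        # further removals would undershoot
        if k == len(all_bonds):
            return
        i, j = all_bonds[k]
        # Branch 1: remove the bond if both atoms still exceed their target.
        if degree[i] > target_degree[i] and degree[j] > target_degree[j]:
            kept.discard((i, j))
            degree[i] -= 1
            degree[j] -= 1
            recurse(k + 1)
            kept.add((i, j))
            degree[i] += 1
            degree[j] += 1
        # Branch 2: keep the bond and move on.
        recurse(k + 1)

    recurse(0)
    return results

# Four atoms, each required to end with exactly two bonds (a four-membered ring).
rings = reduce_bonds(4, [2, 2, 2, 2])
```

The three results are the three labelled four-membered rings; a duplicate check such as canonical labelling would be needed to recognise them as the same unlabelled structure.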

Once the structure is built in the matrix representation, the saturated molecule is stored in the output list.

This article was adapted from the following source under a CC BY 4.0 license (2021) (reviewer reports): Mehmet Aziz Yirik; Christoph Steinbeck (5 January 2021).