Small molecules (also called ligands in drug design applications) are usually represented using lists of atoms and their connections.
Large molecules such as proteins are, however, more compactly represented using the sequences of their amino acid building blocks.
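As a rough illustration of these two representations, the sketch below stores a small molecule as an atom list plus a bond list, and a peptide as a one-letter amino acid sequence. The variable names, the `neighbors` helper, and the example molecules (ethanol without hydrogens, an arbitrary five-residue peptide) are illustrative assumptions, not a standard file format.

```python
# Small molecule: ethanol (heavy atoms only), stored as a simple connection table.
ethanol_atoms = ["C", "C", "O"]          # atom index -> element symbol
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]   # (atom_i, atom_j, bond_order)


def neighbors(atom_index, bonds):
    """Return the indices of all atoms bonded to atom_index."""
    result = []
    for i, j, _order in bonds:
        if i == atom_index:
            result.append(j)
        elif j == atom_index:
            result.append(i)
    return result


# Large molecule: a short peptide written as a one-letter amino acid sequence.
# One character per residue is far more compact than listing every atom and bond.
peptide_sequence = "GAVLK"
```

The connection table grows with every atom and bond, whereas the sequence grows by one character per residue, which is why the second form is preferred for proteins.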
Large chemical structure databases are expected to handle the storage and searching of information on millions of molecules, taking up terabytes of physical memory.
A popular example that lists chemical reaction data, among others, is the Beilstein database, Reaxys.
Thermophysical data are information about
There are two principal techniques for representing chemical structures in digital databases: connection tables, which list atoms and their bonds, and linear string notations.
These approaches have been refined to allow representation of stereochemical differences and charges, as well as special kinds of bonding such as those seen in organometallic compounds.
The principal advantage of a computer representation is the possibility for increased storage and fast, flexible search.
Substructure search, in which a query fragment is matched against parts of the stored structures, is achieved by looking for subgraph isomorphism (sometimes also called a monomorphism) and is a widely studied application of graph theory.
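A search based on subgraph isomorphism can be sketched with a simple backtracking matcher. This is a minimal illustration under stated assumptions: atoms are compared by element symbol only, bond orders and stereochemistry are ignored, and the function name `find_substructure` is hypothetical. Production systems use far more refined algorithms and pre-screening.

```python
def find_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return one mapping {query index: target index}, or None if no match.

    Atoms are lists of element symbols; bonds are sets of frozenset({i, j}).
    Every query bond must map onto a target bond (a monomorphism).
    """
    mapping = {}

    def bonded(bonds, a, b):
        return frozenset((a, b)) in bonds

    def extend(q):
        # All query atoms have been placed: the mapping is complete.
        if q == len(query_atoms):
            return True
        for t in range(len(target_atoms)):
            if t in mapping.values():
                continue  # each target atom may be used only once
            if query_atoms[q] != target_atoms[t]:
                continue  # element symbols must agree
            # Each query bond to an already mapped atom must exist in the target.
            consistent = all(
                bonded(target_bonds, mapping[p], t)
                for p in range(q)
                if bonded(query_bonds, p, q)
            )
            if consistent:
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack and try the next target atom
        return False

    return dict(mapping) if extend(0) else None


# Target: ethanol heavy atoms C-C-O. Query: a C-O fragment.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = {frozenset((0, 1)), frozenset((1, 2))}
c_o_atoms = ["C", "O"]
c_o_bonds = {frozenset((0, 1))}
match = find_substructure(c_o_atoms, c_o_bonds, ethanol_atoms, ethanol_bonds)
```

Here the first carbon is tried and rejected because it is not bonded to the oxygen, so the matcher backtracks and settles on the second carbon. The worst-case cost grows rapidly with molecule size, which is why this kind of search is computationally demanding on large databases.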
Search by matching the 3D conformation of molecules or by specifying spatial constraints is another feature that is particularly useful in drug design.
Suppliers of chemicals as synthesis intermediates or for high-throughput screening routinely provide search interfaces.
Trivial names, on the other hand, abound in homonyms and synonyms and are therefore a poor choice as a defining database key.
While physico-chemical descriptors can mostly be computed directly from the molecule's structure, pharmacological descriptors can be derived only indirectly, using involved multivariate statistics or experimental (screening, bioassay) results.
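Molecular weight is one descriptor that follows directly from structure. The sketch below sums standard atomic weights over an element count; the table is deliberately limited to four elements, and the names `ATOMIC_WEIGHTS` and `molecular_weight` are illustrative assumptions.

```python
# Standard atomic weights (g/mol), rounded to three decimals; four elements only.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(atom_counts):
    """Sum atomic weights over a mapping of element symbol -> atom count."""
    return sum(ATOMIC_WEIGHTS[element] * count
               for element, count in atom_counts.items())


# Ethanol, C2H6O
ethanol = {"C": 2, "H": 6, "O": 1}
weight = molecular_weight(ethanol)
```

No comparable formula exists for a pharmacological descriptor such as binding affinity, which has to be estimated from assay data or statistical models.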
There is no single definition of molecular similarity; however, the concept may be defined according to the application and is often described as an inverse of a measure of distance in descriptor space.
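One possible reading of "inverse of a distance in descriptor space" is sketched below. Both the choice of Euclidean distance and the 1 / (1 + d) form are assumptions made for illustration; other distance measures and other inverse mappings are equally valid, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Map a distance to a similarity in (0, 1]; identical vectors give 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))
```

Because descriptors usually sit on very different scales, a practical implementation would normalize each dimension before computing the distance.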
MCS (maximum common subgraph) is also used for screening drug-like compounds by hitting molecules that share a common subgraph (substructure).
By applying rules of precedence for the generation of stringified notations, one can obtain unique/'canonical' string representations such as 'canonical SMILES'.
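The idea of a precedence rule can be shown with a toy canonicalizer that tries every atom numbering and keeps the smallest result. This is a brute-force sketch for very small graphs only (its cost is factorial in the number of atoms). It is not the algorithm used for canonical SMILES, and the output format and the name `canonical_string` are invented for this example.

```python
from itertools import permutations


def canonical_string(atoms, bonds):
    """Return a numbering-independent string for a small molecular graph.

    atoms: list of element symbols; bonds: iterable of (i, j) index pairs.
    Precedence rule: the lexicographically smallest (labels, bonds) key wins.
    """
    best = None
    for order in permutations(range(len(atoms))):
        # order[new] is the old index of the atom placed at position new.
        new_index = {old: new for new, old in enumerate(order)}
        labels = tuple(atoms[old] for old in order)
        relabeled = tuple(sorted(
            tuple(sorted((new_index[i], new_index[j]))) for i, j in bonds
        ))
        key = (labels, relabeled)
        if best is None or key < best:
            best = key
    labels, relabeled = best
    return ".".join(labels) + "|" + ",".join(f"{a}-{b}" for a, b in relabeled)


# The same molecule (ethanol heavy atoms) entered with two different numberings.
first = canonical_string(["C", "C", "O"], [(0, 1), (1, 2)])
second = canonical_string(["O", "C", "C"], [(0, 1), (1, 2)])
# A different molecule with the same atoms (dimethyl ether, C-O-C).
ether = canonical_string(["C", "O", "C"], [(0, 1), (1, 2)])
```

Both numberings of ethanol produce the same string, while dimethyl ether produces a different one, which is the property that makes a canonical string usable as a database key.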
A key difference between a registration system and a simple chemical database is the ability to accurately represent that which is known, unknown, and partially known.