Quantitative structure–activity relationship

[3][4] A related term is quantitative structure–property relationship (QSPR), used when a chemical property is modeled as the response variable.

In this context, fragment-based QSAR (FB-QSAR) is a promising strategy for fragment library design and for fragment-to-lead identification.

The training-set molecules must be superimposed (aligned), using either experimental data (e.g. ligand–protein crystal structures) or molecular superimposition software.
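When the alignment comes from software rather than crystal structures, the central numerical step is a least-squares rigid-body superposition of corresponding atoms. The sketch below uses Horn's closed-form quaternion method, written in plain Python so it runs without dependencies. It assumes the atom-to-atom correspondence is already known and that both molecules are rigid; real superimposition tools must also decide which atoms to match and handle conformational flexibility. The names `superimpose`, `rmsd`, and `_jacobi_eigen` and the coordinates are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def _jacobi_eigen(a: List[Vec], sweeps: int = 50) -> Tuple[Vec, List[Vec]]:
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns (eigenvalues, V); column k of V is the unit eigenvector for eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def superimpose(mobile: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> List[Vec]:
    """Rigidly move `mobile` onto `target` (same atom order) by least squares."""
    n = len(mobile)
    cm = [sum(p[i] for p in mobile) / n for i in range(3)]
    ct = [sum(p[i] for p in target) / n for i in range(3)]
    a = [[p[i] - cm[i] for i in range(3)] for p in mobile]
    b = [[p[i] - ct[i] for i in range(3)] for p in target]
    # Cross-covariance S[i][j] = sum_k a_k[i] * b_k[j]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(3)] for i in range(3)
    ]
    # Horn's symmetric 4x4 matrix; its top eigenvector is the optimal rotation quaternion.
    big_n = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(big_n)
    best = max(range(4), key=lambda k: vals[k])
    q0, qx, qy, qz = (vecs[r][best] for r in range(4))  # unit quaternion
    rot = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    # Rotate the centered mobile atoms, then translate onto the target centroid.
    return [[sum(rot[i][j] * p[j] for j in range(3)) + ct[i] for i in range(3)] for p in a]


def rmsd(xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]) -> float:
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for p, q in zip(xs, ys) for x, y in zip(p, q)) / len(xs))


# Hypothetical 4-atom fragment; target = mobile rotated 90 degrees about z, then shifted.
mobile = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
target = [[5.0, -1.0, 2.0], [5.0, 0.0, 2.0], [3.0, -1.0, 2.0], [5.0, -1.0, 5.0]]
print(rmsd(mobile, target), rmsd(superimpose(mobile, target), target))
```

The quaternion formulation is used here because it needs only a symmetric eigenproblem, which a short Jacobi routine handles; the SVD-based Kabsch algorithm gives the same optimal rotation and is the usual choice when a linear-algebra library is available.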

[21] An alternative approach uses multiple-instance learning by encoding molecules as sets of data instances, each of which represents a possible molecular conformation.
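A minimal sketch of this encoding, assuming each conformation has already been reduced to a numeric descriptor vector: a molecule becomes a bag of such vectors, and under the standard multiple-instance assumption (a molecule is active if at least one of its conformations is) the bag is scored by its best-scoring instance. The linear scorer, its weights, and the descriptor values are hypothetical placeholders; training a multiple-instance model is not shown.

```python
from typing import List, Sequence, Tuple

Conformer = Sequence[float]  # descriptor vector for one conformation


def instance_score(x: Conformer, weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear scorer for a single conformation."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def bag_score(bag: List[Conformer], weights: Sequence[float], bias: float) -> Tuple[float, int]:
    """Score a molecule (bag of conformers) by its best-scoring conformation.

    Also returns the index of that conformation, i.e. the one the model
    treats as responsible for the molecule's activity.
    """
    scores = [instance_score(x, weights, bias) for x in bag]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Each molecule is a bag of conformer descriptor vectors (made-up values).
molecules = {
    "mol_A": [[0.2, 1.0], [0.9, 0.1]],
    "mol_B": [[0.1, 0.1], [0.2, 0.6]],
}
weights, bias = [1.0, -0.5], 0.0  # hypothetical, as if already trained

for name, bag in molecules.items():
    score, idx = bag_score(bag, weights, bias)
    print(f"{name}: score={score:.2f}, best conformer={idx}")
```

The max rule is the simplest bag-level aggregation; other multiple-instance methods use soft maxima or averages, which this sketch does not cover.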

[22] On June 18, 2011, the Comparative Molecular Field Analysis (CoMFA) patent dropped any restriction on the use of GRID and partial least-squares (PLS) technologies.

In this approach, descriptors quantifying various electronic, geometric, or steric properties of a molecule are computed and used to develop a QSAR.
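As a minimal illustration, assume the descriptors have already been computed for each compound; the QSAR step can then be as simple as multiple linear regression. The sketch fits activity = b0 + b1·d1 + b2·d2 by solving the normal equations in plain Python. The descriptor table is made up and the activities are generated from known coefficients so the fit can be checked; real descriptor sets are usually larger and noisier and often call for variable selection or regularization.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("descriptor matrix is (near-)singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(descriptors: Sequence[Sequence[float]], activity: Sequence[float]) -> List[float]:
    """Least-squares fit of activity = b0 + b1*d1 + ... via the normal equations."""
    rows = [[1.0, *d] for d in descriptors]  # prepend intercept column
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(p)]
    return solve(ata, aty)


# Made-up descriptor table: (d1, d2) per compound, e.g. a lipophilicity and a steric term.
descriptors = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.5], [4.0, 0.3], [2.5, 0.4]]
activity = [1.0 + 0.5 * d1 - 2.0 * d2 for d1, d2 in descriptors]  # synthetic, noise-free

coeffs = fit_mlr(descriptors, activity)
print([round(c, 6) for c in coeffs])  # [1.0, 0.5, -2.0]
```

Solving the normal equations is adequate for a handful of well-conditioned descriptors; QR- or SVD-based least squares is numerically safer when descriptors are strongly correlated.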

[31][32] In the literature, chemists are often found to prefer partial least squares (PLS) methods,[citation needed] since PLS performs feature extraction and induction in a single step.
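The one-step character of PLS can be seen in a compact PLS1 (single response) sketch using the NIPALS iteration: each latent component is extracted from the descriptors along the direction of maximal covariance with the activity (feature extraction), the activity is regressed on that component (induction), and both are then deflated before the next component. This is a plain-Python illustration on synthetic data, not a reference implementation; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pls1_fit(X: Sequence[Sequence[float]], y: Sequence[float], n_components: int) -> Dict:
    """Fit a PLS1 model with the NIPALS iteration (plain Python, illustrative)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X residual
    f = [yi - y_mean for yi in y]  # centered y residual
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in descriptor space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm < 1e-12:  # y is already fully explained
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores (latent feature)
        tt = sum(ti * ti for ti in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt  # regression of y on the score
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(loading)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model: Dict, x: Sequence[float]) -> float:
    """Predict one sample by replaying the same centering and deflation steps."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, loading, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, loading)]
    return y_hat


X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in X]  # synthetic: y = 1 + 2*x1 + 3*x2

for k in (1, 2):
    model = pls1_fit(X, y, n_components=k)
    print(k, [round(pls1_predict(model, x), 3) for x in X])
```

With as many components as the rank of X (two here), PLS reproduces the ordinary least-squares fit; with fewer components it acts as a compressed, shrunken model, which is why the number of components is normally chosen by cross-validation.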

[33][34] QSAR models derived from nonlinear machine learning are typically seen as a "black box" that fails to guide medicinal chemists.

A relatively new concept is matched molecular pair analysis (MMPA),[35] or prediction-driven MMPA, which is coupled with a QSAR model to identify activity cliffs.
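A toy sketch of the matched-pair idea, assuming the compounds have already been fragmented into a shared core and a single variable R-group (real MMPA tools derive these by systematic bond cleavage). Pairs that share a core and whose activities differ by at least a threshold are flagged as activity cliffs. The compound names, cores, pIC50 values, and the 2-log-unit (100-fold) threshold are all hypothetical choices; in prediction-driven MMPA the activities could be QSAR predictions rather than measurements.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, pre-fragmented data: name -> (shared core, variable R-group, pIC50).
compounds = {
    "cpd1": ("scaffold_A", "H", 5.1),
    "cpd2": ("scaffold_A", "Cl", 7.6),
    "cpd3": ("scaffold_A", "CH3", 5.4),
    "cpd4": ("scaffold_B", "H", 6.0),
    "cpd5": ("scaffold_B", "OH", 6.3),
}


def find_activity_cliffs(data, threshold=2.0):
    """Return matched pairs (same core, different R-group) whose activity
    differs by at least `threshold` log units."""
    by_core = defaultdict(list)
    for name, (core, r_group, activity) in data.items():
        by_core[core].append((name, r_group, activity))
    cliffs = []
    for members in by_core.values():
        for (n1, r1, a1), (n2, r2, a2) in combinations(members, 2):
            delta = round(a2 - a1, 2)  # rounded to avoid float noise in the report
            if abs(delta) >= threshold:
                cliffs.append((n1, n2, f"{r1}>>{r2}", delta))
    return cliffs


for cliff in find_activity_cliffs(compounds):
    print(cliff)
# ('cpd1', 'cpd2', 'H>>Cl', 2.5)
# ('cpd2', 'cpd3', 'Cl>>CH3', -2.2)
```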

QSARs are applied in many disciplines, for example risk assessment, toxicity prediction, and regulatory decisions,[37] in addition to drug discovery and lead optimization.

Even with external validation, it is difficult to determine whether the selection of training and test sets was manipulated to maximize the predictive capacity of the model being published.
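One partial safeguard is to make the split reproducible: draw the test set with a fixed, reported random seed and report an external statistic on it. The sketch computes an external Q² as 1 - PRESS/SS, with SS taken about the training-set mean (one of several external-Q² variants in use). The helper names and the numbers are hypothetical. Publishing the seed lets others regenerate the split, but it cannot by itself show that only one seed was tried.

```python
import random
from typing import List, Sequence, Tuple


def seeded_split(ids: Sequence[str], test_fraction: float, seed: int) -> Tuple[List[str], List[str]]:
    """Random train/test split that anyone can reproduce from the reported seed."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # private RNG; global state untouched
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def q2_external(y_test: Sequence[float], y_pred: Sequence[float], y_train_mean: float) -> float:
    """External Q^2 = 1 - PRESS / SS, with SS taken about the training-set mean."""
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss


train, test = seeded_split([f"cpd{i}" for i in range(10)], test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
print(q2_external([5.0, 6.0, 7.0], [5.2, 5.9, 6.6], y_train_mean=6.0))
```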

[47] It is well known, for instance, that within a particular family of chemical compounds, especially in organic chemistry, there are strong correlations between structure and observed properties.

[48] The biological activity of molecules is usually measured in assays to establish the level of inhibition of particular signal transduction or metabolic pathways.
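Assay results such as IC50 values are commonly converted to pIC50, the negative base-10 logarithm of the IC50 expressed in mol/L, before modeling. Activity then sits on a log scale where higher values mean more potent compounds and a tenfold drop in IC50 raises pIC50 by one unit. A minimal conversion sketch; the unit table covers only a few common units, and the function name is illustrative.

```python
import math

# Conversion factors to mol/L for a few common concentration units.
_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50(ic50: float, unit: str = "nM") -> float:
    """pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * _TO_MOLAR[unit])


print(round(pic50(100, "nM"), 2))  # 7.0
print(round(pic50(1, "uM"), 2))    # 6.0
```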

Drug discovery often involves the use of QSAR to identify chemical structures that could have good inhibitory effects on specific targets and have low toxicity (non-specific activity).

Of special interest is the prediction of the partition coefficient log P, an important measure for identifying "druglikeness" according to Lipinski's Rule of Five.
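Lipinski's Rule of Five flags poor oral absorption or permeation as more likely when a compound has a molecular weight above 500, log P above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors; a compound is usually considered drug-like if it violates no more than one of these criteria. A minimal check, assuming the four descriptors (including a predicted log P) are already available; the class name and property values are hypothetical.

```python
from typing import NamedTuple


class CompoundProps(NamedTuple):
    mol_weight: float  # Da
    log_p: float       # e.g. a QSAR-predicted value
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(d: CompoundProps) -> int:
    """Count Lipinski criteria exceeded (MW > 500, log P > 5, HBD > 5, HBA > 10)."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def is_druglike(d: CompoundProps) -> bool:
    """At most one violation is tolerated."""
    return rule_of_five_violations(d) <= 1


candidate_1 = CompoundProps(350.0, 2.1, 2, 5)  # hypothetical values
candidate_2 = CompoundProps(620.0, 6.3, 3, 8)  # hypothetical values
print(is_druglike(candidate_1), is_druglike(candidate_2))  # True False
```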

[49] It is part of the machine learning method to reduce the risk of a SAR paradox, especially given that only a finite amount of data is available (see also MVUE).

[52] QSAR assessment software such as DEREK or CASE Ultra (MultiCASE) is commonly used to assess the genotoxicity of impurities according to ICH M7.

QSAR protocol