Predictions for compounds inside the AD are generally considered more reliable than those for compounds outside it, because the model is valid primarily for interpolation within the training data space rather than for extrapolation beyond it.
No single, universally accepted algorithm for defining the applicability domain exists, but several methods are in common use.[1][2]
One systematic approach defines interpolation regions by removing outliers and then estimating the probability density distribution with a kernel-weighted sampling method.
For regression-based QSAR models, a widely used technique for assessing the structural AD relies on leverage values, which are the diagonal elements of the hat matrix computed from the molecular descriptors.[3][4][5]
More recently, a rigorous benchmarking study suggested that the standard deviation of model predictions is the most reliable measure for AD determination.
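The leverage-based check described above can be illustrated with a minimal sketch. It assumes a descriptor matrix with one row per compound, adds an intercept column, and uses the warning threshold h* = 3(p + 1)/n (p descriptors, n training compounds), which is a common convention rather than something the text prescribes. The function names and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np


def with_intercept(X):
    """Prepend a column of ones so the leverage accounts for the model intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def leverages(X_train, X_query):
    """Leverage h = x^T (X^T X)^{-1} x of each query compound.

    For training compounds these are the diagonal elements of the hat matrix.
    """
    A = with_intercept(X_train)
    Q = with_intercept(X_query)
    # The pseudo-inverse tolerates collinear descriptors (singular X^T X).
    xtx_inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q_i^T (X^T X)^{-1} q_i.
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def leverage_threshold(X_train, k=3.0):
    """Assumed warning leverage h* = k (p + 1) / n, with k = 3 by convention."""
    n, p = np.asarray(X_train).shape
    return k * (p + 1) / n


def in_domain(X_train, X_query):
    """True where a query compound's leverage does not exceed the threshold."""
    return leverages(X_train, X_query) <= leverage_threshold(X_train)


# Synthetic example: 50 training compounds described by 3 descriptors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))

# One query near the centre of the training data, one far outside it.
X_query = np.array([[0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0]])

flags = in_domain(X_train, X_query)
print(flags)
```

A compound flagged as outside the domain by this check is structurally distant from the training set in descriptor space, so a prediction for it is an extrapolation. The check says nothing about the response values themselves, which is one reason it is often paired with other AD measures.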