Algorithm selection

Algorithm selection (sometimes also called per-instance algorithm selection or offline algorithm selection) is a meta-algorithmic technique to choose an algorithm from a portfolio on an instance-by-instance basis. It is motivated by the observation that on many practical problems, different algorithms have different performance characteristics; if we can identify when to use which algorithm, we can optimize for each scenario and improve overall performance.

Given a portfolio of algorithms $\mathcal{P}$, a set of instances $\mathcal{I}$, and a cost metric $m : \mathcal{P} \times \mathcal{I} \to \mathbb{R}$, the algorithm selection problem consists of finding a mapping $s : \mathcal{I} \to \mathcal{P}$ from instances to algorithms such that the cost $\sum_{i \in \mathcal{I}} m(s(i), i)$ across all instances is optimized.[1][2]
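As a toy illustration, the following sketch, with a hypothetical two-algorithm portfolio and made-up per-instance costs, spells out this objective: an oracle selector that picks the cheapest algorithm per instance beats any single algorithm.

```python
# A minimal sketch of the algorithm selection objective, assuming a
# hypothetical portfolio of two toy algorithms and synthetic costs.

def algorithm_a(instance):
    return sum(instance)             # stand-in for one solving strategy

def algorithm_b(instance):
    return max(instance, default=0)  # stand-in for a complementary one

portfolio = [algorithm_a, algorithm_b]

# Cost metric m(algorithm, instance): here a synthetic cost table.
# In practice m would be measured, e.g., as wall-clock runtime.
cost = {
    (algorithm_a, (1, 2, 3)): 0.5, (algorithm_b, (1, 2, 3)): 2.0,
    (algorithm_a, (9, 9)):    3.0, (algorithm_b, (9, 9)):    0.2,
}
instances = [(1, 2, 3), (9, 9)]

# A selector s: I -> P; the optimum picks the argmin of m per instance.
def oracle_selector(instance):
    return min(portfolio, key=lambda alg: cost[(alg, instance)])

total = sum(cost[(oracle_selector(i), i)] for i in instances)
print(f"oracle cost: {total}")  # 0.7, better than any single algorithm
```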

A well-known application of algorithm selection is the Boolean satisfiability problem (SAT).

Here, the portfolio of algorithms is a set of (complementary) SAT solvers, the instances are Boolean formulas, and the cost metric is, for example, average runtime or the number of unsolved instances.

So, the goal is to select a well-performing SAT solver for each individual instance.
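A hedged sketch of these two cost metrics, using a made-up runtime table (None marking a timeout) for three hypothetical solvers; it also computes the per-instance optimum, the so-called virtual best solver, which per-instance selection tries to approach:

```python
# Toy runtime data: runtimes[solver][formula_id], None = timeout.
TIMEOUT = 5000.0  # seconds; an assumed cutoff

runtimes = {
    "solver_x": [12.0, None, 300.0, 4.0],
    "solver_y": [500.0, 80.0, None, None],
    "solver_z": [None, 95.0, 250.0, 9.0],
}

def average_runtime(times):
    # Count timeouts at the cutoff value (a PAR1-style score).
    return sum(TIMEOUT if t is None else t for t in times) / len(times)

def unsolved(times):
    return sum(t is None for t in times)

for solver, times in runtimes.items():
    print(solver, average_runtime(times), unsolved(times))

# Per-instance selection bound: for each formula, take the fastest solver.
n = len(next(iter(runtimes.values())))
vbs = [min(TIMEOUT if runtimes[s][j] is None else runtimes[s][j]
           for s in runtimes) for j in range(n)]
print("virtual best average runtime:", sum(vbs) / n)
```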

The same approach can be applied to many other NP-hard problems (such as mixed integer programming, CSP, AI planning, TSP, MAXSAT, QBF, and answer set programming).

Competition-winning systems in SAT include SATzilla,[3] 3S,[4] and CSHC.[5]

In machine learning, algorithm selection is better known as meta-learning.

The portfolio of algorithms consists of machine learning algorithms (e.g., Random Forest, SVM, DNN), the instances are data sets, and the cost metric is, for example, the error rate.

So, the goal is to predict which machine learning algorithm will have a small error on each data set.
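For illustration, the following sketch, assuming scikit-learn is available, computes the ground-truth selection for two small data sets by cross-validating each portfolio member; these per-data-set labels are what a meta-learner would then try to predict from data-set features alone:

```python
# A minimal sketch of per-data-set algorithm selection in ML. The
# "portfolio" is a few standard classifiers and the cost metric is the
# cross-validated error rate.
from sklearn.datasets import load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

portfolio = {
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
    "mlp": MLPClassifier(max_iter=2000, random_state=0),
}

for name, load in [("iris", load_iris), ("wine", load_wine)]:
    X, y = load(return_X_y=True)
    # Error rate = 1 - mean cross-validated accuracy.
    errors = {alg: 1.0 - cross_val_score(model, X, y, cv=5).mean()
              for alg, model in portfolio.items()}
    best = min(errors, key=errors.get)
    print(f"{name}: best algorithm = {best}, error = {errors[best]:.3f}")
```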

The algorithm selection problem is mainly solved with machine learning techniques.

By representing the problem instances by numerical features $f$, algorithm selection can be seen as a multi-class classification problem: learning a mapping $f(i) \mapsto \mathcal{P}$ that assigns each instance $i$, via its features, to an algorithm in the portfolio.

For example, for Boolean formulas we can count the number of variables, the number of clauses, and the average clause length,[6] or for ML data sets the number of samples, the number of features, and the class balance, to get an impression of their characteristics.
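A minimal sketch of such feature extraction for Boolean formulas, assuming the input is a string in DIMACS CNF format:

```python
# Extract simple instance features from a CNF formula in DIMACS format.

def cnf_features(dimacs: str):
    n_vars = n_clauses = 0
    clause_lengths = []
    for line in dimacs.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue                      # skip comments and empty lines
        if line.startswith("p cnf"):
            _, _, n_vars, n_clauses = line.split()
            n_vars, n_clauses = int(n_vars), int(n_clauses)
            continue
        literals = [int(tok) for tok in line.split() if tok != "0"]
        clause_lengths.append(len(literals))
    avg_len = (sum(clause_lengths) / len(clause_lengths)
               if clause_lengths else 0.0)
    return {"variables": n_vars, "clauses": n_clauses,
            "avg_clause_length": avg_len}

example = "c toy formula\np cnf 3 2\n1 -2 0\n2 3 -1 0\n"
print(cnf_features(example))
# {'variables': 3, 'clauses': 2, 'avg_clause_length': 2.5}
```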

We distinguish between two kinds of features: static features, which are in most cases simple counts and statistics (e.g., the clauses-to-variables ratio of a SAT formula), and probing features (sometimes also called landmarking features), which are computed by running a cheap analysis algorithm on the instance.

Depending on the performance metric used, feature computation can itself incur costs. For example, if we use running time as the performance metric, we must include the time to compute the instance features in the performance of the algorithm selection system.

SAT solving is a concrete example where such feature costs cannot be neglected: instance features for CNF formulas can be either very cheap (e.g., the number of variables can be read off in constant time for CNFs in the DIMACS format) or very expensive (e.g., graph features, which can cost tens or hundreds of seconds).

It is important to take the overhead of feature computation into account in practice in such scenarios; otherwise a misleading impression of the performance of the algorithm selection approach is created.

For example, if the decision of which algorithm to choose can be made with perfect accuracy, but the features are the running times of the portfolio algorithms themselves, there is no benefit to the portfolio approach.
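A small numeric sketch of both points, with hypothetical timings: feature overhead is charged to the selector, and in the degenerate case where the features are the portfolio runtimes themselves, even a perfect selector cannot pay off:

```python
# Hypothetical per-instance runtimes for a two-algorithm portfolio.
portfolio_runtimes = {"alg_a": 50.0, "alg_b": 70.0}  # seconds

# Case 1: cheap features. The selector picks alg_a and pays the overhead.
feature_time = 2.0
print(feature_time + portfolio_runtimes["alg_a"])  # 52.0 < 70.0: selection pays off

# Case 2: the features ARE the running times of all portfolio algorithms.
# Computing them means running every algorithm, so even a perfectly
# accurate selector costs the sum of all runtimes.
feature_time = sum(portfolio_runtimes.values())    # 120.0
print(feature_time)  # 120.0 > 70.0: the instance was already solved while
                     # "computing features", at a higher cost than the
                     # single best algorithm
```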

An early approach clusters the training instances in the feature space and determines the best-performing algorithm for each cluster; a new instance is then assigned to a cluster and the associated algorithm selected.[7] A more modern approach is cost-sensitive hierarchical clustering,[5] which uses supervised learning to identify homogeneous instance subsets.
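A minimal sketch of the cluster-then-select idea, assuming scikit-learn and using synthetic features and costs (k-means here stands in for whatever clustering method is used):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))   # instance features
costs = rng.random(size=(100, 3))      # costs of 3 algorithms per instance

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)

# For each cluster, pick the algorithm with the lowest average cost.
best_per_cluster = {
    c: costs[kmeans.labels_ == c].mean(axis=0).argmin()
    for c in range(kmeans.n_clusters)
}

# A new instance is assigned to a cluster and the associated algorithm selected.
new_instance = rng.normal(size=(1, 2))
cluster = kmeans.predict(new_instance)[0]
print("selected algorithm:", best_per_cluster[cluster])
```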

A common approach for multi-class classification is to learn pairwise models between every pair of classes (here algorithms) and choose the class that was predicted most often by the pairwise models.

We can weight the instances of the pairwise prediction problem by the performance difference between the two algorithms.

This is motivated by the fact that we care most about getting predictions with large differences correct, but the penalty for an incorrect prediction is small if there is almost no performance difference.
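A hedged sketch of this weighted pairwise scheme, assuming scikit-learn and synthetic data; each pairwise model is trained with sample weights equal to the performance gap between the two algorithms, and the final choice is made by majority vote:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # instance features
perf = rng.random(size=(200, 3))  # cost of 3 algorithms per instance

# One pairwise model per pair of algorithms, weighted by performance gap.
models = {}
for a, b in combinations(range(3), 2):
    y = (perf[:, a] < perf[:, b]).astype(int)  # 1 if a beats b
    w = np.abs(perf[:, a] - perf[:, b])        # large gap = large weight
    models[(a, b)] = LogisticRegression().fit(X, y, sample_weight=w)

def select(x):
    # Choose the algorithm predicted most often by the pairwise models.
    votes = np.zeros(3)
    for (a, b), model in models.items():
        winner = a if model.predict(x.reshape(1, -1))[0] == 1 else b
        votes[winner] += 1
    return int(votes.argmax())

print("selected:", select(rng.normal(size=5)))
```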

Figure: Clustering of SAT solvers from the SAT12-INDU ASlib scenario according to the Spearman correlation coefficient.
Figure: Shapley values for complementarity analysis on the SAT12-INDU ASlib scenario.[9]