Query optimization

The result of a query is generated by processing the rows in a database in a way that yields the requested information.

Processing times for the same query can vary widely, from a fraction of a second to hours, depending on the evaluation method chosen.

There is a trade-off between the time spent finding the best query plan and the quality of the plan chosen; the optimizer may not find the optimal plan on its own.

A cost model is used to estimate the runtime cost of evaluating a query in terms of the number of I/O operations required, CPU path length, amount of disk buffer space, disk storage service time, interconnect usage between units of parallelism, and other factors determined from the data dictionary.
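The idea of a weighted cost model can be sketched as follows; the weight constants and functions here are hypothetical stand-ins for values a real optimizer derives from the data dictionary and hardware calibration.

```python
IO_COST_PER_PAGE = 1.0    # assumed relative cost of one disk page read
CPU_COST_PER_ROW = 0.01   # assumed relative cost of processing one row

def scan_cost(num_pages: int, num_rows: int) -> float:
    """Estimate the cost of a full table scan as a weighted sum of
    I/O operations and CPU work."""
    return num_pages * IO_COST_PER_PAGE + num_rows * CPU_COST_PER_ROW

def index_lookup_cost(index_depth: int, matching_rows: int) -> float:
    """Estimate the cost of an index lookup: traverse the index pages,
    then fetch each matching row (one page read per row, worst case)."""
    return (index_depth + matching_rows) * IO_COST_PER_PAGE \
        + matching_rows * CPU_COST_PER_ROW
```

Comparing such estimates is how an optimizer decides, for example, that an index lookup beats a full scan when a predicate matches few rows.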

Query optimization typically consists of two stages: logical optimization, which generates a sequence of relational algebra operations to solve the query, and physical optimization, which determines the means of carrying out each operation.
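The two stages can be illustrated with a toy tuple-based plan representation invented for this sketch: a logical rewrite (pushing a selection below a join) followed by a physical operator choice.

```python
def push_selection_below_join(plan):
    """Logical optimization: rewrite select_p(R join S) into
    (select_p(R)) join S when the predicate touches only R."""
    op, pred, child = plan
    if op == "select" and child[0] == "join":
        _, left, right = child
        if pred["table"] == left:  # predicate refers only to the left input
            return ("join", ("select", pred, left), right)
    return plan

def choose_physical_operator(left_rows, right_rows):
    """Physical optimization: pick an implementation for a logical join.
    The threshold is an arbitrary stand-in for a cost-based decision."""
    return "hash_join" if min(left_rows, right_rows) > 100 else "nested_loop_join"
```

The logical rewrite never changes the result of the query, only the order of operations; the physical choice then binds each logical operator to a concrete algorithm.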

Most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System R database project.
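A minimal sketch of this dynamic programming approach, building the cheapest left-deep plan for ever-larger subsets of tables; the `cost` function is a caller-supplied stand-in for the optimizer's cost model.

```python
from itertools import combinations

def best_join_order(tables, cost):
    """System R-style dynamic programming over table subsets.
    `cost(left_plan, table)` estimates the cost of joining `table`
    into an existing plan. Returns (total_cost, left_deep_plan)."""
    # Base case: a single table is "joined" at zero cost.
    best = {frozenset([t]): (0.0, t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            candidates = []
            for t in subset:
                rest_cost, rest_plan = best[subset - {t}]
                candidates.append((rest_cost + cost(rest_plan, t), (rest_plan, t)))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(tables)]
```

The table of best sub-plans avoids re-deriving the same subset's ordering, at the price of exponential space in the number of tables, which is why real optimizers cap the search or fall back to heuristics for many-way joins.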

Some database management systems instead use an alternative rule-based approach built on a query graph model.

Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms.
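A sketch of how a histogram supports selectivity estimation, assuming equi-width buckets and uniformly distributed values within each bucket (both simplifications; the bucket data is invented for this example).

```python
def estimate_selectivity(histogram, total_rows, low, high):
    """Estimate the fraction of rows with low <= value < high from a
    histogram given as [((bucket_lo, bucket_hi), row_count), ...],
    assuming uniform distribution within each bucket."""
    matched = 0.0
    for (lo, hi), count in histogram:
        overlap = max(0.0, min(hi, high) - max(lo, low))
        matched += count * overlap / (hi - lo)  # pro-rate partially covered buckets
    return matched / total_rows
```

Real systems refine this with equi-depth histograms, most-common-value lists, and similar statistics, but the principle of pro-rating bucket counts is the same.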

Query predicates are often highly correlated (for example, model='Accord' implies make='Honda'), and it is very hard to estimate the selectivity of the conjunct in general.
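The problem with correlated predicates can be shown numerically with a toy table invented for this example: multiplying per-column selectivities (the independence assumption) badly underestimates the true selectivity of the conjunct.

```python
# Toy car table: (make, model) pairs; every Accord is a Honda.
rows = ([("Honda", "Accord")] * 10
        + [("Honda", "Civic")] * 10
        + [("Toyota", "Corolla")] * 80)

sel_make = sum(1 for make, _ in rows if make == "Honda") / len(rows)      # 0.2
sel_model = sum(1 for _, model in rows if model == "Accord") / len(rows)  # 0.1

# Independence assumption: multiply the individual selectivities.
independence_estimate = sel_make * sel_model  # 0.02

# True selectivity of the conjunct: model='Accord' implies make='Honda'.
actual = sum(1 for make, model in rows
             if make == "Honda" and model == "Accord") / len(rows)  # 0.1
```

Here the estimate is off by a factor of five, and an optimizer relying on it may pick a plan tuned for far fewer rows than the query actually returns.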

Such assumptions are sometimes violated in practice[8], and multiple extensions of classical query optimization that overcome these limitations have been studied in the research literature.

These extended problem variants differ in how they model the cost of a single query plan and in their optimization goals.

Cost metrics other than execution time are often relevant when comparing query plans.

A number of tools can display a query's execution plan, showing which operations carry the highest processing cost.
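For example, SQLite exposes its plan through `EXPLAIN QUERY PLAN` (most RDBMSs offer a similar facility, such as `EXPLAIN` in PostgreSQL and MySQL); the table and column names below are invented for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index, the plan's detail column typically reports a full scan.
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan[0][-1])  # a full-scan step such as "SCAN orders"

# After adding an index, the optimizer switches to an index search.
con.execute("CREATE INDEX idx_customer ON orders(customer_id)")
plan = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan[0][-1])  # an index-search step naming idx_customer
```

Reading such output is usually the first step in diagnosing why a query is slow.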

Fixing the issues these plans reveal can reduce execution time by tens of percent, and in some cases replace a quadratic search with a linear one.
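The quadratic-to-linear improvement can be seen in the classic contrast between a nested-loop join and a hash join, sketched here as toy in-memory versions with rows as tuples and join keys given as column indices.

```python
def nested_loop_join(left, right, key_l, key_r):
    """O(len(left) * len(right)): compares every pair of rows."""
    return [(l, r) for l in left for r in right if l[key_l] == r[key_r]]

def hash_join(left, right, key_l, key_r):
    """O(len(left) + len(right)): builds a hash table on one input,
    then probes it once per row of the other."""
    table = {}
    for l in left:
        table.setdefault(l[key_l], []).append(l)
    return [(l, r) for r in right for l in table.get(r[key_r], [])]
```

Both produce the same matches, but the hash join touches each row a constant number of times instead of comparing all pairs, which is exactly the kind of substitution a cost-based optimizer makes.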

One of the simplest and most effective optimization guidelines is to prefer operations that most RDBMSs are designed to execute efficiently.