In particular, a large part of queries issued on relational databases can be expressed in this way.
The conjunctive queries are the fragment of (domain independent) first-order logic given by the set of formulae that can be constructed from atomic formulae using conjunction ∧ and existential quantification ∃, but not using disjunction ∨, negation ¬, or universal quantification ∀.
Each such formula can be rewritten (efficiently) into an equivalent formula in prenex normal form, thus this form is usually simply assumed.
Thus conjunctive queries are of the following general form: with the free variables
As an example of why the restriction to domain independent first-order logic is important, consider
This formula cannot be implemented in the select-project-join fragment of relational algebra, and hence should not be considered a conjunctive query.
To give an example, imagine a relational database for storing information about students, their address, the courses they take and their gender.
Finding all male students and their addresses who attend a course that is also attended by a female student is expressed by the following conjunctive query: Note that since the only entity of interest is the male student and his address, these are the only distinguished variables, while the variables course, student2 are only existentially quantified, i.e. undistinguished.
Conjunctive queries where all variables are distinguished (and no variables are bound) are called equi-join queries,[1] because they are the equivalent, in the relational calculus, of the equi-join queries in the relational algebra (when selecting all columns of the result).
Conjunctive queries also correspond to select-project-join queries in relational algebra (i.e., relational algebra queries that do not use the operations union or difference) and to select-from-where queries in SQL in which the where-condition uses exclusively conjunctions of atomic equality conditions, i.e. conditions constructed from column names and constants using no comparison operators other than "=", combined using "and".
Many authors in fact prefer the following Datalog notation for the query above: Although there are no quantifiers in this notation, variables appearing in the head of the rule are still implicitly universally quantified, while variables only appearing in the body of the rule are still implicitly existentially quantified.
In fact, only single rules over extensional predicate symbols can be easily rewritten as an equivalent conjunctive query.
The problem of deciding whether for a given Datalog program there is an equivalent nonrecursive program (corresponding to a positive relational algebra query, or, equivalently, a formula of positive existential first-order logic, or, as a special case, a conjunctive query) is known as the Datalog boundedness problem and is undecidable.
[2] Extensions of conjunctive queries capturing more expressive power include: The formal study of all of these extensions is justified by their application in relational databases and is in the realm of database theory.
For the study of the computational complexity of evaluating conjunctive queries, two problems have to be distinguished.
The NP-hardness of conjunctive queries may appear surprising, since relational algebra and SQL strictly subsume the conjunctive queries and are thus at least as hard (in fact, relational algebra is PSPACE-complete with respect to combined complexity and is therefore even harder under widely held complexity-theoretic assumptions).
However, in the usual application scenario, databases are large, while queries are very small, and the data complexity model may be appropriate for studying and describing their difficulty.
The problem of listing all answers to a non-Boolean conjunctive query has been studied in the context of enumeration algorithms, with a characterization (under some computational hardness assumptions) of the queries for which enumeration can be performed with linear time preprocessing and constant delay between each solution.
Specifically, these are the acyclic conjunctive queries which also satisfy a free-connex condition.
, we write the result relation of evaluating the query on the instance simply as
[6] Since queries tend to be small, NP-completeness here is usually considered acceptable.
An important generalization of acyclicity is the notion of bounded hypertree-width, which is a measure of how close to acyclic a hypergraph is, analogous to bounded treewidth in graphs.
Conjunctive queries of bounded tree-width have LOGCFL combined complexity.
[10] Unrestricted conjunctive queries over tree data (i.e., a relational database consisting of a binary child relation of a tree as well as unary relations for labeling the tree nodes) have polynomial time combined complexity.