Null (SQL)

Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory.

This usage is quite different from most programming languages, where a null value of a reference means it is not pointing to any object.

E. F. Codd mentioned nulls as a method of representing missing data in the relational model in a 1975 paper in the FDT Bulletin of ACM-SIGMOD.

[1] Codd later reinforced his requirement that all RDBMSs support Null to indicate missing data in a 1985 two-part article published in Computerworld magazine.

[2][3] The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in IBM System R. Although Don Chamberlin recognized nulls (alongside duplicate rows) as one of the most controversial features of SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form of system support for missing information, saving the programmer from many duplicative application-level checks (see semipredicate problem) while at the same time providing the database designer with the option not to use Nulls if they so desire; for example, to avoid well-known anomalies (discussed in the semantics section of this article).

Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existing schema when it needs to evolve beyond its original intent, coding not for missing but rather for inapplicable information; for example, a database that quickly needs to support electric cars while having a miles-per-gallon column.

In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively.

Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' domain.

[6] Though this behavior is not defined by the ISO SQL standard many DBMS vendors treat this operation similarly.

Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT, UPDATE, or DELETE DML statements, and are discarded by SELECT queries.

[12] The SQL standard contains the optional feature F571 "Truth value tests" that introduces three additional logical unary operators (six in fact, if we count their negation, which is part of their syntax), also using postfix notation.

In classical two-valued logic, the law of the excluded middle would allow the simplification of the WHERE clause predicate, in fact its elimination.

The second query is actually equivalent with: Thus, to correctly simplify the first statement in SQL requires that we return all rows in which x is not null.

Oracle's dialect of SQL provides a built-in function DECODE which can be used instead of the simple CASE expressions and considers two nulls equal.

More generally, it was shown by Imielinski and Lipski that Codd tables are a weak representation system if the query language is restricted to projections, selections (and renaming of columns).

The "forgetful" nature of the two sub-queries was all that it took to guarantee that some sure information went unreported when the above query was run on the Codd table Emp.

However, it is still not a complete solution for incomplete information in the sense that v-tables are only a weak representation for queries not using any negations in selection (and not using any set difference either).

A check constraint placed on a column operates under a slightly different set of rules than those for the DML WHERE clause.

The result set generated by this query demonstrates how SQL uses Null as a placeholder for values missing from the right-hand (PhoneNumber) table, as shown below.

Calculating the minimum or maximum value of an empty set is impossible, so those results must be NULL, indicating there is no answer.

For example, Microsoft SQL Server documentation states the following:[24] For indexing purposes, NULLs compare as equal.

The NULL literal is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific data type.

The ISO SQL:1999 standard introduced the BOOLEAN data type to SQL, however it's still just an optional, non-core feature, coded T031.

[28][29] The Boolean type has been subject of criticism, particularly because of the mandated behavior of the UNKNOWN literal, which is never equal to itself because of the identification with NULL.

[31] The procedural part of Oracle's PL/SQL however supports BOOLEAN variables; these can also be assigned NULL and the value is considered the same as UNKNOWN.

It is often incorrectly assumed that a missing value would be "less than" or "not equal to" a constant if that field contains Null, but, in fact, such expressions return Unknown.

In The Relational Model for Database Management: Version 2, Codd suggested that the SQL implementation of Null was flawed and should be replaced by two distinct Null-type markers.

[5] Others have suggested adding additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be "Missing", increasing the complexity of SQL's logic system.

Because of the complexity of the Null-handling and logic systems required to support multiple Null markers, none of these proposals have gained widespread acceptance.

[35] Others, like author Fabian Pascal, have stated a belief that "how the function calculation should treat missing values is not governed by the relational model.

The Greek lowercase omega (ω) character is used to represent Null in database theory .