by Peter Routtier-Wone - Praxa
Introduction - The New Nulls
Let me begin, as is customary with this topic, by apologising for dredging up such a hoary old saw. Nulls are ambiguous. They can represent any of four classes of meta information:
There is no way to determine which meaning is applicable. I have argued before that more than one meta-symbol is required - and that the classical, ambiguous null be outlawed - but I am no Chris Date or Ted Codd; nor did I offer a formal proof.
Nevertheless, I find it axiomatically self-evident that the expression (a op b) where op is any comparative operator cannot be determinate unless a and b are both determinate. It follows thence that join-derived relations will also be indeterminate, as a consequence of the indeterminacy of their join condition.
Determinacy in Relations
This assertion is reflected in practice by the prohibition of nulls in primary keys. This is actually a negative statement of a requirement for determinacy in the primary key domain.
The enforcement of referential integrity constraints in foreign key attributes is entirely an expression of a requirement of determinacy in foreign keys. You might think that a determinacy requirement in foreign keys would require only that the foreign key attribute be non-null, but consider two relations, A and B and a relationship RAB defined in terms of PKA and FKB. The universal set of meaningful values for elements of the domain FKB is defined by the domain PKA. Any value not expressed in the domain PKA is indeterminate in meaning. Therefore, any relation attribute subject to such a determinacy requirement (ie all foreign keys) is, by implication, also constrained to entertain only values expressed in the domain PKA.
So there you have it - integrity enforcement as determinacy enforcement.
This is extremely desirable as it is structural rather than procedural, which means (a) it's harder to mess up, (b) requires no logical consideration at insert/update time - faster insert/update! - and (c) eliminates the meta-operators otherwise required for nulls. In addition, it becomes possible to forbid certain types of null while permitting others; for example an airline reservation record might well tolerate TBA for a flight date but cannot tolerate DNC.
If something is indeterminate, it has no defined meaning, and is therefore meaningless. This is intolerable in a primary key attribute, which accords with conventional relational wisdom.
I believe TBA, N/A, UNK and DNC cater for all cases in which NULL might otherwise be justified.1 Certainly they are determinate, which has the particular advantage that comparisons become determinate without meta-operators like IS (eg IS NULL).
This improves the orthogonality of existing operators. Comparative expressions will always be determinate. There will no longer be any requirement to resolve null ambiguity by context, leading to more straightforward and succinct formulation of comparative expressions.
One case, which will inevitably be raised, is that of the outer join. N/A is the appropriate meta-symbol (in my opinion) for unavailable relation attributes of the "outer" table.
For the comfort of readers who cannot ascribe credibility to any idea lacking a pedigree of acceptance, I can cite significant historical precedent for the enumeration of null meta-symbols. Zero is not a number. It is a determinate meta-symbol for arithmetic null, introduced to improve the orthogonality of the Arabic numeration model.
I invite to debunk the validity of the use of zero in numeration systems anyone insisting that such meta-symbols impose an unacceptable burden of conceptual complexity.
Meta-symbols do represent new conceptual complexity in one regard, but the (enormous) pay-off is reduced general complexity, due to the vastly smaller number of behavioural anomalies arising out of orthogonality exceptions.
The astute reader will have observed that while the replacement of null with TBA, N/A, UNK and DNC results in completely orthogonal behaviour in equivalence and non-equivalence operators ("equal" and "not equal") this is not the case for quantity relative comparators ("less than", "less than or equal to", "greater than or equal to", "greater than" and "like").
For example, (a op b) where b = TBA for any of the above quantity relative comparators resolves to TBA,2 but this is a vast improvement over the (present) alternative. Not only do we know with certainty that we don't know the value of b, we also know why we don't know the value of b, and that we will know the value of b. This will undoubtedly prove (forgive me) invaluable.
In addition, there is case that the deficiencies lie not with the operand, but with the operator. This is also the case with zero; +, -, and × all work perfectly given zero as an operand, whereas ÷ does not. Is the problem with the meta-symbol zero or with the operation of division? This is a largely philosophical question; my point is merely that we should not discard the baby with the bath-water.
Behaviour of Comparators With the New Nulls
There is really only one identity comparator, the equality comparator "=" which is a truth valued function resolving to true when its operators share common identity (ie a = a resolves true).
For succinct expression is customary to support a complementary inverse truth valued function inequality variously notated as "", "!=" or "<>" all of which are commonly read as not equals or does not equal and which are invariably logically equivalent to the logical negation of an equality expression -
All of the new nulls are determinate, so it follows that a = a when a takes as its values any of the new nulls - the identities TBA = TBA, N/A = N/A, UNK = UNK and DNC = DNC are all truth valued functions. This necessarily holds for the complementary function inequality because it may for all cases be expressed as the logical negation of the determinate equality function.
Relative value comparators
These are also determinate, but are not truth valued where one or both operands takes as its value any of the new nulls. Let a be any value, let b be any of the new nulls, and let op be any of the relative value comparators "less than", "less than or equal to", "greater than or equal to", or "greater than", then a op b = b.
This behaviour is commutative, thus under the same conditions, (b op a) = b remains true.
Boolean operators and the new nulls
Boolean operators exhibit the same behaviour as relative value comparators. Conditional (IF) statements need have no problem dealing with logical expressions which resolve to one of the new nulls because all values other than TRUE are covered by ELSE. This is valid provided that the value of the expression is determinate, which is the case.
As (I hope) I have shown, the issue is one not of nulls but of
determinacy. Since there is certainly occasion for relation attributes
to represent the states TBA, N/A, UNK and DNC, and they are completely
determinate, I believe I may here conclude the case for replacing
ambiguous NULL with these four meta-symbols.
1 Readers are invited to inform me of any cases not satisfactorily so handled.
2 Unless a = TBA, in which case the expression resolves to FALSE, but for obvious reasons this exception is unimportant in practice.