*by Peter Routtier-Wone - Praxa*

**Introduction - The New Nulls**

Let me begin, as is customary with this topic, by apologising for dredging up such a hoary old saw. Nulls are ambiguous. They can represent any of four classes of meta information:

- TBA (to be advised)
- N/A (not applicable)
- UNK (unknown)
- DNC (do not care)

There is no way to determine which meaning is applicable. I have argued before that more than one meta-symbol is required - and that the classical, ambiguous null be outlawed - but I am no Chris Date or Ted Codd; nor did I offer a formal proof.

Nevertheless, I find it axiomatically self-evident that the
expression (*a* op *b*) where *op* is any comparative
operator cannot be determinate unless *a* and *b* are both
determinate. It follows thence that join-derived relations will also
be indeterminate, as a consequence of the indeterminacy of their join
condition.

**Determinacy in Relations**

This assertion is reflected in practice by the prohibition of nulls in primary keys. This is actually a negative statement of a requirement for determinacy in the primary key domain.

The enforcement of referential integrity constraints in foreign key
attributes is entirely an expression of a requirement of determinacy
in foreign keys. You might think that a determinacy requirement in
foreign keys would require only that the foreign key attribute be
non-null, but consider two relations, A and B and a relationship
R_{AB} defined in terms of PK_{A} and FK_{B}. The
universal set of meaningful values for elements of the domain FK_{B} is defined by the domain PK_{A}. Any value not expressed in the domain
PK_{A} is indeterminate in
meaning. Therefore, any relation attribute subject to such a
determinacy requirement (*ie* all foreign keys) is, by
implication, also constrained to entertain only values expressed in
the domain PK_{A}.

So there you have it - integrity enforcement as determinacy enforcement.

This is extremely desirable as it is structural rather than procedural, which means (a) it's harder to mess up, (b) requires no logical consideration at insert/update time - faster insert/update! - and (c) eliminates the meta-operators otherwise required for nulls. In addition, it becomes possible to forbid certain types of null while permitting others; for example an airline reservation record might well tolerate TBA for a flight date but cannot tolerate DNC.

If something is indeterminate, it has no defined meaning, and is therefore meaningless. This is intolerable in a primary key attribute, which accords with conventional relational wisdom.

I believe TBA, N/A, UNK and DNC cater for all cases in which NULL
might otherwise be justified.^{1}
Certainly they are determinate, which has the particular advantage
that comparisons become determinate without meta-operators like
*IS* (*eg* IS NULL).

This improves the orthogonality of existing operators. Comparative expressions will always be determinate. There will no longer be any requirement to resolve null ambiguity by context, leading to more straightforward and succinct formulation of comparative expressions.

One case, which will inevitably be raised, is that of the outer join. N/A is the appropriate meta-symbol (in my opinion) for unavailable relation attributes of the "outer" table.

**A Precedent**

For the comfort of readers who cannot ascribe credibility to any idea lacking a pedigree of acceptance, I can cite significant historical precedent for the enumeration of null meta-symbols. Zero is not a number. It is a determinate meta-symbol for arithmetic null, introduced to improve the orthogonality of the Arabic numeration model.

I invite to debunk the validity of the use of zero in numeration systems anyone insisting that such meta-symbols impose an unacceptable burden of conceptual complexity.

Meta-symbols do represent new conceptual complexity in one regard, but the (enormous) pay-off is reduced general complexity, due to the vastly smaller number of behavioural anomalies arising out of orthogonality exceptions.

The astute reader will have observed that while the replacement of null with TBA, N/A, UNK and DNC results in completely orthogonal behaviour in equivalence and non-equivalence operators ("equal" and "not equal") this is not the case for quantity relative comparators ("less than", "less than or equal to", "greater than or equal to", "greater than" and "like").

For example, (*a* op *b*) where *b* = TBA for any of
the above quantity relative comparators resolves to TBA,^{2} but this is a vast improvement over the
(present) alternative. Not only do we know with certainty *that*
we don't know the value of *b*, we also know *why* we don't
know the value of *b*, and that we *will* know the value of
*b*. This will undoubtedly prove (forgive me) invaluable.

In addition, there is case that the deficiencies lie not with the operand, but with the operator. This is also the case with zero; +, -, and × all work perfectly given zero as an operand, whereas ÷ does not. Is the problem with the meta-symbol zero or with the operation of division? This is a largely philosophical question; my point is merely that we should not discard the baby with the bath-water.

**Behaviour of Comparators With the New Nulls**

*Identity comparators*

There is really only one identity comparator, the equality
comparator "=" which is a truth valued function resolving to true when
its operators share common identity (*ie a* = *a* resolves
true).

For succinct expression is customary to support a complementary
inverse truth valued function inequality variously notated as "", "!=" or "<>" all of which are commonly
read as *not equals* or *does not equal* and which are
invariably logically equivalent to the logical negation of an equality
expression -

All of the new nulls are determinate, so it follows that a = a when a takes as its values any of the new nulls - the identities TBA = TBA, N/A = N/A, UNK = UNK and DNC = DNC are all truth valued functions. This necessarily holds for the complementary function inequality because it may for all cases be expressed as the logical negation of the determinate equality function.

*Relative value comparators*

These are also determinate, but are not truth valued where one or
both operands takes as its value any of the new nulls. Let *a* be
any value, let *b* be any of the new nulls, and let op be any of
the relative value comparators "less than", "less than or equal to",
"greater than or equal to", or "greater than", then *a op b* =
*b*.

This behaviour is commutative, thus under the same conditions, *(b
op a)* = *b* remains true.

*Boolean operators and the new nulls*

Boolean operators exhibit the same behaviour as relative value
comparators. Conditional (IF) statements need have no problem dealing
with logical expressions which resolve to one of the new nulls because
*all* values other than TRUE are covered by ELSE. This is valid provided
that the value of the expression is determinate, which *is* the case.

**Conclusion**

As (I hope) I have shown, the issue is one not of nulls but of
determinacy. Since there is certainly occasion for relation attributes
to represent the states TBA, N/A, UNK and DNC, and they are completely
determinate, I believe I may here conclude the case for replacing
ambiguous NULL with these four meta-symbols.

__________

^{1} Readers are invited to inform me of any cases not
satisfactorily so handled.
^{2} Unless a = TBA, in which case the expression resolves
to FALSE, but for obvious reasons this exception is unimportant in
practice.

March '97