NULL means "I don't have this data". It doesn't guarantee that the data doesn't exist anywhere.
It's semantically ambiguous, yes.
But it's not ambiguous that the database doesn't have the data. It's very clearly NULL.
If "middle name" is NULL, then the database itself can't tell you if a person doesn't have a middle name, or if it's merely not yet been told the middle name. You could differentiate the two by saying empty string for "has no middle name"… except on Oracle VARCHAR2, where NULL is equal to empty string.
> You could differentiate the two by saying empty string for "has no middle name"… except on Oracle VARCHAR2, where NULL is equal to empty string.
This is what kills me.
This seems like an obvious case where multiple flavors of NULL would make perfect sense. For example Ticket.CancellationDate isn't "unknown", it's this ticket is not cancelled. But SQL standard doesn't have a way to express "empty value" as distinct from "unknown value", so we're stuck working around the strange implications of using this square peg to fill a round hole.
Oracle's bizarre "empty VARCHAR2s are NULL" is strange but is consistent with every other datatype that offers no way to say "empty" but "NULL". 0 is not the same as NULL int and is not the same as empty int. Mindate is not the same as NULL date and is not the same as empty date.
Well, for the string case empty string is… the empty string.
Storing the exit code and stdout of a program run Oracle allows the state "it's not finished yet" to have exit code representation of NULL, but you cannot say the same for string.
"Unknown" is not a perfect map to NULL. "I have no answer for you" is better.
Maybe NULL values could use an annotation.
But I think that would make things worse. Like let's say you have some query that joins addresses with coordinates. You know that semantically any address without a coordinate has an "unknown" coordinate, not that it authoritatively doesn't have coordinates.
But not only do you need to change the data model then, but the query language too, to be able to express the difference where absense means one or the other.
Now absense is NULL, and leaves it up to the application to interpret.
SELECT a.address, b.coord FROM a LEFT OUTER JOIN (SELECT address,coord FROM coords WHERE is_test=FALSE) AS b ON a.address=b.address;
Or maybe some coordinates are missing because their locations no longer exist. The house has been demolished, so it's not "unknown" where the house was, it's just that it no longer has a location. And how would you join that data.
I don't think I've encountered a problem in practice with SQL in differentiating empty from unknown. It's out of scope of what SQL tries to solve, and that's fine.
That's fair - as I said in another comment, it breaks down with keys. I can't think of a great way to model "this row definitively does not have whatever is referenced in another table". Using anything other than NULL for that means you can't use proper foreign keys.
It's semantically ambiguous, yes.
But it's not ambiguous that the database doesn't have the data. It's very clearly NULL.
If "middle name" is NULL, then the database itself can't tell you if a person doesn't have a middle name, or if it's merely not yet been told the middle name. You could differentiate the two by saying empty string for "has no middle name"… except on Oracle VARCHAR2, where NULL is equal to empty string.