Behavior of SQL Null
Learning something new everyday #
So I've been learning a lot about SQL through
PostgreSQL, since I pretty
much use it everyday. I've stated this many times, but I really wish I learned
more SQL in school, or at least knew about the powers of it. While in school
the two internship jobs I had did not expose me to the world of relational
databases and how useful they are. Sadly, there are many businesses out there
that use Excel as their database and that's all I really knew. On the
flipside, I developed an expertise in Excel.
Upon picking up contracting work, and seeing what people out there would pay
for. I realized that I wasn't alone in not know much about storing data
properly, and being able to use the data in an efficient manner. People were
willing to pay for data to be scraped from a website, and then stored
somewhere that they could access it easily. In this cloud era, it makes it
really easy to keep all the data in the cloud, and then access it just as
easily. This is what brought me to learning about SQL, and PostgreSQL. I chose
Postgres because their slogan said it is "the world's most advanced open
source database", I mean why not choose the world's most advanced?
Fast forward a bit, through learning about the theory of relational databases,
database normalization, setting up and accessing databases, and here I am
still learning new stuff everyday. Organizing your data into tables in SQL
comes mostly from experience and knowing how your data is suppose to look
like. This usually means having primary keys in your tables that unique
describe what trait the database table is expressing. You can also set
constraints so that when you are inserting into the database, you don't end up
with a database that contains duplicates, or data it shouldn't, like having a
table of customers, and then a customer is duplicated and billed twice.
Unique constraints #
We start off describing a database table in which we have a primary key of
customer id, and a unique constraint on the information stored in the table.
CREATE TABLE customers ( customer_id serial PRIMARY KEY, email text default NULL, name text default NULL, description text default NULL, CONSTRAINT customer_uk UNIQUE (email, name, description) )
Note that this is purely an example situation that I made up, I don't have any
tables that are described like this. The defaults for the email, name, and
description entries into the table are NULL when nothing is entered. Sometimes
you are left with the situation where NULLs are the correct choice.
INSERT INTO customers (email, name, description) VALUES ('firstname.lastname@example.org', 'Tester', 'test description'); INSERT INTO customers (email, name, description) VALUES ('email@example.com', NULL, NULL); INSERT INTO customers (email, name, description) VALUES (NULL, 'i am null', NULL); INSERT INTO customers (email, name, description) VALUES (NULL, 'i am null', NULL);
Looking at the insert statements along with the description of the table, you
should be able to guess what is in the table. But wait, something doesn't seem
SELECT email, name, description, count(*) FROM customers GROUP BY 1,2,3 HAVING count(*)>1; email | name | description | count -------+-----------+-------------+------- | i am null | | 2
Why is the insert of
i am null allowed if the table has a unique constraint
of email, name, description? Shouldn't the fact that we entered two of the
same records break this unique constraint? What a bizarre world SQL is.
Turns out this is the expected behavior of SQL, because a NULL index in a
column can behave in two different ways. Sometimes a NULL is as you would
expect, but sometimes a NULL doesn't really exhibit the same behavior a value
would. From this quick exercise, we can tell that a NULL isn't the same as an
empty string/value, otherwise we wouldn't be allowed to insert those records.
So a NULL is it's own special snowflake.
SELECT NULL>NULL+1, CONCAT('test', NULL); ?column? | concat ----------+-------- | test
So knowing this, this gives a caveat when creating unique keys for a table,
that we should not allow a NULL to be used. A simple fix can be to have a
default value of something that isn't NULL, but sometimes this isn't the best
thing to do. We must remember that a NULL isn't treated like a value, but it's
own distinct type where the behavior is different. Another way to think about
this is that a NULL represents a lack of data, so we don't actually know what
that data could be. To confuse you even more, a NULL doesn't equal NULL, since
they're not the same unknowns.
This behavior might not be the same across all flavours of SQL, but it's
present in PostgreSQL and MySQL. Other databases might treat NULLs