#

Learning something new everyday

So I’ve been learning a lot about SQL through PostgreSQL, since I pretty much use it everyday. I’ve stated this many times, but I really wish I learned more SQL in school, or at least knew about the powers of it. While in school the two internship jobs I had did not expose me to the world of relational databases and how useful they are. Sadly, there are many businesses out there that use Excel as their database and that’s all I really knew. On the flipside, I developed an expertise in Excel.

Upon picking up contracting work, and seeing what people out there would pay for. I realized that I wasn’t alone in not know much about storing data properly, and being able to use the data in an efficient manner. People were willing to pay for data to be scraped from a website, and then stored somewhere that they could access it easily. In this cloud era, it makes it really easy to keep all the data in the cloud, and then access it just as easily. This is what brought me to learning about SQL, and PostgreSQL. I chose Postgres because their slogan said it is “the world’s most advanced open source database”, I mean why not choose the world’s most advanced?

Fast forward a bit, through learning about the theory of relational databases, database normalization, setting up and accessing databases, and here I am still learning new stuff everyday. Organizing your data into tables in SQL comes mostly from experience and knowing how your data is suppose to look like. This usually means having primary keys in your tables that unique describe what trait the database table is expressing. You can also set constraints so that when you are inserting into the database, you don’t end up with a database that contains duplicates, or data it shouldn’t, like having a table of customers, and then a customer is duplicated and billed twice.

Unique constraints

We start off describing a database table in which we have a primary key of customer id, and a unique constraint on the information stored in the table.

CREATE TABLE customers (
customer_id serial PRIMARY KEY,
email text default NULL,
name text default NULL,
description text default NULL,
CONSTRAINT customer_uk UNIQUE (email, name, description)
)

Note that this is purely an example situation that I made up, I don’t have any tables that are described like this. The defaults for the email, name, and description entries into the table are NULL when nothing is entered. Sometimes you are left with the situation where NULLs are the correct choice.

INSERT INTO customers (email, name, description) VALUES ('test@tester.com', 'Tester', 'test description');
INSERT INTO customers (email, name, description) VALUES ('test@tester.com', NULL, NULL);
INSERT INTO customers (email, name, description) VALUES (NULL, 'i am null', NULL);
INSERT INTO customers (email, name, description) VALUES (NULL, 'i am null', NULL);

Looking at the insert statements along with the description of the table, you should be able to guess what is in the table. But wait, something doesn’t seem right…

SELECT email, name, description, count(*) FROM customers GROUP BY 1,2,3 HAVING count(*)>1;
 email |   name    | description | count
-------+-----------+-------------+-------
       | i am null |             |     2

Why is the insert of i am null allowed if the table has a unique constraint of email, name, description? Shouldn’t the fact that we entered two of the same records break this unique constraint? What a bizarre world SQL is.

Turns out this is the expected behavior of SQL, because a NULL index in a column can behave in two different ways. Sometimes a NULL is as you would expect, but sometimes a NULL doesn’t really exhibit the same behavior a value would. From this quick exercise, we can tell that a NULL isn’t the same as an empty string/value, otherwise we wouldn’t be allowed to insert those records. So a NULL is it’s own special snowflake.

SELECT NULL>NULL+1, CONCAT('test', NULL);
 ?column? | concat
----------+--------
          | test

So knowing this, this gives a caveat when creating unique keys for a table, that we should not allow a NULL to be used. A simple fix can be to have a default value of something that isn’t NULL, but sometimes this isn’t the best thing to do. We must remember that a NULL isn’t treated like a value, but it’s own distinct type where the behavior is different. Another way to think about this is that a NULL represents a lack of data, so we don’t actually know what that data could be. To confuse you even more, a NULL doesn’t equal NULL, since they’re not the same unknowns.

This behavior might not be the same across all flavours of SQL, but it’s present in PostgreSQL and MySQL. Other databases might treat NULLs differently.

comments powered by Disqus