SQL Query Options

PostgreSQL #

Everytime I discover something new about Postgres, or just use it in general,
I wonder why I didn't learn database stuff earlier. It has been by far the
most useful skill I've been learning recently, and although these days there
are many services available that make things easier, knowing SQL is a very
useful tool.

That said, in SQL there are often many different ways to obtain the same
result. To me, this is like solving a puzzle multiple ways, and I've always
enjoyed that analysis. I'll present without context 3 queries that all output
the same things.

    SELECT distinct distributor_price_category_code 
    FROM icp_detail
    SELECT distinct pricing_code
    FROM vector_tariff
    SELECT distinct distributor_price_category_code 
    FROM icp_detail a
        distributor_price_category_code NOT IN (SELECT distinct pricing_code
    FROM vector_tariff b)
    SELECT DISTINCT distributor_price_category_code 
    FROM icp_detail a
    FROM vector_tariff b 
    WHERE a.distributor_price_category_code = b.pricing_code)

As presented above, the first query makes use of the EXCEPT function in
Postgres, which allows you to essentially select two seperate sets, and then
take away those that are in the 2nd set.

The second query obtains the same result using a NOT IN, and it selects a
column and compares it against another selected column.

The last query uses a NOT EXISTS that takes a selected set where the two
columns are equal. The last query isn't as straightforward, and you might
think that

Which one is best? #

That's always the question that you will want to ask, especially if
performance is important. Obviously there will be a query that is faster than
the others, but how do you determine that?

The answer is to use EXPLAIN ANALYZE which is a Postgres tool that explains
how the query is run, and then it will also run it to determine how long the
query takes. EXPLAIN will display the query plan, and then ANALYZE will
run the query and tell you how long planning took, andhow long execution took.

Based on the 3 queries above, the 2nd query was the fastest for my situation.
It is important to note that the time depends on a number of factors, such as
complexity of the tables, size of the tables, and the information just to name
a few. As tables grow in size, it is very possible that using one query will
become slower than another option. I'll save it for another post, but reading
into the query plan from EXPLAIN is a useful way to understand how Postgres
breaks down the query, and then finds you the information you want.

← Home