SQL: beware of NULL in WHERE NOT IN
If you have ever written some SQL you are probably aware of the special status of NULL in SQL.
To refresh your mind: NULL indicates absence of a value. You cannot use NULL in a comparison, it will always result in NULL. If this does not lead to the desired result there are special operators to deal with NULL explicitly. It behaves similar to NaN in floating point operations.
Example code:
[sourcecode language=”sql”]
CREATE TABLE pony
(
id INT PRIMARY KEY,
name VARCHAR(255)
);
INSERT INTO pony (id, name)
VALUES
(1, ‘Twilight Sparkle’),
(2, ‘Rainbow Dash’),
(3, ‘Pinkie Pie’),
(4, ‘Rarity’),
(5, ‘Applejack’);
SELECT * FROM pony;
— 5 rows as expected
SELECT * FROM pony
WHERE id = NULL;
— 0 rows as expected
SELECT * FROM pony
WHERE id != NULL;
— 0 rows, slight wtf
SELECT * FROM pony
WHERE id IS NOT NULL;
— 5 rows as expected
[/sourcecode]
NULL still works intuitively when using WHERE IN:
[sourcecode language=”sql”]
SELECT * FROM pony
WHERE id IN (1, 2, NULL);
— 2 rows as expected
— equivalent statement:
SELECT * FROM pony
WHERE id = 1
OR id = 2
OR id = null;
[/sourcecode]
WHERE NOT IN is where things get tricky:
[sourcecode language=”sql”]
SELECT * FROM pony
WHERE id NOT IN (1, 2, NULL);
— 0 rows, major wtf
[/sourcecode]
It makes sense if you split the clause into individual comparisons:
[sourcecode language=”sql”]
SELECT * FROM pony
WHERE NOT (
id = 1
OR id = 2
OR id = NULL
);
[/sourcecode]
And then remove the parenthesis using De Morgan’s laws:
[sourcecode language=”sql”]
SELECT * FROM pony
WHERE id != 1
AND id != 2
AND id != NULL;
[/sourcecode]
Like explained in the intro, id != NULL is always NULL, therefor the entire WHERE clause is always FALSE.
Practical advice and the subquery problem
If you are using external data, for example from XML or a CSV, you should filter out empty values.
More insidiously NULL can sneak in via a subquery. Lets say we want to know how many ponies are currently out of town:
[sourcecode language=”sql”]
WITH ponies_in_town AS (
SELECT pony.id, pony.name
FROM town
LEFT JOIN town_pony ON town
WHERE town.name = ‘Ponyville’
)
SELECT
COUNT(*) AS num_ponies_out_of_town
FROM pony
WHERE id NOT IN (
SELECT id FROM ponies_in_town
)
[/sourcecode]
If Applejack is alone in town, it says 4 ponies have left. If Applejack also leaves, it says 0 ponies have left. A fix for this is left as an exercise for the reader.
Subqueries are good
Let me emphasize this. The alternative to a subquery is to place table joins in the main query. This can lead to a more subtle insidious problem, which is duplicate rows. SQL using subqueries is more “modular” and therefore easier to understand and reuse. It can also be faster.
The example above as it stands does not demonstrate this but if we add a few more requirements we can demonstrate these benefits. So keep using WHERE NOT IN, just mind NULL.
But what if i actually want to take into account NULLS?
Make a separate where clause for null: WHERE val NOT IN (1, 2) OR val IS NULL
Author 02 mrt 2018
|
Looking for a job?
Geen Resultaten Gevonden
De pagina die u zocht kon niet gevonden worden. Probeer uw zoekopdracht te verfijnen of gebruik de bovenstaande navigatie om deze post te vinden.