Retrieving Values from One Table That Do Not Exist in Another

Problem

You wish to find those values in one table, call it the source table, that do not also exist in some target table. For example, you want to find which departments (if any) in table DEPT do not exist in table EMP. In the example data, DEPTNO 40 from table DEPT does not exist in table EMP, so the result set should be the following:

       DEPTNO
----------
40

Solution

Having functions that perform set difference is particularly useful for this problem. DB2, PostgreSQL, and Oracle support set difference operations. If your DBMS does not support a set difference function, use a subquery as shown for MySQL and SQL Server.

DB2 and PostgreSQL

Use the set operation EXCEPT:

 1 select deptno from dept
2 except
3 select deptno from emp

Oracle

Use the set operation MINUS:

 1 select deptno from dept
2 minus
3 select deptno from emp

MySQL and SQL Server

Use a subquery to return all DEPTNOs from table EMP into an outer query that searches table DEPT for rows that are not amongst the rows returned from the subquery:

 1 select deptno
2 from dept
3 where deptno not in (select deptno from emp)

Discussion

DB2 and PostgreSQL

The built-in functions provided by DB2 and PostgreSQL make this operation quite easy. The EXCEPT operator takes the first result set and removes from it all rows found in the second result set. The operation is very much like a subtraction.

There are restrictions on the use of set operators, including EXCEPT. Data types and number of values to compare must match in both SELECT lists. Additionally, EXCEPT will not return duplicates and, unlike a subquery using NOT IN, NULLs do not present a problem (see the discussion for MySQL and SQL Server). The EXCEPT operator will return rows from the upper query (the query before the EXCEPT) that do not exist in the lower query (the query after the EXCEPT).

Oracle

The Oracle solution is identical to that for DB2 and PostgreSQL, except that Oracle calls its set difference operator MINUS rather than EXCEPT. Otherwise, the preceding explanation applies to Oracle as well.

MySQL and SQL Server

The subquery will return all DEPTNOs from table EMP. The outer query returns all DEPTNOs from table DEPT that are "not in" or "not included in" the result set returned from the subquery.

Duplicate elimination is something you'll want to consider when using the MySQL and SQL Server solutions. The EXCEPT- and MINUS-based solutions used for the other platforms eliminate duplicate rows from the result set, ensuring that each DEPTNO is reported only one time. Of course, that can only be the case anyway, as DEPTNO is a key field in my example data. Were DEPTNO not a key field, you could use DISTINCT as follows to ensure that each DEPTNO value missing from EMP is reported only once:

 select distinct deptno
from dept
where deptno not in (select deptno from emp)

Be mindful of NULLs when using NOT IN. Consider the following table, NEW_ DEPT:

 create table new_dept(deptno integer)
insert into new_dept values (10)
insert into new_dept values (50)
insert into new_dept values (null)

If you try to find the DEPTNOs in table DEPT that do not exist in table NEW_DEPT and use a subquery with NOT IN, you'll find that the query returns no rows:

 select *
from dept
where deptno not in (select deptno from new_dept)

DEPTNOs 20, 30, and 40 are not in table NEW_DEPT, yet were not returned by the query. The reason is the NULL value present in table NEW_DEPT. Three rows are returned by the subquery, with DEPTNOs of 10, 50, and NULL. IN and NOT IN are essentially OR operations, and will yield different results because of how NULL values are treated by logical OR evaluations. Consider the following example using IN and its equivalent using OR:

 
select deptno
from dept
where deptno in ( 10,50,null )


DEPTNO
-------
10


select deptno
from dept
where (deptno=10 or deptno=50 or deptno=null)


DEPTNO
-------
10

Now consider the same example using NOT IN and NOT OR:

 
select deptno
from dept
where deptno not in ( 10,50,null )

( no rows )

select deptno
from dept
where not (deptno=10 or deptno=50 or deptno=null)


(no rows)

As you can see, the condition DEPTNO NOT IN (10, 50, NULL) equates to:

 not (deptno=10 or deptno=50 or deptno=null)

In the case where DEPTNO is 50, here's how this expression plays out:

 not (deptno=10 or deptno=50 or deptno=null)
(false or false or null)
(false or null)
null

In SQL, "TRUE or NULL" is TRUE, but "FALSE or NULL" is NULL! And once you have a NULL result, you'll continue to have NULL result (unless you specifically test for NULL using a technique like that shown in Recipe 1.11). You must keep this in mind when using IN predicates and when performing logical OR evaluations, and NULL values are involved.

To avoid the problem with NOT IN and NULLs, use a correlated subquery in conjunction with NOT EXISTS. The term "correlated subquery" is used because rows from the outer query are referenced in the subquery. The following example is an alternative solution that will not be affected by NULL rows (going back to the original query from the "Problem" section):

 
select d.deptno
from dept d
where not exists ( select null
from emp e
where d.deptno = e.deptno )


DEPTNO
----------
40

Conceptually, the outer query in this solution considers each row in the DEPT table. For each DEPT row, the following happens:

  1. The subquery is executed to see whether the department number exists in the EMP table. Note the condition D.DEPTNO = E.DEPTNO, which brings together the department numbers from the two tables.

  2. If the subquery returns results, then EXISTS (…) evaluates to true and NOT EXISTS (…) thus evaluates to FALSE, and the row being considered by the outer query is discarded.

  3. If the subquery returns no results, then NOT EXISTS (…) evaluates to TRUE, and the row being considered by the outer query is returned (because it is for a department not represented in the EMP table).

The items in the SELECT list of the subquery are unimportant when using a correlated subquery with EXISTS/NOT EXISTS, which is why I chose to select NULL, to force you to focus on the join in the subquery rather than the items in the SELECT list.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

0 Comments: