Determining Whether Two Tables Have the Same Data - SQL

Problem

You want to know if two tables or views have the same data (cardinality and values). Consider the following view:

 
create view V
as
select * from emp where deptno != 10
union all
select * from emp where ename = 'WARD'

select * from V


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
----- ---------- --------- ----- ----------- ----- ----- ------
7369 SMITH CLERK 7902 17-DEC-1980 800 20
7499 ALLEN SALESMAN 7698 20-FEB-1981 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30
7566 JONES MANAGER 7839 02-APR-1981 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-1981 1250 1300 30
7698 BLAKE MANAGER 7839 01-MAY-1981 2850 30
7788 SCOTT ANALYST 7566 09-DEC-1982 3000 20
7844 TURNER SALESMAN 7698 08-SEP-1981 1500 0 30
7876 ADAMS CLERK 7788 12-JAN-1983 1100 20
7900 JAMES CLERK 7698 03-DEC-1981 950 30
7902 FORD ANALYST 7566 03-DEC-1981 3000 20
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30

You want to determine whether or not this view has exactly the same data as table EMP. The row for employee "WARD" is duplicated to show that the solution will reveal not only different data but duplicates as well. Based on the rows in table EMP the difference will be the three rows for employees in department 10 and the two rows for employee "WARD". You want to return the following result set:

 EMPNO ENAME       JOB         MGR  HIREDATE      SAL   COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2
7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1
7839 KING PRESIDENT 17-NOV-1981 5000 10 1
7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1

Solution

Functions that perform SET difference (MINUS or EXCEPT, depending on your DBMS) make the problem of comparing tables a relatively easy one to solve. If your DBMS does not offer such functions, you can use a correlated subquery.

DB2 and PostgreSQL

Use the set operations EXCEPT and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V:

  1  (
2 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
3 count(*) as cnt
4 from V
5 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
6 except
7 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
8 count(*) as cnt
9 from emp
10 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
11 )
12 union all
13 (
14 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
15 count(*) as cnt
16 from emp
17 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
18 except
19 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
20 count(*) as cnt
21 from v
22 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
23 )

Oracle

Use the set operations MINUS and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V:

  1  (
2 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
3 count(*) as cnt
4 from V
5 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
6 minus
7 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
8 count(*) as cnt
9 from emp
10 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
11 )
12 union all
13 (
14 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
15 count(*) as cnt
16 from emp
17 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
18 minus
19 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
20 count(*) as cnt
21 from v
22 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
23 )

MySQL and SQL Server

Use a correlated subquery and UNION ALL to find the rows in view V and not in table EMP combined with the rows in table EMP and not in view V:

  1  select *
2 from (
3 select e.empno,e.ename,e.job,e.mgr,e.hiredate,
4 e.sal,e.comm,e.deptno, count(*) as cnt
5 from emp e
6 group by empno,ename,job,mgr,hiredate,
7 sal,comm,deptno
8 ) e
9 where not exists (
10 select null
11 from (
12 select v.empno,v.ename,v.job,v.mgr,v.hiredate,
13 v.sal,v.comm,v.deptno, count(*) as cnt
14 from v
15 group by empno,ename,job,mgr,hiredate,
16 sal,comm,deptno
17 ) v
18 where v.empno = e.empno
19 and v.ename = e.ename
20 and v.job = e.job
21 and v.mgr = e.mgr
22 and v.hiredate = e.hiredate
23 and v.sal = e.sal
24 and v.deptno = e.deptno
25 and v.cnt = e.cnt
26 and coalesce(v.comm,0) = coalesce(e.comm,0)
27 )
28 union all
29 select *
30 from (
31 select v.empno,v.ename,v.job,v.mgr,v.hiredate,
32 v.sal,v.comm,v.deptno, count(*) as cnt
33 from v
34 group by empno,ename,job,mgr,hiredate,
35 sal,comm,deptno
36 ) v
37 where not exists (
38 select null
39 from (
40 select e.empno,e.ename,e.job,e.mgr,e.hiredate,
41 e.sal,e.comm,e.deptno, count(*) as cnt
42 from emp e
43 group by empno,ename,job,mgr,hiredate,
44 sal,comm,deptno
45 ) e
46 where v.empno = e.empno
47 and v.ename = e.ename
48 and v.job = e.job
49 and v.mgr = e.mgr
50 and v.hiredate = e.hiredate
51 and v.sal = e.sal
52 and v.deptno = e.deptno
53 and v.cnt = e.cnt
54 and coalesce(v.comm,0) = coalesce(e.comm,0)
55 )

Discussion

Despite using different techniques, the concept is the same for all solutions:

  1. First, find rows in table EMP that do not exist in view V.

  2. Then combine (UNION ALL) those rows with rows from view V that do not exist in table EMP.

If the tables in question are equal, then no rows are returned. If the tables are different, the rows causing the difference are returned. As an easy first step when comparing tables, you can compare the cardinalities alone rather than including them with the data comparison. The following query is a simple example of this and will work on all DBMSs:

 
select count(*)
from emp
union
select count(*)
from dept


COUNT(*)
--------
4
14

Because UNION will filter out duplicates, only one row will be returned if the tables' cardinalities are the same. Because two rows are returned in this example, you know that the tables do not contain identical rowsets.

DB2, Oracle, and PostgreSQL

MINUS and EXCEPT work in the same way, so I will use EXCEPT for this discussion. The queries before and after the UNION ALL are very similar. So, to understand how the solution works, simply execute the query prior to the UNION ALL by itself. The following result set is produced by executing lines 111 in the solution section:

 
(
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from V
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
except
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from emp
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2

The result set represents a row found in view V that is either not in table EMP or has a different cardinality than that same row in table EMP. In this case, the duplicate row for employee "WARD" is found and returned. If you're still having trouble understanding how the result set is produced, run each query on either side of EXCEPT individually. You'll notice the only difference between the two result sets is the CNT for employee "WARD" returned by view V.

The portion of the query after the UNION ALL does the opposite of the query preceding UNION ALL. The query returns rows in table EMP not in view V:

 
(
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from emp
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
minus
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from v
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1
7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1
7839 KING PRESIDENT 17-NOV-1981 5000 10 1
7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1

The results are then combined by UNION ALL to produce the final result set.

MySQL and SQL Server

The queries before and after the UNION ALL are very similar. To understand how the subquery-based solution works, simply execute the query prior to the UNION ALL by itself. The query below is from lines 127 in the solution:

 
select *
from (
select e.empno,e.ename,e.job,e.mgr,e.hiredate,
e.sal,e.comm,e.deptno, count(*) as cnt
from emp e
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) e
where not exists (
select null
from (
select v.empno,v.ename,v.job,v.mgr,v.hiredate,
v.sal,v.comm,v.deptno, count(*) as cnt
from v
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) v
where v.empno = e.empno
and v.ename = e.ename
and v.job = e.job
and v.mgr = e.mgr
and v.hiredate = e.hiredate
and v.sal = e.sal
and v.deptno = e.deptno
and v.cnt = e.cnt
and coalesce(v.comm,0) = coalesce(e.comm,0)
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1
7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1
7839 KING PRESIDENT 17-NOV-1981 5000 10 1
7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1

Notice that the comparison is not between table EMP and view V, but rather between inline view E and inline view V. The cardinality for each row is found and returned as an attribute for that row. You are comparing each row and its occurrence count. If you are having trouble understanding how the comparison works, run the subqueries independently. The next step is to find all rows (including CNT) in inline view E that do not exist in inline view V. The comparison uses a correlated subquery and NOT EXISTS. The joins will determine which rows are the same, and the result will be all rows from inline view E that are not the rows returned by the join. The query after the UNION ALL does the opposite; it finds all rows in inline view V that do not exist in inline view E:

 
select *
from (
select v.empno,v.ename,v.job,v.mgr,v.hiredate,
v.sal,v.comm,v.deptno, count(*) as cnt
from v
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) v
where not exists (
select null
from (
select e.empno,e.ename,e.job,e.mgr,e.hiredate,
e.sal,e.comm,e.deptno, count(*) as cnt
from emp e
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) e
where v.empno = e.empno
and v.ename = e.ename
and v.job = e.job
and v.mgr = e.mgr
and v.hiredate = e.hiredate
and v.sal = e.sal
and v.deptno = e.deptno
and v.cnt = e.cnt
and coalesce(v.comm,0) = coalesce(e.comm,0)
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2

The results are then combined by UNION ALL to produce the final result set.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Adding Joins to a Query Without Interfering with Other Joins - SQL

Problem

You have a query that returns the results you want. You need additional information, but when trying to get it, you lose data from the original result set. For example, you want to return all employees, the location of the department in which they work, and the date they received a bonus. For this problem, the EMP_BONUS table contains the following data:

 
select * from emp_bonus


EMPNO RECEIVED TYPE
---------- ----------- ----------
7369 14-MAR-2005 1
7900 14-MAR-2005 2
7788 14-MAR-2005 3

The query you start with looks like this:

 
select e.ename, d.loc
from emp e, dept d
where e.deptno=d.deptno


ENAME LOC
---------- -------------
SMITH DALLAS
ALLEN CHICAGO
WARD CHICAGO
JONES DALLAS
MARTIN CHICAGO
BLAKE CHICAGO
CLARK NEW YORK
SCOTT DALLAS
KING NEW YORK
TURNER CHICAGO
ADAMS DALLAS
JAMES CHICAGO
FORD DALLAS
MILLER NEW YORK

You want to add to these results the date a bonus was given to an employee, but joining to the EMP_BONUS table returns fewer rows than you wish because not every employee has a bonus:

 
select e.ename, d.loc,eb.received
from emp e, dept d, emp_bonus eb
where e.deptno=d.deptno
and e.empno=eb.empno


ENAME LOC RECEIVED
---------- ------------- -----------
SCOTT DALLAS 14-MAR-2005
SMITH DALLAS 14-MAR-2005
JAMES CHICAGO 14-MAR-2005

Your desired result set is the following:

 ENAME       LOC            RECEIVED
---------- ------------- -----------
ALLEN CHICAGO
WARD CHICAGO
MARTIN CHICAGO
JAMES CHICAGO 14-MAR-2005
TURNER CHICAGO
BLAKE CHICAGO
SMITH DALLAS 14-MAR-2005
FORD DALLAS
ADAMS DALLAS
JONES DALLAS
SCOTT DALLAS 14-MAR-2005
CLARK NEW YORK
KING NEW YORK
MILLER NEW YORK

Solution

You can use an outer join to obtain the additional information without losing the data from the original query. First join table EMP to table DEPT to get all employees and the location of the department they work, then outer join to table EMP_ BONUS to return the date of the bonus if there is one. Following is the DB2, MySQL, PostgreSQL, and SQL Server syntax:

 1 select e.ename, d.loc, eb.received
2 from emp e join dept d
3 on (e.deptno=d.deptno)
4 left join emp_bonus eb
5 on (e.empno=eb.empno)
6 order by 2
If you are using Oracle9i Database or later, the preceding solution will work for you. Alternatively, you can use Oracle's proprietary outer-join syntax, which is your only choice when using Oracle8i Database and earlier:
 1 select e.ename, d.loc, eb.received
2 from emp e, dept d, emp_bonus eb
3 where e.deptno=d.deptno
4 and e.empno=eb.empno (+)
5 order by 2
You can also use a scalar subquery (a subquery placed in the SELECT list) to mimic an outer join:
 1 select e.ename, d.loc,
2 (select eb.received from emp_bonus eb
3 where eb.empno=e.empno) as received
4 from emp e, dept d
5 where e.deptno=d.deptno
6 order by 2
The scalar subquery solution will work across all platforms.

Discussion

An outer join will return all rows from one table and matching rows from another. See the previous recipe for another example of such a join. The reason an outer join works to solve this problem is that it does not result in any rows being eliminated that would otherwise be returned. The query will return all the rows it would return without the outer join. And it also returns the received date, if one exists.

Use of a scalar subquery is also a convenient technique for this sort of problem, as it does not require you to modify already correct joins in your main query. Using a scalar subquery is an easy way to tack on extra data to a query without compromising the current result set. When working with scalar subqueries, you must ensure they return a scalar (single) value. If a subquery in the SELECT list returns more than one row, you will receive an error.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Retrieving Rows from One Table That Do Not Correspond to Rows in Another - SQL

Problem

You want to find rows that are in one table that do not have a match in another table, for two tables that have common keys. For example, you want to find which departments have no employees. The result set should be the following:

       DEPTNO  DNAME           LOC
---------- -------------- -------------
40 OPERATIONS BOSTON

Finding the department each employee works in requires an equi-join on DEPTNO from EMP to DEPT. The DEPTNO column represents the common value between tables. Unfortunately, an equi-join will not show you which department has no employees. That's because by equi-joining EMP and DEPT you are returning all rows that satisfy the join condition. Instead you want only those rows from DEPT that do not satisfy the join condition.

This is a subtly different problem than in the preceding recipe, though at first glance they may seem the same. The difference is that the preceding recipe yields only a list of department numbers not represented in table EMP. Using this recipe, however, you can easily return other columns from the DEPT table; you can return more than just department numbers.

Solution

Return all rows from one table along with rows from another that may or may not have a match on the common column. Then, keep only those rows with no match.

DB2, MySQL, PostgreSQL, SQL Server

Use an outer join and filter for NULLs (keyword OUTER is optional):

 1 select d.*
2 from dept d left outer join emp e
3 on (d.deptno = e.deptno)
4 where e.deptno is null

Oracle

For users on Oracle9i Database and later, the preceding solution will work. Alternatively, you can use the proprietary Oracle outer-join syntax:

 1 select d.*
2 from dept d, emp e
3 where d.deptno = e.deptno (+)
4 and e.deptno is null

This proprietary syntax (note the use of the "+" in parens) is the only outer-join syntax available in Oracle8i Database and earlier.

Discussion

This solution works by outer joining and then keeping only rows that have no match. This sort of operation is sometimes called an anti-join. To get a better idea of how an anti-join works, first examine the result set without filtering for NULLs:

 
select e.ename, e.deptno as emp_deptno, d.*
from dept d left join emp e
on (d.deptno = e.deptno)


ENAME EMP_DEPTNO DEPTNO DNAME LOC
---------- ---------- ---------- -------------- -------------
SMITH 20 20 RESEARCH DALLAS
ALLEN 30 30 SALES CHICAGO
WARD 30 30 SALES CHICAGO
JONES 20 20 RESEARCH DALLAS
MARTIN 30 30 SALES CHICAGO
BLAKE 30 30 SALES CHICAGO
CLARK 10 10 ACCOUNTING NEW YORK
SCOTT 20 20 RESEARCH DALLAS
KING 10 10 ACCOUNTING NEW YORK
TURNER 30 30 SALES CHICAGO
ADAMS 20 20 RESEARCH DALLAS
JAMES 30 30 SALES CHICAGO
FORD 20 20 RESEARCH DALLAS
MILLER 10 10 ACCOUNTING NEW YORK
40 OPERATIONS BOSTON

Notice, the last row has a NULL value for EMP.ENAME and EMP_DEPTNO. That's because no employees work in department 40. The solution uses the WHERE clause to keep only rows where EMP_DEPTNO is NULL (thus keeping only rows from DEPT that have no match in EMP).


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Retrieving Values from One Table That Do Not Exist in Another

Problem

You wish to find those values in one table, call it the source table, that do not also exist in some target table. For example, you want to find which departments (if any) in table DEPT do not exist in table EMP. In the example data, DEPTNO 40 from table DEPT does not exist in table EMP, so the result set should be the following:

       DEPTNO
----------
40

Solution

Having functions that perform set difference is particularly useful for this problem. DB2, PostgreSQL, and Oracle support set difference operations. If your DBMS does not support a set difference function, use a subquery as shown for MySQL and SQL Server.

DB2 and PostgreSQL

Use the set operation EXCEPT:

 1 select deptno from dept
2 except
3 select deptno from emp

Oracle

Use the set operation MINUS:

 1 select deptno from dept
2 minus
3 select deptno from emp

MySQL and SQL Server

Use a subquery to return all DEPTNOs from table EMP into an outer query that searches table DEPT for rows that are not amongst the rows returned from the subquery:

 1 select deptno
2 from dept
3 where deptno not in (select deptno from emp)

Discussion

DB2 and PostgreSQL

The built-in functions provided by DB2 and PostgreSQL make this operation quite easy. The EXCEPT operator takes the first result set and removes from it all rows found in the second result set. The operation is very much like a subtraction.

There are restrictions on the use of set operators, including EXCEPT. Data types and number of values to compare must match in both SELECT lists. Additionally, EXCEPT will not return duplicates and, unlike a subquery using NOT IN, NULLs do not present a problem (see the discussion for MySQL and SQL Server). The EXCEPT operator will return rows from the upper query (the query before the EXCEPT) that do not exist in the lower query (the query after the EXCEPT).

Oracle

The Oracle solution is identical to that for DB2 and PostgreSQL, except that Oracle calls its set difference operator MINUS rather than EXCEPT. Otherwise, the preceding explanation applies to Oracle as well.

MySQL and SQL Server

The subquery will return all DEPTNOs from table EMP. The outer query returns all DEPTNOs from table DEPT that are "not in" or "not included in" the result set returned from the subquery.

Duplicate elimination is something you'll want to consider when using the MySQL and SQL Server solutions. The EXCEPT- and MINUS-based solutions used for the other platforms eliminate duplicate rows from the result set, ensuring that each DEPTNO is reported only one time. Of course, that can only be the case anyway, as DEPTNO is a key field in my example data. Were DEPTNO not a key field, you could use DISTINCT as follows to ensure that each DEPTNO value missing from EMP is reported only once:

 select distinct deptno
from dept
where deptno not in (select deptno from emp)

Be mindful of NULLs when using NOT IN. Consider the following table, NEW_ DEPT:

 create table new_dept(deptno integer)
insert into new_dept values (10)
insert into new_dept values (50)
insert into new_dept values (null)

If you try to find the DEPTNOs in table DEPT that do not exist in table NEW_DEPT and use a subquery with NOT IN, you'll find that the query returns no rows:

 select *
from dept
where deptno not in (select deptno from new_dept)

DEPTNOs 20, 30, and 40 are not in table NEW_DEPT, yet were not returned by the query. The reason is the NULL value present in table NEW_DEPT. Three rows are returned by the subquery, with DEPTNOs of 10, 50, and NULL. IN and NOT IN are essentially OR operations, and will yield different results because of how NULL values are treated by logical OR evaluations. Consider the following example using IN and its equivalent using OR:

 
select deptno
from dept
where deptno in ( 10,50,null )


DEPTNO
-------
10


select deptno
from dept
where (deptno=10 or deptno=50 or deptno=null)


DEPTNO
-------
10

Now consider the same example using NOT IN and NOT OR:

 
select deptno
from dept
where deptno not in ( 10,50,null )

( no rows )

select deptno
from dept
where not (deptno=10 or deptno=50 or deptno=null)


(no rows)

As you can see, the condition DEPTNO NOT IN (10, 50, NULL) equates to:

 not (deptno=10 or deptno=50 or deptno=null)

In the case where DEPTNO is 50, here's how this expression plays out:

 not (deptno=10 or deptno=50 or deptno=null)
(false or false or null)
(false or null)
null

In SQL, "TRUE or NULL" is TRUE, but "FALSE or NULL" is NULL! And once you have a NULL result, you'll continue to have NULL result (unless you specifically test for NULL using a technique like that shown in Recipe 1.11). You must keep this in mind when using IN predicates and when performing logical OR evaluations, and NULL values are involved.

To avoid the problem with NOT IN and NULLs, use a correlated subquery in conjunction with NOT EXISTS. The term "correlated subquery" is used because rows from the outer query are referenced in the subquery. The following example is an alternative solution that will not be affected by NULL rows (going back to the original query from the "Problem" section):

 
select d.deptno
from dept d
where not exists ( select null
from emp e
where d.deptno = e.deptno )


DEPTNO
----------
40

Conceptually, the outer query in this solution considers each row in the DEPT table. For each DEPT row, the following happens:

  1. The subquery is executed to see whether the department number exists in the EMP table. Note the condition D.DEPTNO = E.DEPTNO, which brings together the department numbers from the two tables.

  2. If the subquery returns results, then EXISTS (…) evaluates to true and NOT EXISTS (…) thus evaluates to FALSE, and the row being considered by the outer query is discarded.

  3. If the subquery returns no results, then NOT EXISTS (…) evaluates to TRUE, and the row being considered by the outer query is returned (because it is for a department not represented in the EMP table).

The items in the SELECT list of the subquery are unimportant when using a correlated subquery with EXISTS/NOT EXISTS, which is why I chose to select NULL, to force you to focus on the join in the subquery rather than the items in the SELECT list.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Finding Rows in Common Between Two Tables

Problem

You want to find common rows between two tables but there are multiple columns on which you can join. For example, consider the following view V:

 
create view V
as
select ename,job,sal
from emp
where job = 'CLERK'

select * from V


ENAME JOB SAL
---------- --------- ----------
SMITH CLERK 800
ADAMS CLERK 1100
JAMES CLERK 950
MILLER CLERK 1300

Only clerks are returned from view V. However, the view does not show all possible EMP columns. You want to return the EMPNO, ENAME, JOB, SAL, and DEPTNO of all employees in EMP that match the rows from view V. You want the result set to be the following:

    EMPNO  ENAME       JOB             SAL     DEPTNO
-------- ---------- --------- ---------- ---------
7369 SMITH CLERK 800 20
7876 ADAMS CLERK 1100 20
7900 JAMES CLERK 950 30
7934 MILLER CLERK 1300 10

Solution

Join the tables on all the columns necessary to return the correct result. Alternatively, use the set operation INTERSECT to avoid performing a join and instead return the intersection (common rows) of the two tables.

MySQL and SQL Server

Join table EMP to view V using multiple join conditions:

 1 select e.empno,e.ename,e.job,e.sal,e.deptno
2 from emp e, V
3 where e.ename = v.ename
4 and e.job = v.job
5 and e.sal = v.sal

Alternatively, you can perform the same join via the JOIN clause:

 1 select e.empno,e.ename,e.job,e.sal,e.deptno
2 from emp e join V
3 on ( e.ename = v.ename
4 and e.job = v.job
5 and e.sal = v.sal )

DB2, Oracle, and PostgreSQL

The MySQL and SQL Server solution also works for DB2, Oracle, and PostgreSQL. It's the solution you should use if you need to return values from view V.

If you do not actually need to return columns from view V, you may use the set operation INTERSECT along with an IN predicate:

 1 select empno,ename,job,sal,deptno
2 from emp
3 where (ename,job,sal) in (
4 select ename,job,sal from emp
5 intersect
6 select ename,job,sal from V
7 )

Discussion

When performing joins, you must consider the proper columns to join on in order to return correct results. This is especially important when rows can have common values for some columns while having different values for others.

The set operation INTERSECT will return rows common to both row sources. When using INTERSECT, you are required to compare the same number of items, having the same data type, from two tables. When working with set operations keep in mind that, by default, duplicate rows will not be returned.


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Combining Related Rows - SQL Server

Problem

You want to return rows from multiple tables by joining on a known common column or joining on columns that share common values. For example, you want to display the names of all employees in department 10 along with the location of each employee's department, but that data is stored in two separate tables. You want the result set to be the following:

 ENAME       LOC
---------- ----------
CLARK NEW YORK
KING NEW YORK
MILLER NEW YORK

Solution

Join table EMP to table DEPT on DEPTNO:

 1 select e.ename, d.loc
2 from emp e, dept d
3 where e.deptno = d.deptno
4 and e.deptno = 10

Discussion

The solution is an example of a join, or more accurately an equi-join, which is a type of inner join. A join is an operation that combines rows from two tables into one. An equi-join is one in which the join condition is based on an equality condition (e.g., where one department number equals another). An inner join is the original type of join; each row returned contains data from each table.

Conceptually, the result set from a join is produced by first creating a Cartesian product (all possible combinations of rows) from the tables listed in the FROM clause, as seen below:

 
select e.ename, d.loc,
e.deptno as emp_deptno,
d.deptno as dept_deptno
from emp e, dept d
where e.deptno = 10


ENAME LOC EMP_DEPTNO DEPT_DEPTNO
---------- ------------- ---------- -----------
CLARK NEW YORK 10 10
KING NEW YORK 10 10
MILLER NEW YORK 10 10
CLARK DALLAS 10 20

KING DALLAS 10 20
MILLER DALLAS 10 20
CLARK CHICAGO 10 30
KING CHICAGO 10 30
MILLER CHICAGO 10 30
CLARK BOSTON 10 40
KING BOSTON 10 40
MILLER BOSTON 10 40

Every employee in table EMP (in department 10) is returned along with every department in the table DEPT. Then, the expression in the WHERE clause involving e.deptno and d.deptno (the join) restricts the result set such that the only rows returned are the ones where EMP.DEPTNO and DEPT.DEPTNO are equal:

 
select e.ename, d.loc,
e.deptno as emp_deptno,
d.deptno as dept_deptno
from emp e, dept d
where e.deptno = d.deptno
and e.deptno = 10


ENAME LOC EMP_DEPTNO DEPT_DEPTNO
---------- -------------- ---------- -----------
CLARK NEW YORK 10 10
KING NEW YORK 10 10
MILLER NEW YORK 10 10

An alternative solution makes use of an explicit JOIN clause (the "INNER" keyword is optional):

 select e.ename, d.loc
from emp e inner join dept d
on (e.deptno = d.deptno)

where e.deptno = 10

Use the JOIN clause if you prefer to have the join logic in the FROM clause rather than the WHERE clause. Both styles are ANSI compliant and work on all the latest versions of the RDBMSs in this book.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Stacking One Rowset atop Another - SQL Server

Problem

You want to return data stored in more than one table, conceptually stacking one result set atop the other. The tables do not necessarily have a common key, but their columns do have the same data types. For example, you want to display the name and department number of the employees in department 10 in table EMP, along with the name and department number of each department in table DEPT. You want the result set to look like the following:

 ENAME_AND_DNAME      DEPTNO
--------------- ----------
CLARK 10
KING 10
MILLER 10
----------
ACCOUNTING 10
RESEARCH 20
SALES 30
OPERATIONS 40

Solution

Use the set operation UNION ALL to combine rows from multiple tables:

 1  select ename as ename_and_dname, deptno
2 from emp
3 where deptno = 10
4 union all
5 select '----------', null
6 from t1
7 union all
8 select dname, deptno
9 from dept

Discussion

UNION ALL combines rows from multiple row sources into one result set. As with all set operations, the items in all the SELECT lists must match in number and data type. For example, both of the following queries will fail:

 select deptno   |  select deptno, dname
from dept | from dept
union all | union
select ename | select deptno
from emp | from emp

It is important to note, UNION ALL will include duplicates if they exist. If you wish to filter out duplicates, use the UNION operator. For example, a UNION between EMP.DEPTNO and DEPT.DEPTNO returns only four rows:

 
select deptno
from emp
union
select deptno
from dept


DEPTNO
---------
10
20
30
40

Specifying UNION rather than UNION ALL will most likely result in a sort operation in order to eliminate duplicates. Keep this in mind when working with large result sets. Using UNION is roughly equivalent to the following query, which applies DISTINCT to the output from a UNION ALL:

 
select distinct deptno
from (
select deptno
from emp
union all
select deptno
from dept
)


DEPTNO
---------
10
20
30
40

You wouldn't use DISTINCT in a query unless you had to, and the same rule applies for UNION; don't use it instead of UNION ALL unless you have to.


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Sorting on a Data Dependent Key

Problem

You want to sort based on some conditional logic. For example: if JOB is "SALESMAN" you want to sort on COMM; otherwise, you want to sort by SAL. You want to return the following result set:

 ENAME             SAL JOB             COMM
---------- ---------- --------- ----------
TURNER 1500 SALESMAN 0
ALLEN 1600 SALESMAN 300
WARD 1250 SALESMAN 500
SMITH 800 CLERK
JAMES 950 CLERK
ADAMS 1100 CLERK
MARTIN 1250 SALESMAN 1300
MILLER 1300 CLERK
CLARK 2450 MANAGER
BLAKE 2850 MANAGER
JONES 2975 MANAGER
SCOTT 3000 ANALYST
FORD 3000 ANALYST
KING 5000 PRESIDENT

Solution

Use a CASE expression in the ORDER BY clause:

 1 select ename,sal,job,comm
2 from emp
3 order by case when job = 'SALESMAN' then comm else sal end

Discussion

You can use the CASE expression to dynamically change how results are sorted. The values passed to the ORDER BY look as follows:

 
select ename,sal,job,comm,
case when job = 'SALESMAN' then comm else sal end as ordered
from emp
order by 5


ENAME SAL JOB COMM ORDERED
---------- ---------- --------- ---------- ----------
TURNER 1500 SALESMAN 0 0
ALLEN 1600 SALESMAN 300 300
WARD1 250 SALESMAN 500 500
SMITH 800 CLERK 800
JAMES 950 CLERK 950
ADAMS 1100 CLERK 1100
MARTIN 1250 SALESMAN 1300 1300
MILLER 1300 CLERK 1300
CLARK2 450 MANAGER 2450
BLAKE2 850 MANAGER 2850
JONES2 975 MANAGER 2975
SCOTT 3000 ANALYST 3000
FORD 3000 ANALYST 3000
KING 5000 PRESIDENT 5000
** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Dealing with Nulls when Sorting - SQL Server

Problem

You want to sort results from EMP by COMM, but the field is nullable. You need a way to specify whether nulls sort last:

 ENAME              SAL        COMM
---------- ---------- ----------
TURNER 1500 0
ALLEN 1600 300
WARD 1250 500
MARTIN 1250 1400
SMITH 800
JONES 2975
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
BLAKE 2850
CLARK 2450
SCOTT 3000
KING 5000

or whether they sort first:

 ENAME              SAL        COMM
---------- ---------- ----------
SMITH 800
JONES 2975
CLARK 2450
BLAKE 2850
SCOTT 3000
KING 5000
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
MARTIN 1250 1400
WARD 1250 500
ALLEN 1600 300
TURNER 1500 0

Solution

Depending on how you want the data to look (and how your particular RDBMS sorts NULL values), you can sort the nullable column in ascending or descending order:

 1 select ename,sal,comm
2 from emp
3 order by 3

1 select ename,sal,comm
2 from emp
3 order by 3 desc

This solution puts you in a position such that if the nullable column contains non-NULL values, they will be sorted in ascending or descending order as well, according to what you ask for; this may or may not what you have in mind. If instead you would like to sort NULL values differently than non-NULL values, for example, you want to sort non-NULL values in ascending or descending order and all NULL values last, you can use a CASE expression to conditionally sort the column.

DB2, MySQL, PostgreSQL, and SQL Server

Use a CASE expression to "flag" when a value is NULL. The idea is to have a flag with two values: one to represent NULLs, the other to represent non-NULLs. Once you have that, simply add this flag column to the ORDER BY clause. You'll easily be able to control whether NULL values are sorted first or last without interfering with non-NULL values:

 /* NON-NULL COMM SORTED ASCENDING, ALL NULLS LAST */


1 select ename,sal,comm
2 from (
3 select ename,sal,comm,
4 case when comm is null then 0 else 1 end as is_null
5 from emp
6 ) x
7 order by is_null desc,comm


ENAME SAL COMM
------ ----- ----------
TURNER 1500 0
ALLEN 1600 300
WARD 1250 500
MARTIN 1250 1400
SMITH 800
JONES 2975
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
BLAKE 2850
CLARK 2450
SCOTT 3000
KING 5000

/* NON-NULL COMM SORTED DESCENDING, ALL NULLS LAST */


1 select ename,sal,comm
2 from (
3 select ename,sal,comm,
4 case when comm is null then 0 else 1 end as is_null
5 from emp
6 ) x
7 order by is_null desc,comm desc


ENAME SAL COMM
------ ----- ----------
MARTIN 1250 1400
WARD 1250 500
ALLEN 1600 300
TURNER 1500 0
SMITH 800
JONES 2975
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
BLAKE 2850
CLARK 2450
SCOTT 3000
KING 5000

/* NON-NULL COMM SORTED ASCENDING, ALL NULLS FIRST */


1 select ename,sal,comm
2 from (
3 select ename,sal,comm,
4 case when comm is null then 0 else 1 end as is_null
5 from emp
6 ) x
7 order by is_null,comm


ENAME SAL COMM
------ ----- ----------
SMITH 800
JONES 2975
CLARK 2450
BLAKE 2850
SCOTT 3000
KING 5000
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
TURNER 1500 0
ALLEN 1600 300
WARD 1250 500
MARTIN 1250 1400

/* NON-NULL COMM SORTED DESCENDING, ALL NULLS FIRST */


1 select ename,sal,comm
2 from (
3 select ename,sal,comm,
4 case when comm is null then 0 else 1 end as is_null
5 from emp
6 ) x
7 order by is_null,comm desc


ENAME SAL COMM
------ ----- ----------
SMITH 800
JONES 2975
CLARK 2450
BLAKE 2850
SCOTT 3000
KING 5000
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
MARTIN 1250 1400
WARD 1250 500
ALLEN 1600 300
TURNER 1500 0

Oracle

Users on Oracle8i Database and earlier can use the solution for the other platforms. Users on Oracle9i Database and later can use the NULLS FIRST and NULLS LAST extension to the ORDER BYclause to ensure NULLs are sorted first or last regardless of how non-NULL values are sorted:

 /* NON-NULL COMM SORTED ASCENDING, ALL NULLS LAST */


1 select ename,sal,comm
2 from emp
3 order by comm nulls last


ENAME SAL COMM
------ ----- ---------
TURNER 1500 0
ALLEN 1600 300
WARD 1250 500
MARTIN 1250 1400
SMITH 800
JONES 2975
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
BLAKE 2850
CLARK 2450
SCOTT 3000
KING 5000

/* NON-NULL COMM SORTED ASCENDING, ALL NULLS FIRST */


1 select ename,sal,comm
2 from emp
3 order by comm nulls first


ENAME SAL COMM
------ ----- ----------
SMITH 800
JONES 2975
CLARK 2450
BLAKE 2850
SCOTT 3000
KING 5000
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
TURNER 1500 0
ALLEN 1600 300
WARD 1250 500
MARTIN 1250 1400

/* NON-NULL COMM SORTED DESCENDING, ALL NULLS FIRST */


1 select ename,sal,comm
2 from emp
3 order by comm desc nulls first


ENAME SAL COMM
------ ----- ----------
SMITH 800
JONES 2975
CLARK 2450
BLAKE 2850
SCOTT 3000
KING 5000
JAMES 950
MILLER 1300
FORD 3000
ADAMS 1100
MARTIN 1250 1400
WARD 1250 500
ALLEN 1600 300
TURNER 1500 0

Discussion

Unless your RDBMS provides you with a way to easily sort NULL values first or last without modifying non-NULL values in the same column (such as Oracle does), you'll need an auxiliary column.

The purpose of this extra column (in the query only, not in the table) is to allow you to identify NULL values and sort them altogether, first or last. The following query returns the result set for inline view X for the non-Oracle solution:

 
select ename,sal,comm,
case when comm is null then 0 else 1 end as is_null
from emp


ENAME SAL COMM IS_NULL
------ ----- ---------- ----------
SMITH 800 0
ALLEN 1600 300 1
WARD 1250 500 1

JONES 2975 0
MARTIN 1250 1400 1
BLAKE 2850 0
CLARK 2450 0
SCOTT 3000 0
KING 5000 0
TURNER 1500 0 1
ADAMS 1100 0
JAMES 950 0
FORD 3000 0
MILLER 1300 0

By using the values returned by IS_NULL, you can easily sort NULLS first or last without interfering with the sorting of COMM.


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

SQL Server - Sorting Mixed Alphanumeric Data

Problem

You have mixed alphanumeric data and want to sort by either the numeric or character portion of the data. Consider this view:

 
create view V
as
select ename||' '||deptno as data
from emp

select * from V


DATA
-------------
SMITH 20
ALLEN 30
WARD 30
JONES 20
MARTIN 30
BLAKE 30
CLARK 10
SCOTT 20
KING 10
TURNER 30
ADAMS 20
JAMES 30
FORD 20
MILLER 10

You want to sort the results by DEPTNO or ENAME. Sorting by DEPTNO produces the following result set:

 DATA
----------
CLARK 10
KING 10
MILLER 10
SMITH 20
ADAMS 20
FORD 20
SCOTT 20
JONES 20
ALLEN 30
BLAKE 30
MARTIN 30
JAMES 30
TURNER 30
WARD 30

Sorting by ENAME produces the following result set:

 DATA
---------
ADAMS 20
ALLEN 30
BLAKE 30
CLARK 10
FORD 20
JAMES 30
JONES 20
KING 10
MARTIN 30
MILLER 10
SCOTT 20
SMITH 20
TURNER 30
WARD 30

Solution

Oracle and PostgreSQL

Use the functions REPLACE and TRANSLATE to modify the string for sorting:

 /* ORDER BY DEPTNO */

1 select data
2 from V
3 order by replace(data,
4 replace(
5 translate(data,'0123456789','##########'),'#',''),'')

/* ORDER BY ENAME */

1 select data
2 from emp
3 order by replace(
4 translate(data,'0123456789','##########'),'#','')
DB2

Implicit type conversion is more strict in DB2 than in Oracle or PostgreSQL, so you will need to cast DEPTNO to a CHAR for view V to be valid. Rather than recreate view V, this solution will simply use an inline view. The solution uses REPLACE and TRANSLATE in the same way as the Oracle and PostrgreSQL solution, but the order of arguments for TRANSLATE is slightly different for DB2:

 /* ORDER BY DEPTNO */

1 select *
2 from (
3 select ename||' '||cast(deptno as char(2)) as data
4 from emp
5 ) v
6 order by replace(data,
7 replace(
8 translate(data,'##########','0123456789'),'#',''),'')

/* ORDER BY ENAME */

1 select *
2 from (
3 select ename||' '||cast(deptno as char(2)) as data
4 from emp
5 ) v
6 order by replace(
7 translate(data,'##########','0123456789'),'#','')
MySQL and SQL Server

The TRANSLATE function is not currently supported by these platforms, thus a solution for this problem will not be provided.

Discussion

The TRANSLATE and REPLACE functions remove either the numbers or characters from each row, allowing you to easily sort by one or the other. The values passed to ORDER BY are shown in the following query results (using the Oracle solution as the example, as the same technique applies to all three vendors; only the order of parameters passed to TRANSLATE is what sets DB2 apart):

 
select data,
replace(data,
replace(
translate(data,'0123456789','##########'),'#',''),'') nums,
replace(
translate(data,'0123456789','##########'),'#','') chars
from V


DATA NUMS CHARS
------------ ------ ----------
SMITH 20 20 SMITH
ALLEN 30 30 ALLEN
WARD 30 30 WARD
JONES 20 20 JONES
MARTIN 30 30 MARTIN
BLAKE 30 30 BLAKE
CLARK 10 10 CLARK
SCOTT 20 20 SCOTT
KING 10 10 KING
TURNER 30 30 TURNER
ADAMS 20 20 ADAMS
JAMES 30 30 JAMES
FORD 20 20 FORD
MILLER 10 10 MILLER
** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Sorting by Substrings - SQL SERVER

Problem

You want to sort the results of a query by specific parts of a string. For example, you want to return employee names and jobs from table EMP and sort by the last two characters in the job field. The result set should look like the following:

 ENAME       JOB
---------- ---------
KING PRESIDENT
SMITH CLERK
ADAMS CLERK
JAMES CLERK
MILLER CLERK
JONES MANAGER
CLARK MANAGER
BLAKE MANAGER
ALLEN SALESMAN
MARTIN SALESMAN
WARD SALESMAN
TURNER SALESMAN
SCOTT ANALYST
FORD ANALYST

Solution

DB2, MySQL, Oracle, and PostgreSQL

Use the SUBSTR function in the ORDER BY clause:

 select ename,job
from emp
order by substr(job,length(job)-2)

SQL Server

Use the SUBSTRING function in the ORDER BY clause:

 select ename,job
from emp
order by substring(job,len(job)-2,2)

Discussion

Using your DBMS's substring function, you can easily sort by any part of a string. To sort by the last two characters of a string, find the end of the string (which is the length of the string) and subtract 2. The start position will be the second to last character in the string. You then take all characters after that start position. Because SQL Server requires a third parameter in SUBSTRING to specify the number of characters to take. In this example, any number greater than or equal to 2 will work.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Sorting by Multiple Fields

Problem

You want to sort the rows from EMP first by DEPTNO ascending, then by salary descending. You want to return the following result set:

      EMPNO      DEPTNO         SAL  ENAME       JOB
---------- ---------- ---------- ---------- ---------
7839 10 5000 KING PRESIDENT
7782 10 2450 CLARK MANAGER
7934 10 1300 MILLER CLERK
7788 20 3000 SCOTT ANALYST
7902 20 3000 FORD ANALYST
7566 20 2975 JONES MANAGER
7876 20 1100 ADAMS CLERK
7369 20 800 SMITH CLERK
7698 30 2850 BLAKE MANAGER
7499 30 1600 ALLEN SALESMAN
7844 30 1500 TURNER SALESMAN
7521 30 1250 WARD SALESMAN
7654 30 1250 MARTIN SALESMAN
7900 30 950 JAMES CLERK

Solution

List the different sort columns in the ORDER BY clause, separated by commas:

 1 select empno,deptno,sal,ename,job
2 from emp
3 order by deptno, sal desc

Discussion

The order of precedence in ORDER BY is from left to right. If you are ordering using the numeric position of a column in the SELECT list, then that number must not be greater than the number of items in the SELECT list. You are generally permitted to order by a column not in the SELECT list, but to do so you must explicitly name the column. However, if you are using GROUP BY or DISTINCT in your query, you cannot order by columns that are not in the SELECT list.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Returning Query Results in a Specified Order

Problem

You want to display the names, job, and salaries of employees in department 10 in order based on their salary (from lowest to highest). You want to return the following result set:

 ENAME       JOB               SAL
---------- --------- ----------
MILLER CLERK 1300
CLARK MANAGER 2450
KING PRESIDENT 5000

Solution

Use the ORDER BY clause:

 1 select ename,job,sal
2 from emp
3 where deptno = 10
4 order by sal asc

Discussion

The ORDER BY clause allows you to order the rows of your result set. The solution sorts the rows based on SAL in ascending order. By default, ORDER BY will sort in ascending order, and the ASC clause is therefore optional. Alternatively, specify DESC to sort in descending order:

  select ename,job,sal
from emp
where deptno = 10
order by sal desc


ENAME JOB SAL
---------- --------- ----------
KING PRESIDENT 5000
CLARK MANAGER 2450
MILLER CLERK 1300
You need not specify the name of the column on which to sort. You can instead specify a number representing the column. The number starts at 1 and matches the items in the SELECT list from left to right. For example:
 
select ename,job,sal
from emp
where deptno = 10
order by 3 desc


ENAME JOB SAL
---------- --------- ----------
KING PRESIDENT 5000
CLARK MANAGER 2450
MILLER CLERK 1300

The number 3 in this example's ORDER BY clause corresponds to the third column in the SELECT list, which is SAL.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table