Determining Whether Two Tables Have the Same Data - SQL

Problem

You want to know if two tables or views have the same data (cardinality and values). Consider the following view:

 
create view V
as
select * from emp where deptno != 10
union all
select * from emp where ename = 'WARD'

select * from V


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
----- ---------- --------- ----- ----------- ----- ----- ------
7369 SMITH CLERK 7902 17-DEC-1980 800 20
7499 ALLEN SALESMAN 7698 20-FEB-1981 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30
7566 JONES MANAGER 7839 02-APR-1981 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-1981 1250 1300 30
7698 BLAKE MANAGER 7839 01-MAY-1981 2850 30
7788 SCOTT ANALYST 7566 09-DEC-1982 3000 20
7844 TURNER SALESMAN 7698 08-SEP-1981 1500 0 30
7876 ADAMS CLERK 7788 12-JAN-1983 1100 20
7900 JAMES CLERK 7698 03-DEC-1981 950 30
7902 FORD ANALYST 7566 03-DEC-1981 3000 20
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30

You want to determine whether or not this view has exactly the same data as table EMP. The row for employee "WARD" is duplicated to show that the solution will reveal not only different data but duplicates as well. Based on the rows in table EMP the difference will be the three rows for employees in department 10 and the two rows for employee "WARD". You want to return the following result set:

 EMPNO ENAME       JOB         MGR  HIREDATE      SAL   COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2
7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1
7839 KING PRESIDENT 17-NOV-1981 5000 10 1
7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1

Solution

Functions that perform SET difference (MINUS or EXCEPT, depending on your DBMS) make the problem of comparing tables a relatively easy one to solve. If your DBMS does not offer such functions, you can use a correlated subquery.

DB2 and PostgreSQL

Use the set operations EXCEPT and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V:

  1  (
2 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
3 count(*) as cnt
4 from V
5 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
6 except
7 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
8 count(*) as cnt
9 from emp
10 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
11 )
12 union all
13 (
14 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
15 count(*) as cnt
16 from emp
17 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
18 except
19 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
20 count(*) as cnt
21 from v
22 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
23 )

Oracle

Use the set operations MINUS and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V:

  1  (
2 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
3 count(*) as cnt
4 from V
5 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
6 minus
7 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
8 count(*) as cnt
9 from emp
10 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
11 )
12 union all
13 (
14 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
15 count(*) as cnt
16 from emp
17 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
18 minus
19 select empno,ename,job,mgr,hiredate,sal,comm,deptno,
20 count(*) as cnt
21 from v
22 group by empno,ename,job,mgr,hiredate,sal,comm,deptno
23 )

MySQL and SQL Server

Use a correlated subquery and UNION ALL to find the rows in view V and not in table EMP combined with the rows in table EMP and not in view V:

  1  select *
2 from (
3 select e.empno,e.ename,e.job,e.mgr,e.hiredate,
4 e.sal,e.comm,e.deptno, count(*) as cnt
5 from emp e
6 group by empno,ename,job,mgr,hiredate,
7 sal,comm,deptno
8 ) e
9 where not exists (
10 select null
11 from (
12 select v.empno,v.ename,v.job,v.mgr,v.hiredate,
13 v.sal,v.comm,v.deptno, count(*) as cnt
14 from v
15 group by empno,ename,job,mgr,hiredate,
16 sal,comm,deptno
17 ) v
18 where v.empno = e.empno
19 and v.ename = e.ename
20 and v.job = e.job
21 and v.mgr = e.mgr
22 and v.hiredate = e.hiredate
23 and v.sal = e.sal
24 and v.deptno = e.deptno
25 and v.cnt = e.cnt
26 and coalesce(v.comm,0) = coalesce(e.comm,0)
27 )
28 union all
29 select *
30 from (
31 select v.empno,v.ename,v.job,v.mgr,v.hiredate,
32 v.sal,v.comm,v.deptno, count(*) as cnt
33 from v
34 group by empno,ename,job,mgr,hiredate,
35 sal,comm,deptno
36 ) v
37 where not exists (
38 select null
39 from (
40 select e.empno,e.ename,e.job,e.mgr,e.hiredate,
41 e.sal,e.comm,e.deptno, count(*) as cnt
42 from emp e
43 group by empno,ename,job,mgr,hiredate,
44 sal,comm,deptno
45 ) e
46 where v.empno = e.empno
47 and v.ename = e.ename
48 and v.job = e.job
49 and v.mgr = e.mgr
50 and v.hiredate = e.hiredate
51 and v.sal = e.sal
52 and v.deptno = e.deptno
53 and v.cnt = e.cnt
54 and coalesce(v.comm,0) = coalesce(e.comm,0)
55 )

Discussion

Despite using different techniques, the concept is the same for all solutions:

  1. First, find rows in table EMP that do not exist in view V.

  2. Then combine (UNION ALL) those rows with rows from view V that do not exist in table EMP.

If the tables in question are equal, then no rows are returned. If the tables are different, the rows causing the difference are returned. As an easy first step when comparing tables, you can compare the cardinalities alone rather than including them with the data comparison. The following query is a simple example of this and will work on all DBMSs:

 
select count(*)
from emp
union
select count(*)
from dept


COUNT(*)
--------
4
14

Because UNION will filter out duplicates, only one row will be returned if the tables' cardinalities are the same. Because two rows are returned in this example, you know that the tables do not contain identical rowsets.

DB2, Oracle, and PostgreSQL

MINUS and EXCEPT work in the same way, so I will use EXCEPT for this discussion. The queries before and after the UNION ALL are very similar. So, to understand how the solution works, simply execute the query prior to the UNION ALL by itself. The following result set is produced by executing lines 111 in the solution section:

 
(
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from V
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
except
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from emp
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2

The result set represents a row found in view V that is either not in table EMP or has a different cardinality than that same row in table EMP. In this case, the duplicate row for employee "WARD" is found and returned. If you're still having trouble understanding how the result set is produced, run each query on either side of EXCEPT individually. You'll notice the only difference between the two result sets is the CNT for employee "WARD" returned by view V.

The portion of the query after the UNION ALL does the opposite of the query preceding UNION ALL. The query returns rows in table EMP not in view V:

 
(
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from emp
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
minus
select empno,ename,job,mgr,hiredate,sal,comm,deptno,
count(*) as cnt
from v
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1
7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1
7839 KING PRESIDENT 17-NOV-1981 5000 10 1
7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1

The results are then combined by UNION ALL to produce the final result set.

MySQL and SQL Server

The queries before and after the UNION ALL are very similar. To understand how the subquery-based solution works, simply execute the query prior to the UNION ALL by itself. The query below is from lines 127 in the solution:

 
select *
from (
select e.empno,e.ename,e.job,e.mgr,e.hiredate,
e.sal,e.comm,e.deptno, count(*) as cnt
from emp e
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) e
where not exists (
select null
from (
select v.empno,v.ename,v.job,v.mgr,v.hiredate,
v.sal,v.comm,v.deptno, count(*) as cnt
from v
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) v
where v.empno = e.empno
and v.ename = e.ename
and v.job = e.job
and v.mgr = e.mgr
and v.hiredate = e.hiredate
and v.sal = e.sal
and v.deptno = e.deptno
and v.cnt = e.cnt
and coalesce(v.comm,0) = coalesce(e.comm,0)
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1
7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1
7839 KING PRESIDENT 17-NOV-1981 5000 10 1
7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1

Notice that the comparison is not between table EMP and view V, but rather between inline view E and inline view V. The cardinality for each row is found and returned as an attribute for that row. You are comparing each row and its occurrence count. If you are having trouble understanding how the comparison works, run the subqueries independently. The next step is to find all rows (including CNT) in inline view E that do not exist in inline view V. The comparison uses a correlated subquery and NOT EXISTS. The joins will determine which rows are the same, and the result will be all rows from inline view E that are not the rows returned by the join. The query after the UNION ALL does the opposite; it finds all rows in inline view V that do not exist in inline view E:

 
select *
from (
select v.empno,v.ename,v.job,v.mgr,v.hiredate,
v.sal,v.comm,v.deptno, count(*) as cnt
from v
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) v
where not exists (
select null
from (
select e.empno,e.ename,e.job,e.mgr,e.hiredate,
e.sal,e.comm,e.deptno, count(*) as cnt
from emp e
group by empno,ename,job,mgr,hiredate,
sal,comm,deptno
) e
where v.empno = e.empno
and v.ename = e.ename
and v.job = e.job
and v.mgr = e.mgr
and v.hiredate = e.hiredate
and v.sal = e.sal
and v.deptno = e.deptno
and v.cnt = e.cnt
and coalesce(v.comm,0) = coalesce(e.comm,0)
)


EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
----- ---------- --------- ----- ----------- ----- ----- ------ ---
7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2

The results are then combined by UNION ALL to produce the final result set.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

0 Comments: