SQL - Performing Joins when Using Aggregates

Problem

You want to perform an aggregation but your query involves multiple tables. You want to ensure that joins do not disrupt the aggregation. For example, you want to find the sum of the salaries for employees in department 10 along with the sum of their bonuses. Some employees have more than one bonus and the join between table EMP and table EMP_BONUS is causing incorrect values to be returned by the aggregate function SUM. For this problem, table EMP_BONUS contains the following data:

 
select * from emp_bonus


EMPNO RECEIVED TYPE
----- ----------- ----------
7934 17-MAR-2005 1
7934 15-FEB-2005 2
7839 15-FEB-2005 3
7782 15-FEB-2005 1

Now, consider the following query that returns the salary and bonus for all employees in department 10. Table BONUS.TYPE determines the amount of the bonus. A type 1 bonus is 10% of an employee's salary, type 2 is 20%, and type 3 is 30%.

 
select e.empno,
e.ename,
e.sal,
e.deptno,
e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3
end as bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


EMPNO ENAME SAL DEPTNO BONUS
------- ---------- ---------- ---------- ---------
7934 MILLER 1300 10 130
7934 MILLER 1300 10 260
7839 KING 5000 10 1500
7782 CLARK 2450 10 245

So far, so good. However, things go awry when you attempt a join to the EMP_ BONUS table in order to sum the bonus amounts:

 
select deptno,
sum(sal) as total_sal,
sum(bonus) as total_bonus
from (
select e.empno,
e.ename,
e.sal,
e.deptno,
e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3
end as bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10
) x
group by deptno


DEPTNO TOTAL_SAL TOTAL_BONUS
------ ----------- -----------
10 10050 2135

While the TOTAL_BONUS is correct, the TOTAL_SAL is incorrect. The sum of all salaries in department 10 is 8750, as the following query shows:

 
select sum(sal) from emp where deptno=10


SUM(SAL)
----------
8750

Why is TOTAL_SAL incorrect? The reason is the duplicate rows in the SAL column created by the join. Consider the following query, which joins table EMP and EMP_ BONUS:

 
select e.ename,
e.sal
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


ENAME SAL
---------- ----------
CLARK 2450
KING 5000
MILLER 1300
MILLER 1300

Now it is easy to see why the value for TOTAL_SAL is incorrect: MILLER's salary is counted twice. The final result set that you are really after is:

 DEPTNO TOTAL_SAL TOTAL_BONUS
------ --------- -----------
10 8750 2135

Solution

You have to be careful when computing aggregates across joins. Typically when duplicates are returned due to a join, you can avoid miscalculations by aggregate functions in two ways: you can simply use the keyword DISTINCT in the call to the aggregate function, so only unique instances of each value are used in the computation; or you can perform the aggregation first (in an inline view) prior to joining, thus avoiding the incorrect computation by the aggregate function because the aggregate will already be computed before you even join, thus avoiding the problem altogether. The solutions that follow use DISTINCT. The "Discussion" section will discuss the technique of using an inline view to perform the aggregation prior to joining.

MySQL and PostgreSQL

Perform a sum of only the DISTINCT salaries:

  1 select deptno,
2 sum(distinct sal) as total_sal,
3 sum(bonus) as total_bonus
4 from (
5 select e.empno,
6 e.ename,
7 e.sal,
8 e.deptno,
9 e.sal*case when eb.type = 1 then .1
10 when eb.type = 2 then .2
11 else .3
12 end as bonus
13 from emp e, emp_bonus eb
14 where e.empno = eb.empno
15 and e.deptno = 10
16 ) x
17 group by deptno

DB2, Oracle, and SQL Server

These platforms support the preceding solution, but they also support an alternative solution using the window function SUM OVER:

  1 select distinct deptno,total_sal,total_bonus
2 from (
3 select e.empno,
4 e.ename,
5 sum(distinct e.sal) over
6 (partition by e.deptno) as total_sal,
7 e.deptno,
8 sum(e.sal*case when eb.type = 1 then .1
9 when eb.type = 2 then .2
10 else .3 end) over
11 (partition by deptno) as total_bonus
12 from emp e, emp_bonus eb
13 where e.empno = eb.empno
14 and e.deptno = 10
15 ) x

Discussion

MySQL and PostgreSQL

The second query in the "Problem" section of this recipe joins table EMP and table EMP_BONUS and returns two rows for employee "MILLER", which is what causes the error on the sum of EMP.SAL (the salary is added twice). The solution is to simply sum the distinct EMP.SAL values that are returned by the query. The following query is an alternative solution. The sum of all salaries in department 10 is computed first and that row is then joined to table EMP, which is then joined to table EMP_BONUS. The following query works for all DBMSs:

 
select d.deptno,
d.total_sal,
sum(e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end) as total_bonus
from emp e,
emp_bonus eb,
(
select deptno, sum(sal) as total_sal
from emp
where deptno = 10
group by deptno
) d
where e.deptno = d.deptno
and e.empno = eb.empno
group by d.deptno,d.total_sal


DEPTNO TOTAL_SAL TOTAL_BONUS
--------- ---------- ------------
10 8750 2135

DB2, Oracle, and SQL Server

This alternative solution takes advantage of the window function SUM OVER. The following query is taken from lines 314 in "Solution" and returns the following result set:

 
select e.empno,
e.ename,
sum(distinct e.sal) over
(partition by e.deptno) as total_sal,
e.deptno,
sum(e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end) over
(partition by deptno) as total_bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


EMPNO ENAME TOTAL_SAL DEPTNO TOTAL_BONUS
----- ---------- ---------- ------ -----------
7934 MILLER 8750 10 2135
7934 MILLER 8750 10 2135
7782 CLARK 8750 10 2135
7839 KING 8750 10 2135

The windowing function, SUM OVER, is called twice, first to compute the sum of the distinct salaries for the defined partition or group. In this case, the partition is DEPTNO 10 and the sum of the distinct salaries for DEPTNO 10 is 8750. The next call to SUM OVER computes the sum of the bonuses for the same defined partition. The final result set is produced by taking the distinct values for TOTAL_SAL, DEPTNO, and TOTAL_BONUS.


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

0 Comments: