Performing Outer Joins when Using Aggregates - SQL Server

Problem

Begin with the same problem as in 3.9, but modify table EMP_BONUS such that the difference in this case is not all employees in department 10 have been given bonuses. Consider the EMP_BONUS table and a query to (ostensibly) find both the sum of all salaries for department 10 and the sum of all bonuses for all employees in department 10:

 
select * from emp_bonus


EMPNO RECEIVED TYPE
---------- ----------- ----------
7934 17-MAR-2005 1
7934 15-FEB-2005 2


select deptno,
sum(sal) as total_sal,
sum(bonus) as total_bonus
from (
select e.empno,
e.ename,
e.sal,
e.deptno,
e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end as bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10
)
group by deptno


DEPTNO TOTAL_SAL TOTAL_BONUS
------ ---------- -----------
10 2600 390

The result for TOTAL_BONUS is correct, but the value returned for TOTAL_SAL does not represent the sum of all salaries in department 10. The following query shows why the TOTAL_SAL is incorrect:

 
select e.empno,
e.ename,
e.sal,
e.deptno,
e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end as bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


EMPNO ENAME SAL DEPTNO BONUS
--------- --------- ------- ---------- ----------
7934 MILLER 1300 10 130
7934 MILLER 1300 10 260

Rather than sum all salaries in department 10, only the salary for "MILLER" is summed and it is erroneously summed twice. Ultimately, you would like to return the following result set:

 DEPTNO TOTAL_SAL TOTAL_BONUS
------ --------- -----------
10 8750 390

Solution

The solution is similar to that of 3.9, but here you outer join to EMP_BONUS to ensure all employees from department 10 are included.

DB2, MySQL, PostgreSQL, SQL Server

Outer join to EMP_BONUS, then perform the sum on only distinct salaries from department 10:

  1 select deptno,
2 sum(distinct sal) as total_sal,
3 sum(bonus) as total_bonus
4 from (
5 select e.empno,
6 e.ename,
7 e.sal,
8 e.deptno,
9 e.sal*case when eb.type is null then 0
10 when eb.type = 1 then .1
11 when eb.type = 2 then .2
12 else .3 end as bonus
13 from emp e left outer join emp_bonus eb
14 on (e.empno = eb.empno)
15 where e.deptno = 10
16 )
17 group by deptno

You can also use the window function SUM OVER:

  1 select distinct deptno,total_sal,total_bonus
2 from (
3 select e.empno,
4 e.ename,
5 sum(distinct e.sal) over
6 (partition by e.deptno) as total_sal,
7 e.deptno,
8 sum(e.sal*case when eb.type is null then 0
9 when eb.type = 1 then .1
10 when eb.type = 2 then .2
11 else .3
12 end) over
13 (partition by deptno) as total_bonus
14 from emp e left outer join emp_bonus eb
15 on (e.empno = eb.empno)
16 where e.deptno = 10
17 ) x

Oracle

If you are using Oracle9i Database or later you can use the preceding solution. Alternatively, you can use the proprietary Oracle outer-join syntax, which is mandatory for users on Oracle8i Database and earlier:

  1 select deptno,
2 sum(distinct sal) as total_sal,
3 sum(bonus) as total_bonus
4 from (
5 select e.empno,
6 e.ename,
7 e.sal,
8 e.deptno,
9 e.sal*case when eb.type is null then 0
10 when eb.type = 1 then .1
11 when eb.type = 2 then .2
12 else .3 end as bonus
13 from emp e, emp_bonus eb
14 where e.empno = eb.empno (+)
15 and e.deptno = 10
16 )
17 group by deptno

Oracle 8i Database users can also use the SUM OVER syntaxshown for DB2 and the other databases, but must modify it to use the proprietary Oracle outer-join syntax shown in the preceding query.

Discussion

The second query in the "Problem" section of this recipe joins table EMP and table EMP_BONUS and returns only rows for employee "MILLER", which is what causes the error on the sum of EMP.SAL (the other employees in DEPTNO 10 do not have bonuses and their salaries are not included in the sum). The solution is to outer join table EMP to table EMP_BONUS so even employees without a bonus will be included in the result. If an employee does not have a bonus, NULL will be returned for EMP_BONUS.TYPE. It is important to keep this in mind as the CASE statement has been modified and is slightly different from solution 3.9. If EMP_BONUS.TYPE is NULL, the CASE expression returns zero, which has no effect on the sum.

The following query is an alternative solution. The sum of all salaries in department 10 is computed first, then joined to table EMP, which is then joined to table EMP_BONUS (thus avoiding the outer join). The following query works for all DBMSs:

 
select d.deptno,
d.total_sal,
sum(e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end) as total_bonus
from emp e,
emp_bonus eb,
(
select deptno, sum(sal) as total_sal
from emp
where deptno = 10
group by deptno
) d
where e.deptno = d.deptno
and e.empno = eb.empno
group by d.deptno,d.total_sal


DEPTNO TOTAL_SAL TOTAL_BONUS
--------- ---------- -----------
10 8750 390

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

SQL - Performing Joins when Using Aggregates

Problem

You want to perform an aggregation but your query involves multiple tables. You want to ensure that joins do not disrupt the aggregation. For example, you want to find the sum of the salaries for employees in department 10 along with the sum of their bonuses. Some employees have more than one bonus and the join between table EMP and table EMP_BONUS is causing incorrect values to be returned by the aggregate function SUM. For this problem, table EMP_BONUS contains the following data:

 
select * from emp_bonus


EMPNO RECEIVED TYPE
----- ----------- ----------
7934 17-MAR-2005 1
7934 15-FEB-2005 2
7839 15-FEB-2005 3
7782 15-FEB-2005 1

Now, consider the following query that returns the salary and bonus for all employees in department 10. Table BONUS.TYPE determines the amount of the bonus. A type 1 bonus is 10% of an employee's salary, type 2 is 20%, and type 3 is 30%.

 
select e.empno,
e.ename,
e.sal,
e.deptno,
e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3
end as bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


EMPNO ENAME SAL DEPTNO BONUS
------- ---------- ---------- ---------- ---------
7934 MILLER 1300 10 130
7934 MILLER 1300 10 260
7839 KING 5000 10 1500
7782 CLARK 2450 10 245

So far, so good. However, things go awry when you attempt a join to the EMP_ BONUS table in order to sum the bonus amounts:

 
select deptno,
sum(sal) as total_sal,
sum(bonus) as total_bonus
from (
select e.empno,
e.ename,
e.sal,
e.deptno,
e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3
end as bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10
) x
group by deptno


DEPTNO TOTAL_SAL TOTAL_BONUS
------ ----------- -----------
10 10050 2135

While the TOTAL_BONUS is correct, the TOTAL_SAL is incorrect. The sum of all salaries in department 10 is 8750, as the following query shows:

 
select sum(sal) from emp where deptno=10


SUM(SAL)
----------
8750

Why is TOTAL_SAL incorrect? The reason is the duplicate rows in the SAL column created by the join. Consider the following query, which joins table EMP and EMP_ BONUS:

 
select e.ename,
e.sal
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


ENAME SAL
---------- ----------
CLARK 2450
KING 5000
MILLER 1300
MILLER 1300

Now it is easy to see why the value for TOTAL_SAL is incorrect: MILLER's salary is counted twice. The final result set that you are really after is:

 DEPTNO TOTAL_SAL TOTAL_BONUS
------ --------- -----------
10 8750 2135

Solution

You have to be careful when computing aggregates across joins. Typically when duplicates are returned due to a join, you can avoid miscalculations by aggregate functions in two ways: you can simply use the keyword DISTINCT in the call to the aggregate function, so only unique instances of each value are used in the computation; or you can perform the aggregation first (in an inline view) prior to joining, thus avoiding the incorrect computation by the aggregate function because the aggregate will already be computed before you even join, thus avoiding the problem altogether. The solutions that follow use DISTINCT. The "Discussion" section will discuss the technique of using an inline view to perform the aggregation prior to joining.

MySQL and PostgreSQL

Perform a sum of only the DISTINCT salaries:

  1 select deptno,
2 sum(distinct sal) as total_sal,
3 sum(bonus) as total_bonus
4 from (
5 select e.empno,
6 e.ename,
7 e.sal,
8 e.deptno,
9 e.sal*case when eb.type = 1 then .1
10 when eb.type = 2 then .2
11 else .3
12 end as bonus
13 from emp e, emp_bonus eb
14 where e.empno = eb.empno
15 and e.deptno = 10
16 ) x
17 group by deptno

DB2, Oracle, and SQL Server

These platforms support the preceding solution, but they also support an alternative solution using the window function SUM OVER:

  1 select distinct deptno,total_sal,total_bonus
2 from (
3 select e.empno,
4 e.ename,
5 sum(distinct e.sal) over
6 (partition by e.deptno) as total_sal,
7 e.deptno,
8 sum(e.sal*case when eb.type = 1 then .1
9 when eb.type = 2 then .2
10 else .3 end) over
11 (partition by deptno) as total_bonus
12 from emp e, emp_bonus eb
13 where e.empno = eb.empno
14 and e.deptno = 10
15 ) x

Discussion

MySQL and PostgreSQL

The second query in the "Problem" section of this recipe joins table EMP and table EMP_BONUS and returns two rows for employee "MILLER", which is what causes the error on the sum of EMP.SAL (the salary is added twice). The solution is to simply sum the distinct EMP.SAL values that are returned by the query. The following query is an alternative solution. The sum of all salaries in department 10 is computed first and that row is then joined to table EMP, which is then joined to table EMP_BONUS. The following query works for all DBMSs:

 
select d.deptno,
d.total_sal,
sum(e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end) as total_bonus
from emp e,
emp_bonus eb,
(
select deptno, sum(sal) as total_sal
from emp
where deptno = 10
group by deptno
) d
where e.deptno = d.deptno
and e.empno = eb.empno
group by d.deptno,d.total_sal


DEPTNO TOTAL_SAL TOTAL_BONUS
--------- ---------- ------------
10 8750 2135

DB2, Oracle, and SQL Server

This alternative solution takes advantage of the window function SUM OVER. The following query is taken from lines 314 in "Solution" and returns the following result set:

 
select e.empno,
e.ename,
sum(distinct e.sal) over
(partition by e.deptno) as total_sal,
e.deptno,
sum(e.sal*case when eb.type = 1 then .1
when eb.type = 2 then .2
else .3 end) over
(partition by deptno) as total_bonus
from emp e, emp_bonus eb
where e.empno = eb.empno
and e.deptno = 10


EMPNO ENAME TOTAL_SAL DEPTNO TOTAL_BONUS
----- ---------- ---------- ------ -----------
7934 MILLER 8750 10 2135
7934 MILLER 8750 10 2135
7782 CLARK 8750 10 2135
7839 KING 8750 10 2135

The windowing function, SUM OVER, is called twice, first to compute the sum of the distinct salaries for the defined partition or group. In this case, the partition is DEPTNO 10 and the sum of the distinct salaries for DEPTNO 10 is 8750. The next call to SUM OVER computes the sum of the bonuses for the same defined partition. The final result set is produced by taking the distinct values for TOTAL_SAL, DEPTNO, and TOTAL_BONUS.


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

Identifying and Avoiding Cartesian Products - SQL Server

Problem

You want to return the name of each employee in department 10 along with the location of the department. The following query is returning incorrect data:

 
select e.ename, d.loc
from emp e, dept d
where e.deptno = 10


ENAME LOC
---------- -------------
CLARK NEW YORK
CLARK DALLAS
CLARK CHICAGO
CLARK BOSTON
KING NEW YORK
KING DALLAS
KING CHICAGO
KING BOSTON
MILLER NEW YORK
MILLER DALLAS
MILLER CHICAGO
MILLER BOSTON

The correct result set is the following:

 ENAME       LOC
---------- ---------
CLARK NEW YORK
KING NEW YORK
MILLER NEW YORK

Solution

Use a join between the tables in the FROM clause to return the correct result set:

 1 select e.ename, d.loc
2 from emp e, dept d
3 where e.deptno = 10
4 and d.deptno = e.deptno

Discussion

Looking at the data in the DEPT table:

 
select * from dept


DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO

40 OPERATIONS BOSTON

You can see that department 10 is in New York, and thus you can know that returning employees with any location other than New York is incorrect. The number of rows returned by the incorrect query is the product of the cardinalities of the two tables in the FROM clause. In the original query, the filter on EMP for department 10 will result in three rows. Because there is no filter for DEPT, all four rows from DEPT are returned. Three multiplied by four is twelve, so the incorrect query returns twelve rows. Generally, to avoid a Cartesian product you would apply the n1 rule where n represents the number of tables in the FROM clause and n1 represents the minimum number of joins necessary to avoid a Cartesian product. Depending on what the keys and join columns in your tables are, you may very well need more than n1 joins, but n1 is a good place to start when writing queries.

TIPS :

When used properly, Cartesian products can be very useful. The recipe, , uses a Cartesian product and is used by many other queries. Common uses of Cartesian products include transposing or pivoting (and unpivoting) a result set, generating a sequence of values, and mimicking a loop.


** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table