Monday, October 25, 2010

Left outer join equivalence - Oracle vs ANSI syntax

I was reminded last week while visiting on customer site that it's easy to confuse the syntax you should use if you switch between Oracle's syntax and ANSI syntax for outer joins. Here's an example using the good old standby SCOTT schema to show what I mean. First, I'll show the data from both tables we'll use (emp and dept) so the rest of the examples will be easier to follow.


SCOTT@ORA11R2> select * from emp ;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3000 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10

14 rows selected.

SCOTT@ORA11R2> select * from dept ;

DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON

4 rows selected.


Now, let's join the tables to produce a result set that includes all rows from emp but limits the dept table info to only be included if the deptno = 20.

SCOTT@ORA11R2> select e.empno, e.ename, d.deptno, d.dname
2 from emp e, dept d
3 where e.deptno = d.deptno(+) and d.deptno(+) = 20;

EMPNO ENAME DEPTNO DNAME
---------- ---------- ---------- --------------
7902 FORD 20 RESEARCH
7876 ADAMS 20 RESEARCH
7788 SCOTT 20 RESEARCH
7566 JONES 20 RESEARCH
7369 SMITH 20 RESEARCH
7839 KING
7934 MILLER
7782 CLARK
7900 JAMES
7844 TURNER
7698 BLAKE
7654 MARTIN
7521 WARD
7499 ALLEN

14 rows selected.


Note that we get back the entire set of 14 rows from emp, but only 5 rows have the deptno and dname column values populated. This is because the filter on deptno(+) = 20 is written so that it is applied before the join between emp and dept occurs. Therefore, only rows with deptno = 20 will be joined to their matching emp rows. The remaining rows for employees who are in the other departments will be returned due to the outer join as no rows from dept will be available to match to.

If we wrote the join without including the (+) on the filter, we'd have gotten back only rows for deptno 20 employees as follows:

SCOTT@ORA11R2> select e.empno, e.ename, d.deptno, d.dname
2 from emp e, dept d
3 where e.deptno = d.deptno(+) and d.deptno = 20;

EMPNO ENAME DEPTNO DNAME
---------- ---------- ---------- --------------
7369 SMITH 20 RESEARCH
7566 JONES 20 RESEARCH
7788 SCOTT 20 RESEARCH
7876 ADAMS 20 RESEARCH
7902 FORD 20 RESEARCH

5 rows selected.


However, if we want to return rows only for deptno 20, then we shouldn't use an outer join at all. If the filter is applied after the join, then the rows for other departments will have null values in the deptno and dname columns (as shown earlier) and therefore will not match the filter condition and be eliminated from the result set. You can see that the result is the same if you remove the outer join operator.

SCOTT@ORA11R2> select e.empno, e.ename, d.deptno, d.dname
2 from emp e, dept d
3 where e.deptno = d.deptno
4 and d.deptno = 20;

EMPNO ENAME DEPTNO DNAME
---------- ---------- ---------- --------------
7369 SMITH 20 RESEARCH
7566 JONES 20 RESEARCH
7788 SCOTT 20 RESEARCH
7876 ADAMS 20 RESEARCH
7902 FORD 20 RESEARCH

5 rows selected.


When using ANSI join syntax, you need to make sure to use the filter condition properly as well. With ANSI syntax, the difference is in where you place the condition. In order for the filter to be applied before the join, you must place the filter inside the join condition.

SCOTT@ORA11R2> select e.empno, e.ename, d.deptno, d.dname
2 from emp e left outer join dept d
3 on (e.deptno = d.deptno and d.deptno = 20);

EMPNO ENAME DEPTNO DNAME
---------- ---------- ---------- --------------
7902 FORD 20 RESEARCH
7876 ADAMS 20 RESEARCH
7788 SCOTT 20 RESEARCH
7566 JONES 20 RESEARCH
7369 SMITH 20 RESEARCH
7839 KING
7934 MILLER
7782 CLARK
7900 JAMES
7844 TURNER
7698 BLAKE
7654 MARTIN
7521 WARD
7499 ALLEN

14 rows selected.


If you place the filter in the WHERE clause, the filter is applied after the join.

SCOTT@ORA11R2> select e.empno, e.ename, d.deptno, d.dname
2 from emp e left outer join dept d
3 on (e.deptno = d.deptno)
4 where d.deptno = 20;

EMPNO ENAME DEPTNO DNAME
---------- ---------- ---------- --------------
7369 SMITH 20 RESEARCH
7566 JONES 20 RESEARCH
7788 SCOTT 20 RESEARCH
7876 ADAMS 20 RESEARCH
7902 FORD 20 RESEARCH

5 rows selected.


I tend to use the Oracle syntax most often and am so used to it that when I was shown the ANSI syntax, it took a minute to recall the difference in placement of the filter condition (thanks for the nudge Karl!). It's fairly obvious when looking at a simple example like this as to where the condition belongs, but when you've got a much more complex SQL statement with multiple joins, the effects of placing the filter in the wrong spot may be overlooked.

Hope this little reminder helps!

2 comments:

Unknown said...

the clause where d.deptno = 20 changes the join to a regular join; you want where (d.deptno is null or d.deptno = 20)

Karen said...

@sablime, Adding the 'd.deptno is null' predicate doesn't provide the same result set as the correctly used outer join filter (d.deptno(+) = 20). If you use the predicate you suggest, 6 rows are returned. The 5 in deptno = 20 and the one row for employee KING whose deptno is null. We actually want our answer to return all rows in the emp table and include the department info for those employees in deptno 20. Simply adding the 'deptno is null' filter to the predicate won't produce that answer.

This approach was worth a shot though and supports another reason why it's important to always keep in mind the question you want the query to answer.