A couple of weeks ago, I received an email from Rudi K. as a follow up to my post on Plan Execution Statistics. In some of Rudi's own testing, he came across this anomaly in the rowsource execution statistics as well and asked me about it. So, given Rudi's question and a recent discussion with Ric on the subject, I thought I'd blog about it. Rudi - I apologize it's taken me so long to properly respond to you, but here you go! :)
First, let's look at an example of what the problem looks like. Take a look at the rowsource execution statistics (from the trace file STAT lines) for the following very simple query in Oracle 10g:
select /*+ use_merge(e,d) */ *
from emp e, dept d
where e.deptno = d.deptno ;
STAT #3 id=1 cnt=13 pid=0 pos=1 obj=0 op='MERGE JOIN (cr=11 pr=8 pw=0 time=21096 us)'
STAT #3 id=2 cnt=4 pid=1 pos=1 obj=54389 op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=2 pw=0 time=17342 us)'
STAT #3 id=3 cnt=4 pid=2 pos=1 obj=54390 op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=1 pw=0 time=16060 us)'
STAT #3 id=4 cnt=13 pid=1 pos=2 obj=0 op='SORT JOIN (cr=7 pr=6 pw=0 time=3539 us)'
STAT #3 id=5 cnt=13 pid=4 pos=1 obj=54391 op='TABLE ACCESS FULL EMP (cr=7 pr=6 pw=0 time=3284 us)'
In order to understand the rowsource execution statistics (the information in the parentheses at the end of each line), you need to follow one simple "rule": the values listed in a child operation roll up into their parents. The bottom-line is that the top line should be the roll up of the child operations listed below it. To understand the relationships, take a look at this tree diagram of this query's execution plan.
1
|
-------
| |
2 4
| |
3 5
So, you could break down the rowsource time stats as follows (I'm only showing the time stats as these are the roll ups that appear to have problems in 11g but all the rowsource statistics roll up the same way):
-------------------------------------------------------------------------------------
| Id | Pid | Operation | Name | Time | Self | Kids
-------------------------------------------------------------------------------------
| 0 | | SELECT STATEMENT | | 21096 | |
| 1 | 0 | MERGE JOIN | | 21096 | 215 | 17342+3539
| 2 | 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 17342 | 1282 | 16060
| 3 | 2 | INDEX FULL SCAN | DEPT_DEPTNO_PK | 16060 | 16060 | 0
| 4 | 1 | SORT JOIN | | 3539 | 255 | 3284
| 5 | 4 | TABLE ACCESS FULL | EMP | 3284 | 3284 | 0
-------------------------------------------------------------------------------------
So far, so good. Now, let's look at the same info from 11g.
STAT #3 id=1 cnt=13 pid=0 pos=1 obj=0 op='MERGE JOIN (cr=6 pr=3 pw=3 time=166 us cost=5 size=767 card=13)'
STAT #3 id=2 cnt=4 pid=1 pos=1 obj=71193 op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=2 pw=2 time=53 us cost=2 size=80 card=4)'
STAT #3 id=3 cnt=4 pid=2 pos=1 obj=71194 op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=1 pw=1 time=36 us cost=1 size=0 card=4)'
STAT #3 id=4 cnt=13 pid=1 pos=2 obj=0 op='SORT JOIN (cr=2 pr=1 pw=1 time=25 us cost=3 size=507 card=13)'
STAT #3 id=5 cnt=13 pid=4 pos=1 obj=71195 op='TABLE ACCESS BY INDEX ROWID EMP (cr=2 pr=1 pw=1 time=84 us cost=2 size=507 card=13)'
STAT #3 id=6 cnt=13 pid=5 pos=1 obj=71200 op='INDEX FULL SCAN EMP_DEPT_IDX (cr=1 pr=1 pw=1 time=29 us cost=1 size=0 card=13)'
The plan is a bit different in that it used an INDEX FULL SCAN for the EMP table access but otherwise, it's basically the same.
1
|
-------
| |
2 4
| |
3 5
|
6
Here's the break down of the rowsource time stats:
--------------------------------------------------------------------------------------
| Id | Pid | Operation | Name | Time | Self | Kids
--------------------------------------------------------------------------------------
| 0 | | SELECT STATEMENT | | 166 | |
| 1 | 0 | MERGE JOIN | | 166 | ?? | 53+??
| 2 | 1 | TABLE ACCESS BY INDEX ROWID | DEPT | 53 | 17 | 36
| 3 | 2 | INDEX FULL SCAN | DEPT_DEPTNO_PK | 36 | 36 | 0
| 4 | 1 | SORT JOIN | | 25 | ?? | ??
| 5 | 4 | TABLE ACCESS BY INDEX ROWID| EMP | 84 | 55 | 29
| 6 | 5 | INDEX FULL SCAN | EMP_DEPT_IDX | 29 | 29 | 0
--------------------------------------------------------------------------------------
Hmmmm... Something's wrong here. Notice on step 4, the SORT JOIN operation's total time is only 25 microseconds. How can that be possible if it's child operation takes 84 microseconds? The MERGE JOIN step shows a time of 166 microseconds. How can I be sure of that total since step 4 appears wrong (i.e. it doesn't appear to include it's child operation totals)?
Well, if we go with Ric's theory, the problem is with the SORT step (id = 4). It appears that Oracle is no longer rolling up totals when a SORT type of operation occurs. Instead, it maintains that operation's statistics independent of its children. If that's the case, then to get the correct timing roll up, we'd need to add the time from steps 2, 4 and 5. That would be 162 microseconds. That would mean that the MERGE step would take 4 microseconds.
I'm not sure that's really how it's working, but the math seems to work out fairly close. Ric says:
What appears to be happening is that the SORT is now done "while" the child steps are accruing. I think the best way to describe this new behavior is that sorts now happen simultaneously with the child steps. I don't believe that this is a parallel action, it is an action that happens simultaneously. There is a subtle but distinct difference.
Regardless of if there's a resolution or not (perhaps we need to change the way the roll ups are done for SORT type operations which include not only SORT-MERGE joins, but queries that have ORDER BY and HASH joins as well), we do need to take note of the change. I've depended on these rowsource stats quite frequently when optimizing SQL and would love to really understand why this behavior is occurring and how to account for it in trace data.
If anyone has anything more concrete on this, I'd love to hear from you and I'm sure Ric would too!