Monday, September 22, 2008

STAT lines in 11g

My friend and colleague, Ric Van Dyke, has been preparing a presentation for the UKOUG conference and in his work discovered an anomaly in the roll ups for the time values in the STAT lines emitted in 10046 trace data from an 11g database. I had seen this behavior as well but didn't follow up on it for myself since I wasn't using Oracle 11 for anything other than a bit of testing here and there.

A couple of weeks ago, I received an email from Rudi K. as a follow up to my post on Plan Execution Statistics. In some of Rudi's own testing, he came across this anomaly in the rowsource execution statistics as well and asked me about it. So, given Rudi's question and a recent discussion with Ric on the subject, I thought I'd blog about it. Rudi - I apologize it's taken me so long to properly respond to you, but here you go! :)

First, let's look at an example of what the problem looks like. Take a look at the rowsource execution statistics (from the trace file STAT lines) for the following very simple query in Oracle 10g:


select /*+ use_merge(e,d) */ *
from emp e, dept d
where e.deptno = d.deptno ;

STAT #3 id=1 cnt=13 pid=0 pos=1 obj=0 op='MERGE JOIN (cr=11 pr=8 pw=0 time=21096 us)'
STAT #3 id=2 cnt=4 pid=1 pos=1 obj=54389 op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=2 pw=0 time=17342 us)'
STAT #3 id=3 cnt=4 pid=2 pos=1 obj=54390 op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=1 pw=0 time=16060 us)'
STAT #3 id=4 cnt=13 pid=1 pos=2 obj=0 op='SORT JOIN (cr=7 pr=6 pw=0 time=3539 us)'
STAT #3 id=5 cnt=13 pid=4 pos=1 obj=54391 op='TABLE ACCESS FULL EMP (cr=7 pr=6 pw=0 time=3284 us)'

In order to understand the rowsource execution statistics (the information in the parentheses at the end of each line), you need to follow one simple "rule": the values listed in a child operation roll up into their parents. The bottom-line is that the top line should be the roll up of the child operations listed below it. To understand the relationships, take a look at this tree diagram of this query's execution plan.

1
|
-------
| |
2 4
| |
3 5

So, you could break down the rowsource time stats as follows (I'm only showing the time stats as these are the roll ups that appear to have problems in 11g but all the rowsource statistics roll up the same way):

-------------------------------------------------------------------------------------
| Id | Pid | Operation | Name | Time | Self | Kids
-------------------------------------------------------------------------------------
| 0 | | SELECT STATEMENT | | 21096 | |
| 1 | 0 | MERGE JOIN | | 21096 | 215 | 17342+3539
| 2 | 1 | TABLE ACCESS BY INDEX ROWID| DEPT | 17342 | 1282 | 16060
| 3 | 2 | INDEX FULL SCAN | DEPT_DEPTNO_PK | 16060 | 16060 | 0
| 4 | 1 | SORT JOIN | | 3539 | 255 | 3284
| 5 | 4 | TABLE ACCESS FULL | EMP | 3284 | 3284 | 0
-------------------------------------------------------------------------------------


So far, so good. Now, let's look at the same info from 11g.

STAT #3 id=1 cnt=13 pid=0 pos=1 obj=0 op='MERGE JOIN (cr=6 pr=3 pw=3 time=166 us cost=5 size=767 card=13)'
STAT #3 id=2 cnt=4 pid=1 pos=1 obj=71193 op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=2 pw=2 time=53 us cost=2 size=80 card=4)'
STAT #3 id=3 cnt=4 pid=2 pos=1 obj=71194 op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=1 pw=1 time=36 us cost=1 size=0 card=4)'
STAT #3 id=4 cnt=13 pid=1 pos=2 obj=0 op='SORT JOIN (cr=2 pr=1 pw=1 time=25 us cost=3 size=507 card=13)'
STAT #3 id=5 cnt=13 pid=4 pos=1 obj=71195 op='TABLE ACCESS BY INDEX ROWID EMP (cr=2 pr=1 pw=1 time=84 us cost=2 size=507 card=13)'
STAT #3 id=6 cnt=13 pid=5 pos=1 obj=71200 op='INDEX FULL SCAN EMP_DEPT_IDX (cr=1 pr=1 pw=1 time=29 us cost=1 size=0 card=13)'

The plan is a bit different in that it used an INDEX FULL SCAN for the EMP table access but otherwise, it's basically the same.

1
|
-------
| |
2 4
| |
3 5
|
6

Here's the break down of the rowsource time stats:

--------------------------------------------------------------------------------------
| Id | Pid | Operation | Name | Time | Self | Kids
--------------------------------------------------------------------------------------
| 0 | | SELECT STATEMENT | | 166 | |
| 1 | 0 | MERGE JOIN | | 166 | ?? | 53+??
| 2 | 1 | TABLE ACCESS BY INDEX ROWID | DEPT | 53 | 17 | 36
| 3 | 2 | INDEX FULL SCAN | DEPT_DEPTNO_PK | 36 | 36 | 0
| 4 | 1 | SORT JOIN | | 25 | ?? | ??
| 5 | 4 | TABLE ACCESS BY INDEX ROWID| EMP | 84 | 55 | 29
| 6 | 5 | INDEX FULL SCAN | EMP_DEPT_IDX | 29 | 29 | 0
--------------------------------------------------------------------------------------

Hmmmm... Something's wrong here. Notice on step 4, the SORT JOIN operation's total time is only 25 microseconds. How can that be possible if it's child operation takes 84 microseconds? The MERGE JOIN step shows a time of 166 microseconds. How can I be sure of that total since step 4 appears wrong (i.e. it doesn't appear to include it's child operation totals)?

Well, if we go with Ric's theory, the problem is with the SORT step (id = 4). It appears that Oracle is no longer rolling up totals when a SORT type of operation occurs. Instead, it maintains that operation's statistics independent of its children. If that's the case, then to get the correct timing roll up, we'd need to add the time from steps 2, 4 and 5. That would be 162 microseconds. That would mean that the MERGE step would take 4 microseconds.

I'm not sure that's really how it's working, but the math seems to work out fairly close. Ric says:
What appears to be happening is that the SORT is now done "while" the child steps are accruing. I think the best way to describe this new behavior is that sorts now happen simultaneously with the child steps. I don't believe that this is a parallel action, it is an action that happens simultaneously. There is a subtle but distinct difference.

Regardless of if there's a resolution or not (perhaps we need to change the way the roll ups are done for SORT type operations which include not only SORT-MERGE joins, but queries that have ORDER BY and HASH joins as well), we do need to take note of the change. I've depended on these rowsource stats quite frequently when optimizing SQL and would love to really understand why this behavior is occurring and how to account for it in trace data.

If anyone has anything more concrete on this, I'd love to hear from you and I'm sure Ric would too!

6 comments:

mathewbutler said...

I've a vague recollection of reading about the same phenomena in the last month or so. With the same conclusion drawn.

I'm fairly sure it was a post by
Jonathan Lewis. I've had a quick search though but can't find the link.

If I find it, I'll post another comment.

Regards,

Karen said...

Mathew - Same here. I've looked all over for commentary that I thought I'd read before, but can't seem to locate anything specific to this case. Jonathan and others have written about the _rowsource_statistics_sampfreq parameter but my testing doesn't show that changing that parameter makes the time roll ups change. The statistics_level parameter is set to ALL for my tests but I have also tested with it set to TYPICAL and gotten basically the same results. So, I can't prove that it's a sampling frequency issue.

I'd appreciate the link if you find anything. Thanks!

Anonymous said...

Thank you Karen.


Mathew,
Jonathan post related to this timing stat,

http://jonathanlewis.wordpress.com/2007/11/25/gather_plan_statistics/
http://jonathanlewis.wordpress.com/2007/04/26/heisenberg/

Just guess if both article is what you mean. But they are different with what Karen explaining here

Anonymous said...

Karen,

Ric sent me some notes about this a little while ago. I don't recall what comments I made to him then, but seeing your post, I've just run up a couple of little tests on 11.1.0.6 with statistics_level = all and tracing enabled.

From what I've just seen, I think this is a bug in the sql_trace code that is losing the sort line time values then making nonsense of any attempt to accumulate through a sort (pointer offset error, perhaps). To fit your comments page I've trimmed some bits from a trace file, and from v$sql_plan_statistics_all to show the same query from two different perspectives.

I've tried to use a couple of tags that work on my blog to get a fixed font, apologies if the results look unreadable.



Execution path:

SORT AGGREGATE
_MERGE JOIN
__TABLE ACCESS BY INDEX ROWID T1
___INDEX FULL SCAN T1_I1
_SORT JOIN
__TABLE ACCESS BY INDEX ROWID T2
___INDEX FULL SCAN T2_I2



Times from trace file:

time= 0 us
time= 33666 us
time= 11539 us
time= 2320 us
time= 0 us
time=136144 us
time= 28985 us


Times (last_time_elapsed) from v$sql_plan_statistics_all

ID LAST_ELAPSED_TIME
---------- -----------------
1 226831
2 224014
3 11110
4 2061
5 198785
6 130050
7 20022

As you can see, the trace file allocates zero to both the sort operations, and the merge operation time is garbage. However, the figures from v$sql_plan_statistics_all are self-consistent.

Regards
Jonathan Lewis

Karen said...

Thanks Jonathan. I didn't even think to compare the two sets of values and just assumed they were the same. You gave me a good reminder to *never* assume anything.

I think the question ultimately comes down to whether or not, it's a problem worth pursuing. I mean, with the overhead associated with setting statistics_level to ALL, I'm not sure having these stats (even when they're correct) is worth the cost of getting them.

RG said...

Karen, Jonathan:

No sarcasm: Any luck with opening a service request with Oracle on this subject? One would hope that an SR from contributors to the Oracle user community such as yourselves get promptly escalated to the appropriate technology group.