Sunday, June 29, 2008

Plan execution statistics

I recently saw a thread at AskTom where someone had asked a question about using tkprof for tuning a query. Tom provided an example of tracing the execution of a query and then using tkprof to summarize the trace file into a readable report and finally how to interpret what the report shows.

The part that caught my eye (and the eye of another reader) was the execution plan output display in the report. It looked like this:

Rows Row Source Operation
--------- ---------------------------------------------------
1 SORT AGGREGATE (cr=1840797 pr=1840758 pw=0 time=262104893 us)
128000000 TABLE ACCESS FULL BIG_TABLE (cr=1840797 pr=1840758 pw=0 time=384004887 us)

Note the statistics (in parentheses) for each row source operation. The time shown for the first step is time=262104893. This is also close to the approximate total elapsed time shown for the query (~262 seconds). But, look at the second step which is a child of step 1. It shows a time=384004887 (~384 seconds). How can that be? Child steps roll their totals into their parents so that the parent totals are all their children plus themselves. So, there is no way the child took around 384 seconds but the parent took 262 seconds. Tom offered an explanation of why the numbers were different that basically shows how where instrumentation calls are placed can cause measurement errors to creep in.

Well, while I know measurement errors do exist, I think the specific problem here isn't related to any type of measurement error. It has to do with the setting of the STATISTICS_LEVEL parameter. The STATISTICS_LEVEL parameter is set to TYPICAL by deafult, and at the default level, plan execution statistics are DISABLED. You can see this with these two queries:

SQL> select value from v$parameter where upper(name) = 'STATISTICS_LEVEL' ;


1 row selected.

SQL> SELECT session_status,
2 system_status,
3 activation_level,
4 session_settable
5 FROM v$statistics_level
6 WHERE statistics_name = 'Plan Execution Statistics'
7 /

Session System Activation Session
Status Status Level Settable
---------- ---------- ---------- ----------

1 row selected.

The first query shows my current parameter setting (TYPICAL) and the second shows what setting the plan execution statistics have at this level (DISABLED). The second query also shows that the activation level for collecting this information is at ALL.

So, I recreated the same test as Tom used and here's my plan display (note the time= values are "off", just like Tom's):

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=120 pr=0 pw=0 time=11988 us)
28736 INDEX FAST FULL SCAN BIG_TAB_SO_IDX (cr=120 pr=0 pw=0 time=287550 us)(object id 52645)

Next, I executed ALTER SESSION SET STATISTICS_LEVEL = ALL and ran the test again. Now look what I get:

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=120 pr=113 pw=0 time=148169 us)
28736 INDEX FAST FULL SCAN BIG_TAB_SO_IDX (cr=120 pr=113 pw=0 time=93170 us)(object id 52645)

After setting the STATISTICS_LEVEL parameter to ALL, my child time is less than the parent indicating that the parent appears to have correctly rolled up the child into its totals. Therefore, if you want completely correct plan execution statistics, you must have the STATISTICS_LEVEL parameter set to ALL. Otherwise, the only number that is correct is the topmost step in the plan (id=1).

I think the plan execution statistics are extremely helpful when evaluating query performance, so I like to have my parameter set to ALL for session testing.

However, I'd like to know why Oracle doesn't just show the child steps with null values if they are not correct. Notice that the cr and pr values seem to be correct in both cases in my example, but I think that's coincidence. I recall seeing other cases where this hasn't been true (at least in early 10 releases).

What I'd like to know is what is going on under the covers? But, lacking that knowledge, I've learned to ignore all the row source operation execution statistics except for the topmost one if the parameter isn't set to ALL.


Kurt Graustein said...

Hi Karen. Sorry I didn't make it to the Symposium this past Spring for our annual visit.

I believe you will find a lot of answers on the statistics inconsistency between parent and child row source stats if you do some research on the hidden parameter _rowsource_statistics_sampfreq. While you might not find to much official documentation on it, both Jonathan and Julian have some content on the topic. In short, the statistics are gathered based on sampling the row source operations. This parameter controls the granularity of the sampling. Lowering the granularity can have an adverse effect on the performance of statements, so handle with care.

Karen said...

Thanks for the info Kurt. I'll certainly look into it in more depth.

If it's this parameter that is making the rowsource stats to not add up properly to the top parent, I guess I still don't understand why they sample at all since it adds no value (as far as I can tell). It seems more confusing than anything.

Again, thanks!