Friday, October 17, 2008

Statistics musings

I've been thinking about a few problems that seem to turn up with Oracle query optimization lately. Jonathan Lewis posted on one case where the high value statistic was likely "off" at times and caused the plans to switch between good and bad. I had recently seen a similar case where a new partition was added each day and query plans that involved the new partition would pick the wrong index due to a high value problem. The stats were manually set to help fix it.

Anyway, these situations and several others I've seen have got me to wondering about whether or not using dynamic stats collections more aggressively would help some of these types of problems. I mean, if stats were missing and dynamic sampling kicked in, could/would better plans be derived more frequently?

I'm going to set up a set of tests to see where these musings take me, but I thought I'd mention it first to see if anyone has any experience with doing this and what kinds of results you've seen. If you have any input, let me hear from you!

5 comments:

Tom said...

I have this same exact issue. We get to the first of the month, load the empty partition with new data (finally, because it has been empty so long... it's time has come).

Once loaded, we then join it with several other partitions (and guess what, they were empty too!). So the optimizer, based on stats, assumes a cardinality of 1. Guess what? Nested looping all over the place. I see a lot of db_file_sequential reads. Logical reads all over the place. We then stop our batch process. Update some stats, then pick up where we left off (rerunning the query after flushing the shared_pool). Now we get hash joins and the query finishes in minutes instead of days.

Jonathan Lewis said...

Karen,

You might be interested in reading this thread from David Aldridge's blog:

http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/

Regards
Jonathan Lewis

robert said...

Karen, you could check over at the SQL Server camp. MS SQL does automatic stats sampling (you do not need a job set up) - they actually do it during all SELECT, UPDATE, INSERT and DELETE commands which seems pretty reasonable to do. From my experience this works pretty well (which does not mean that SQL Server does not have issues of its own).

A bit of background can be found in TechNet.

Kind regards

freekdhooge said...

Robert,

Dynamic sampling is wat Karen is talking about. Check http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i42991 to see how the oracle implementation works.

robert said...

freekdhooge,

thanks for the hint! Somehow I skipped the "dynamic sampling" bit and misread the posting as an invitation to think about how statistics collection could be improved.

I'd still say that it might be worthwhile if Oracle considered doing dynamic statistics updates in a similar fashion as MS SQL. For example, the outdated high value issue for date and timestamp columns with regular inserts could be resolved if during insert the high value would be updated. Granted, this would make inserts more costly so it would probably be good to have this optional or configurable on a per column basis.