Wednesday, June 11, 2008

Optimizer NOWORKLOAD statistics

I am often asked about statistics collection. People want to know the magic formula for collecting them. When? How much? But, the truth is there isn't one formula, magic or otherwise, that works for everyone. Think about it... If there were, Oracle would have already implemented it, don't you think? But hope is still alive that someday the mystery about it all will be revealed.

I think Oracle is getting better and better in its attempts to automate the process of statistics collection. But there continue to be a lot of questions and discussion on how to "do it right" in the meantime.

So, I thought I'd start trying to put together some of the questions I've received and do a little research and testing to see if I can't answer some of them here in the blog. Today, I wanted to start with a few words about system statistics, particularly about noworkload statistics. Starting with 10g, noworkload statistics are always available. Even if you delete them, they are automatically regathered the next time the database restarts.

The noworkload statistics are basically a few generic benchmarks used as performance measures for your system. The stats that are maintained are:

CPUSPEEDNW
The number of operations per second (in millions) that a CPU can process.
IOSEEKTIM
Average number of milliseconds needed to find data on the disk (default 10).
IOTFRSPEED
Average number of bytes per millisecond that can be transferred from the disk (default 4,096).

I've never touched the noworkload statistics on my laptop test database so I started by checking them to see what they were (they should be the defaults from when my database was first started). I used dbms_stats.get_system_stats to retrieve the values.

SQL> @getsysstats
iotfrspeed: 4096
ioseektim : 10
cpuspeednw: 773.806
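
If you don't have a script handy, something along these lines will pull back the same values using dbms_stats.get_system_stats (a sketch, not necessarily my exact getsysstats script):

set serveroutput on
declare
  l_status varchar2(20);
  l_dstart date;
  l_dstop  date;
  l_value  number;
begin
  -- pull each noworkload statistic out of the dictionary and print it
  dbms_stats.get_system_stats(l_status, l_dstart, l_dstop, 'IOTFRSPEED', l_value);
  dbms_output.put_line('iotfrspeed: ' || l_value);
  dbms_stats.get_system_stats(l_status, l_dstart, l_dstop, 'IOSEEKTIM', l_value);
  dbms_output.put_line('ioseektim : ' || l_value);
  dbms_stats.get_system_stats(l_status, l_dstart, l_dstop, 'CPUSPEEDNW', l_value);
  dbms_output.put_line('cpuspeednw: ' || l_value);
end;
/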

I then executed a single SQL statement and grabbed the execution plan. The statement was a 4-part UNION.


Then, I gathered noworkload statistics to see what they would be now. I started the collection and then executed several queries in another session while they were being collected.

SQL> exec dbms_stats.gather_system_stats(gathering_mode => 'NOWORKLOAD');

Setting the gathering_mode parameter to 'NOWORKLOAD' collects the values in about a minute. I'm not totally positive, but I believe the CPU speed is likely measured by executing some sort of calibration operation in a loop, and the I/O is measured by doing different-sized reads against several different datafiles.
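
If you're nervous about overwriting the values that are already there (and, as you'll see below, these values do affect plans), you could save the current set to a statistics table first and restore it later if needed. A rough sketch; the table name and statid are just examples:

-- create a user statistics table and save the current system stats into it
exec dbms_stats.create_stat_table(ownname => user, stattab => 'MY_STAT_TAB');
exec dbms_stats.export_system_stats(stattab => 'MY_STAT_TAB', statid => 'BEFORE_NW');
-- ...gather the new noworkload stats as shown above...
-- and, if the new values cause problems, put the old ones back
exec dbms_stats.import_system_stats(stattab => 'MY_STAT_TAB', statid => 'BEFORE_NW');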

These were my new values.

SQL> @getsysstats
iotfrspeed: 29126.045
ioseektim : 14.856
cpuspeednw: 917.075

So, I ran the same SQL and checked the plan, and you can see the difference the new noworkload statistics made.



I noticed two key things: 1) the cost for this plan went down, and 2) the estimated %CPU went up. This makes sense if you look at the new statistics. The iotfrspeed was significantly larger than my original value, which means the optimizer now thinks it is much less expensive to transfer data from the disk than it was previously, reducing the cost of those operations.
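
To see why, it helps to look at how the optimizer turns the noworkload stats into its estimated single block and multiblock read times. The commonly documented approximation is sreadtim = ioseektim + db_block_size/iotfrspeed and mreadtim = ioseektim + (mbrc * db_block_size)/iotfrspeed. Here's a quick sketch that plugs in my before and after values; I'm assuming an 8K block size and a multiblock read count of 8, so treat the exact numbers as illustrative only:

select 10     + 8192/4096            as sreadtim_before,  -- 12 ms
       10     + (8*8192)/4096        as mreadtim_before,  -- 26 ms
       14.856 + 8192/29126.045       as sreadtim_after,   -- ~15.1 ms
       14.856 + (8*8192)/29126.045   as mreadtim_after    -- ~17.1 ms
  from dual;

With the new values, multiblock reads become much cheaper relative to single block reads, so the I/O portion of the cost drops and CPU becomes a larger percentage of the total.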

In my example, the optimizer still chose the same plan it originally chose, but now, in comparison to any other plan choices, this plan is even more likely to be selected due to the much lower cost.

Several people have commented to me that they've chosen not to use system stats in their 10g databases because they had trouble initially when upgrading from 9i. But what they often don't realize is that they are using system statistics no matter what. They are simply using the noworkload variety if they disable/delete the others. And, as you can see in this simple example, noworkload stats can change and cause the optimizer's calculations to change as a result. This could mean plans change for apparently no reason. This is one of those cases where someone swears "nothing has changed" but then tells me that the database had just been restarted for some reason. When that happened, the noworkload stats could've been recomputed. Even if that were the only thing that changed, it might be enough to cause some plans to break.
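
If you want to check which flavor your database is actually using, the values are visible in sys.aux_stats$ (assuming you have the privileges to query it). If the workload values (sreadtim, mreadtim, cpuspeed, mbrc) are NULL, the optimizer is working from the noworkload set:

select pname, pval1
  from sys.aux_stats$
 where sname = 'SYSSTATS_MAIN';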

So, as I've mentioned before in a previous post: instead of insisting nothing has changed, evaluate what might have changed that might not be on your radar. Something is bound to have changed somewhere.

More stats stuff to come in later posts.

6 comments:

Noons said...

Good points, Karen.
This whole autostats thing was brought to me the hard way a while ago, when one of our Peoplesoft systems kept changing plans every time we re-started the db! Fixed now, but had us off-balance for a while...

On another note: shouldn't it be "Do what you like" at the top right?

Karen said...

Noons - I get afraid every time "auto" anything comes into the picture with stats collection. :)

Also, thanks for the catch on the phrasing. I can't believe I didn't catch that. It's fixed now! Thanks.

peter said...

Hi Karen,

So how often should I schedule the job to run?

thanks
- peter

Karen said...

Peter - Since the collection of system statistics should be based on workload, you only need to collect them when your workload changes. So, if you have a very typical workload all the time, you could just collect once and use them all the time. Many people will collect multiple sets of system stats and change them out as needed. For instance, you could have a set that matches the workload from 9am-5pm and another set that matches the workload from 5pm-9am (i.e. daytime transaction processing vs. nighttime batch processing). You could have different workload patterns on weekends or at month end. The key is to collect during a period of typical workload and then apply the stats for that period when appropriate.

So, the answer is..."it depends". :)
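
If you'd like a concrete starting point, here's a rough sketch of capturing a set into a user statistics table (created with dbms_stats.create_stat_table) and swapping it in on a schedule; the table name, statid and interval are just examples:

-- capture an 8-hour daytime window into the stats table rather than the dictionary
exec dbms_stats.gather_system_stats(gathering_mode => 'INTERVAL', interval => 480, stattab => 'MY_STAT_TAB', statid => 'DAYTIME');
-- then, from a job that runs at 9am, make the daytime set the active one
exec dbms_stats.import_system_stats(stattab => 'MY_STAT_TAB', statid => 'DAYTIME');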

peter said...

Hi Karen,

Thanks for your tips. However, my question is more about collecting noworkload system stats. If I don't collect workload system stats but want to ensure my noworkload system stats are accurate, how often should I run the job?

Back to collecting workload system stats. On a system that is both an OLTP and a reporting system, how should workload stats be collected, especially when the workload is quite unpredictable? Thanks.

Karen said...

Peter - Well, noworkload stats are just that...no workload. You don't collect them...they are the defaults used by Oracle when you don't collect your own actual system statistics. So, the noworkload statistics that are present are intended to provide a base set of values but they won't be "accurate" to your system's activity.

Therefore, you should collect your own stats to ensure you've got an accurate picture of the workload on your system.

For a system that is both an OLTP and a reporting system, it's a bit harder to get system stats that are precisely accurate for either side. If both types of transactions can be executed at the same time, then your best bet is to collect stats for a week or two (uninterrupted) and let the collection get a good, overall picture of all activity. That way, the stats will reflect the balance of both types of activities and, while not precisely accurate for either, they will be "good enough" on average for the optimizer to derive decent plans for both types of transactions.
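
Something along these lines would do that kind of uninterrupted capture (again, just a sketch):

-- start capturing workload statistics now
exec dbms_stats.gather_system_stats(gathering_mode => 'START');
-- ...let the system run its normal mix of OLTP and reporting work...
-- then stop the capture and store the results
exec dbms_stats.gather_system_stats(gathering_mode => 'STOP');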