Karen Morton: 2008

Monday, December 29, 2008

AND-EQUAL vs BITMAP AND

My friend and colleague, Jan, sends me some great SQL performance issues from time to time. He and I had an email exchange around the Christmas holiday that was another interesting case so I thought I'd share it.

First a bit of background info. The problem query is running under both 9i and 10g database versions. On 9i, the query runs in about .5 secs and uses a AND-EQUAL operation. However, on version 10, the optimizer chooses a BITMAP AND operation and converts the b-tree indexes to bitmaps first. This is where the problem lies. The v10 optimizer choice to use the BITMAP AND operation increases the response time for the query to almost 1 minute (56.5 secs).

I recall that 10g deprecated the AND-EQUAL operation and will now choose this "index combine" operation using bitmap conversions instead most of the time. However, I also recall having read about performance issues occurring at times with this choice. I've seen a couple of tricks to work-around this problem. One is to set the optimizer_features_enable parameter to 9.2.0 in the session before executing the query (many times with a logon trigger). The other would be to split up the query into two queries using UNION ALL to put them together.

As you'll see below in the details, in v10 the majority of the time is spent in the plan step that converts the rowids of the b-tree index to bitmaps. But, the part that I'm not sure of (and haven't built a test case yet for review) is why the INDEX RANGE SCAN that provides the rowids to the bitmap conversion accesses so many more blocks than the range scan for the AND-EQUAL operation.

Now for the details:


select  count(item_t.poid_id0)
from    item_t                           -- num_rows 5,095,494
where 1=1
and   item_t.ar_billinfo_obj_ID0 = :1    -- num_distinct 599,778
and   item_t.ar_bill_obj_ID0     = :2    -- num_distinct 3,155,282

Bind variable values:  
:1 = 53
:2 = 10

The query under 9i uses AND-EQUAL as shown in the STAT lines from collected trace data:


STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=79 pr=23 pw=0 time=514521 us)'
STAT #2 id=2 cnt=6 pid=1 pos=1 obj=352589 op='TABLE ACCESS BY INDEX ROWID ITEM_T (cr=79 pr=23 pw=0 time=162835 us)'
STAT #2 id=3 cnt=6 pid=2 pos=1 obj=0 op='AND-EQUAL (cr=76 pr=20 pw=0 time=142399 us)'
STAT #2 id=4 cnt=21 pid=3 pos=1 obj=368525 op='INDEX RANGE SCAN I_ITEM_AR_BNFO_OBJ__ID (cr=36 pr=1 pw=0 time=13924 us)'
STAT #2 id=5 cnt=16 pid=3 pos=2 obj=368532 op='INDEX RANGE SCAN I_ITEM_AR_BILL_OBJ__ID (cr=40 pr=19 pw=0 time=101036 us)'

Under 10g, using default settings the query uses a BITMAP AND plan:


STAT #2 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=9688 pr=9519 pw=0 time=56554617 us)'
STAT #2 id=2 cnt=6 pid=1 pos=1 obj=352589 op='TABLE ACCESS BY INDEX ROWID ITEM_T (cr=9688 pr=9519 pw=0 time=56554575 us)'
STAT #2 id=3 cnt=6 pid=2 pos=1 obj=0 op='BITMAP CONVERSION TO ROWIDS (cr=9685 pr=9519 pw=0 time=56554533 us)'
STAT #2 id=4 cnt=1 pid=3 pos=1 obj=0 op='BITMAP AND (cr=9685 pr=9519 pw=0 time=56554480 us)'
STAT #2 id=5 cnt=1 pid=4 pos=1 obj=0 op='BITMAP CONVERSION FROM ROWIDS (cr=3 pr=0 pw=0 time=128 us)'
STAT #2 id=6 cnt=22 pid=5 pos=1 obj=368525 op='INDEX RANGE SCAN I_ITEM_AR_BNFO_OBJ__ID (cr=3 pr=0 pw=0 time=90 us)'
STAT #2 id=7 cnt=6 pid=4 pos=2 obj=0 op='BITMAP CONVERSION FROM ROWIDS (cr=9682 pr=9519 pw=0 time=46814337 us)'
STAT #2 id=8 cnt=530398 pid=7 pos=1 obj=368532 op='INDEX RANGE SCAN I_ITEM_AR_BILL_OBJ__ID (cr=9682 pr=9519 pw=0 time=67896016 us)'

Notice how both of them use INDEX RANGE SCAN I_ITEM_AR_BILL_OBJ__ID, but the bitmap plan is doing 9682 cr against 40 for the and-equal. Why would that be the case?

My guess for the increased cr's comes from seeing cnt=530398 in step 8 of the bitmap plan where the index range scan occurs. That step is returning over half a million rows from the index that then are converted by the parent step 7. The parent ends up with cnt=6. What that says to me is that the conversion happened for the half million rowids and then were filtered to end up with only 6. But in the and-equal plan, the index range scan only returned cnt=16. I can see the behavior, my question is why? What's really going on? If any of you have the answer to that question, I'd appreciate hearing from you. I promise to test it for myself too! ;)

Friday, December 19, 2008

Performance optimization (or not) on a snowy day

A friend sent me this and it totally cracked me up. It's an excellent example of how working on the wrong thing is a bad idea. It doesn't matter how good you think the results of your efforts are, if you've been working on the wrong thing, you've just wasted a lot of time and effort. This also goes to show you that optimizing performance is not just a database challenge, but an every day challenge!

Enjoy!

Monday, December 1, 2008

Unsubscribe

Today I tried to unsubscribe myself from a mailing list that I'm not sure how I got on. I'm typically very careful about where I use my main work and personal email addresses so as to keep my Inbox from being flooded with junk email. But, I still occasionally find myself receiving emails that I just don't want to receive.

This was the case today when I attempted to unsubscribe from one such list. I clicked on the Unsubscribe link provided in the email and was taken to a page that showed my email address and a list of about 30 different mailing lists that I was (or could be) subscribed to. Out of the full list, two check boxes were marked. Since I didn't know about the other one that was marked, I thought this was good fortune that I could now unsubscribe to both on this one screen. So, I unchecked the boxes and selected the radio button at the bottom of the page that was labeled "Please remove me from all the above lists" and clicked Update.

I was a bit disturbed that I didn't get any confirmation message, but closed the browser and went about my business. Less than 5 minutes later, I got a rush of about 30 emails in a row. Yes, that's right...30. Guess where they were from? They were all email lists that were on the list where I'd just requested to be removed from all lists.

Well, that's precisely the opposite behavior that I wanted and expected. Instead of being removed from all lists, it appeared that I had been added. After a moment of thought, I decided that perhaps I had uncovered a bug in the unsubscribe program and created the problem myself by unchecking the boxes before I selected the "remove from all" option. I'm not happy at this point, but OK...I can fix this...I'll just unsubscribe again.

So, I click on the Unsubscribe link in the first email of the 30 I'd just received. This time I'm presented with an entirely different list of about 30 other mailing lists with the difference now being that the one list I want to unsubscribe from is one of the 30. Crap! I'm now thinking that means that I may actually have to go to all 30 emails and unsubscribe individually from each. Sigh...

This time, I leave the box beside the list I just received an email from and notice that at the bottom, there is another choice (besides the remove from all lists one). It says "Please remove me from all lists affiliated with the sender". I'm a bit afraid after my previous experience but I think "hey...maybe this will save me from having to go to all of the 29 other emails and unsubscribe there". So, I choose that option and click Update. Once again, I do *not* get a confirmation as to what changes, if any, were made from my update request. So, I decide to just keep the browser window open for a while and wait and see if I get another rush of emails.

After 10 minutes pass, I'm starting to feel pretty happy that I'd safely unsubscribed and closed the browser window I'd kept open just in case. I also deleted the 29 other emails I'd received. It's lunch time so I leave my computer for about 45 minutes.

Upon my return from my lunch, I notice the unread email count in my Inbox says 67. What the heck? As I looked over the list of emails, they were all emails requesting that I "click on this link" to confirm my request to unsubscribe to that particular list. I'm usually a very patient and easy-going person, but this is just too much for me. If I had access to whoever was sending me on this wild goose chase, I'd throttle them! I'm NOT happy. As a matter of fact, I'm royally hacked off! All I wanted to do was unsubscribe to one list and now I've been subscribed to a boatload more that all want me to confirm my desire to unsubscribe. Crap, crap, crap!

I swear to you that if I had clicked on the link to confirm my choice to unsubscribe and had been taken to that same screen again, I would've lost it. Fortunately for me (and my nearby possessions which would likely have suffered from my wrath), I was simply taken to a screen that showed my email address and a message saying that my email address would be removed from the list within 48 hours.

I had to do this 65 times (the other two emails in my Inbox were actually real business stuff!). I got through all this mess in less than an hour of my time, but that's nearly an hour out of my day that could've been much more productively spent! I wish I could bill whoever these folks were behind this fiasco for my lost time. Geesh!

The bottom-line here is that the whole process was absolutely not warranted. The list holder should've been able to unsubscribe me immediately from the original link in the original email. Unsubscribe means unsubscribe! Don't ask me about 30 other lists or if I want to subscribe to others, or be removed from all, or any of that stuff. Just take my address OFF the stupid list I've just clicked on the link about. What's so hard about that?!? And, why oh why, in this day and age, do I have to wait up to 48 hours before you will take me off your lists?! I get the whole distributed processing thing, but come on people! If they can send out new, "welcome to our list" emails almost immediately to new subscribers, these folks can certainly take my email address out of their system equally as fast. Besides, who's going to be more upset? Me, who wants off the stupid list but keeps getting emails after I've requested to be unsubscribed, or a new subscriber who has never received anything from you yet at all and has to wait a day or two?

I don't know about you all, but I'm sick and tired of being subscribed to lists I've never heard of. There are a few sites from which I have requested to be notified, but there are multitudes more that I care absolutely nothing about. All I feel I can do is scream out to the universe to STOP THE MADNESS! If you hear a sonic boom coming from the Pacific Northwest in the near future, it may just be me...imploding after having received another unwanted email.

In the meantime, I'm just doing some deep breathing exercises and trying to stop my mental thrashing of this particular list vendor. Surely it can't be good for my blood pressure. Sigh....

Friday, November 21, 2008

The Charles Schultz Philosophy

The following is a philosophy of life attributed to Charles Schulz, the creator of the 'Peanuts' comic strip. You don't have to actually answer the questions to get the point. Just ponder on them.

1. Name the five wealthiest people in the world.
2. Name the last five Heisman trophy winners.
3. Name the last five winners of the Miss America pageant.
4. Name ten people who have won the Nobel or Pulitzer Prize.
5. Name the last half dozen Academy Award winners for best actor and actress.
6. Name the last decade's worth of World Series winners.

How did you do? The point is, few of us remember the headliners of yesterday. These are no second-rate achievers. They are the best in their fields. But the applause dies. Awards tarnish. Achievements are forgotten. Accolades and certificates are buried with their owners.

Here's another set of questions. See how you do with these:

1. List a few teachers who aided your journey through school.
2. Name three friends who have helped you through a difficult time.
3. Name five people who have taught you something worthwhile.
4. Think of a few people who have made you feel appreciated and special.
5. Think of five people you enjoy spending time with.

Easier?

The lesson:
The people who make a difference in your life are not the ones with the most credentials...the most money...or the most awards. They simply are the ones who care the most.

For me, I thought this was something good to ponder on the weekend before Thanksgiving. I realized how easy it was for me to answer the second set of questions and how many people have touched my life. There have been family, friends, teachers, mentors, and colleagues who have given me something of themselves. I feel very, very thankful and my life is richer.

So, what are you thankful for?

Monday, November 17, 2008

One last hurrah on database independence

I had mentioned in my previous post on database independence about Toon Koppelaars presentation on a database centric approach to application development. I contacted him and he said he was going to post his presentation on his site soon. But in the meantime, I thought I'd show one of his slides which I think speaks volumes to the performance aspects of trying to keep with a database independent approach.

The slide shows:
a) If you put business logic in the mid tier, you’ll call the dbms an order of magnitude more.
b) If you also put data logic in the mid tier, you’ll call the dbms two orders of magnitude more.
c) Finally, if you also don’t use SQL effectively, you’ll call the dbms three orders of magnitudes more.

So, a single click on a "Save" button by the user can ultimately mean 100 calls the dbms has to execute to handle all the middle tier madness!

You should be able to download Toon's full presentation from RuleGen.com very soon. Give it a look...it's good stuff.

Wednesday, November 12, 2008

Database Independence?!

OK. Call me dense. Call me biased. Call me performance conscious. But I just can't groove on the concept of database independence. You know what I mean? The whole idea of having all your application logic built into your application and the database does basically nothing...well, it just doesn't make sense. Even after hearing all the arguments, I just don't get it.

There are so many optimizations that can't be utilized when the database isn't involved. The database ends up serving back boatloads of "simple" query results (usually max of 2-3 tables per query and mostly single table queries) and the application does everything else. The application basically processes the joins, applies the constraints...everything. What would amount to simple work that could occur in minimal time for Oracle, amounts to seconds of time processing the data at the application layer. Heck...if you're gonna do that, why not just have flat files to store your data and skip the database all together!? It'd sure save you a bundle on Oracle license fees!

Toon Koppelaars recently delivered a presentation at Miracle Oracle Open World that I attended that was about a database centric approach. I hope he'll post it somewhere soon for "public consumption" as it makes an extremely strong argument for going towards a "fat" database and getting the bulk of logic and function out of the application layer. Amen brother!

I'm ranting, so I'll stop. But, I'd love to hear from you about your thoughts. I'm an open-minded kinda gal, but when it comes down to it, I find all the reasons people share with me about why they absolutely have to write database independent code and put all the code in their application layer to be just a bunch of excuses. They are excuses that cover up 1) lack of sufficient knowledge of the databases they want to use and 2) a preference for what they think is easy. I mean, when you have a stable full of developers who can code Java fast and furious, but have been taught to not care about the database, designing for a database centric approach just isn't the forte...so it doesn't happen. But, when performance problems start to occur, they want to blame the database. Again...I just don't get it.

Can anyone help me out here?

Saturday, November 1, 2008

Recent Presentations

I've posted my most recent presentations from Miracle Oracle Open World in Denmark and the OPP2008 (Oracle PL/SQL Programming Conference) in Chicago to SlideShare. By the way, SlideShare is pretty cool if you've never seen or used it. It's a great way to easily share presentations. You can also add audio to create SlideCasts. I'm thinking of adding an audio track over the presentations I've already uploaded, but that'll have to wait until I have more time to do so. So much to do...so little time!

I've also added a link to my blog sidebar that will access all my presentations I upload to SlideShare. So, after this post ages off my main blog page, it'll be easy to find them from the sidebar link. I plan to post all my presentations there after I've delivered them.

To access the presentation you want, just choose the name of the presentation on the right side and it will display in the view area on the left. Happy viewing!

| Get your Presentation Pack

Wednesday, October 29, 2008

Miracle Oracle Open World

My first trip to Denmark was most excellent. The good folks at Miracle were great hosts and the event was unique in many ways. I delivered two presentations and heard several others that were superb. Jonathan Lewis delivered an excellent session (well, 2 actually) on hints and Tanel Poder's presentation on advanced troubleshooting was top notch. My only regret is that I didn't get to see all the presentations. It never fails that when I speak at a conference there is a session being delivered that I want to see at the same time as my presentation. That was once again true. I missed Robyn Sands presentation on variance, but was lucky enough to be able to chat with her about it a bit over dinner.

One of the things that I found unique about the Miracle event was the amount of time available to network, meet new people and talk. Each evening dinner was a multi-hour affair fueled by excellent conversation. Oh, and the food and drink were excellent too and were made available in very large quantity! One thing that can be said for sure about Miracle is that they don't skimp on quality or quantity!

Besides the event itself, I had the opportunity for exposure to a new place. Since it was my first trip to Denmark, I thought I'd share a few things that I noted.
1. They have some seriously cool doors there. Yep, that's right...doors. Some of the doors don't have the traditional door lock/dead bolt combination that I'm used to. Instead, the doors lock closed by lifting the handle up and the door then vacuum seals or something like that. I was too impressed by it that I didn't even bother to look closely at how the mechanism works, but it is seriously cool. I want doors like that at my house.

2. The public water closets had doors that were about 8 feet tall starting about 1 inch off the ground. OK. This may seem silly for that to be something I noticed, but being 6'4, I almost always find the top and bottom 1/3 of my body is visible in most ladies rooms. Let's just say I liked not having my head waving over the door like a giraffe for once!

3. The traffic signals not only go Green-Yellow-Red, but Red-Yellow-Green as well. That way if you're stopped, you know when the light is about to change. I'd like that in the US. I often find myself leaning to see the other side of the light so I can tell when the change is coming. New traffic signals would eliminate much neck strain on my part.

4. Danish currency is cool. Of course, practically any country outside the US has cooler money than we do. US currency is so dull. Doing currency exchange math is a bit tedious, but I pretty much figured it out: 2 million Danish kroner used to equal 1 US dollar, but now it's the other way around (given our current financial state here in the US). On the first evening of the event, the Americans were offered a collection of US dollars from our Danish friends. They gave the half dozen of us about $14 total to split amongst ourselves to help us through the current tough times. :)

5. Denmark is flat. I was told the highest points in the whole country are only about 150 and 170 meters high. Heck, that wouldn't even qualify as a bump in the road in Tennessee where I grew up! Oh, and the funny thing is that they actually call one of those places "Sky Mountain".

6. Danes are quiet, thoughtful, and reserved. Not! I've never met a more party-hearty bunch! I will say that they're fairly quiet during the day, but as dusk comes, they gear up to a whole other level. Not a dull moment! To be fair, I have to say that perhaps not all Danes are of the same genre as those attending the Miracle event, but the good folks I met sure weight the average to the fun side. :)

7. Danish hot dogs look very odd (to my American eye), but taste yummy. We stopped in downtown Copenhagen and partook of a vendor stand hot dog on the day before our departure. I didn't watch how it was made, nor look at it closely before taking a bite. It's one of those things where it's just better not to know. I found this YouTube video that after watching it makes me glad I didn't look at what I was eating!

I know there's a lot more I could add but you get the idea. I loved my trip and hope to return before too long.

Friday, October 17, 2008

Statistics musings

I've been thinking about a few problems that seem to turn up with Oracle query optimization lately. Jonathan Lewis posted on one case where the high value statistic was likely "off" at times and caused the plans to switch between good and bad. I had recently seen a similar case where a new partition was added each day and query plans that involved the new partition would pick the wrong index due to a high value problem. The stats were manually set to help fix it.

Anyway, these situations and several others I've seen have got me to wondering about whether or not using dynamic stats collections more aggressively would help some of these types of problems. I mean, if stats were missing and dynamic sampling kicked in, could/would better plans be derived more frequently?

I'm going to set up a set of tests to see where these musings take me, but I thought I'd mention it first to see if anyone has any experience with doing this and what kinds of results you've seen. If you have any input, let me hear from you!

Wednesday, October 8, 2008

None of it works!

A friend of mine sent this to me. Oh how true it is! Don't you just hate getting those cutesy emails that promise you to have eternal youth and good fortune if you only would forward the email on to 10 of your friends?

OK, I admit that I've done it once or twice. And for those of you who have been on the receiving end, please allow me to apologize. I guess there have just been moments when I really wanted what was promised to be granted to me. And if it only meant forwarding on one little email, then what's the harm, right?

Harumph! Hope may spring eternal, but there is a line between hope and not being rational.

Not that I'm against hoping and wishing for things that I want to happen, but in regard to this particular method of achieving my dreams, I think from now on I'm simply going to reply to the sender with this graphic. Then, I'll go buy a lottery ticket.

:)

Friday, October 3, 2008

Something to ponder...

Wednesday, October 1, 2008

The courage to learn

Cary Millsap recently had a great post on the use of active vs. passive voice when writing. Check it out, it's an excellent read.

Anyway, Cary made a reply to one of the comments he received that has sparked much thought on my part. His statement was:

Learning is a state transition between not knowing something and knowing it. For someone to truly learn, he has to have the courage to admit—at least to himself—that there's something he doesn't know. Some people are unwilling to do that.

Think about that for a second. What do you think? Is the capacity to learn rooted in the ability to simply admit there is something you don't know?

I think Cary's spot on with that assessment. I do believe that fear of admitting what you don't know holds you back from learning/knowing what you need to. It's like having blinders on. I prefer to believe it is fear based and not arrogance, but I think arrogance is at the heart of many people's unwillingness to seek out knowledge. I mean, what if they're proved wrong about something they've stated as truth? If you think yourself an "expert" and your ego can't stand the hit it would take if you are proven wrong, aren't you destined to be found lacking knowledge you claim to possess at some point?

Although I can't find the exact quote, I remember reading something to the effect of "A wise man knows he doesn't know everything." For me, my personal goal is to be continually in the process of learning. I love to learn. Sometimes the process is painful or embarrassing when I state something I believe to be fact and am shown that my information is either lacking or just plain wrong. But, even then, I'm glad to learn. Without coming under critique, how do I confirm what I claim I know? In the end, it's empowering for me...and humbling too. I like knowing and I like sharing what I know. But in order to know, I have to first learn. And I think that learning is the real thrill for me. Sharing what I learn is just icing on the cake.

I think it's easy to get complacent and not seek out new knowledge. Many times I see people who do basically the same job and see the same issues day after day, month after month. It's easy to get lazy. It's easy to turn around and your company is considering implementing Oracle 11 and you've never bothered to learn anything more than you knew when you learned back in Oracle 7. It's easy to assume that once you know something, that knowledge stays fixed and there's no need to "upgrade" that knowledge.

For me, I do what I do because not only do I like to mentor and teach, but I primarily love to learn. My first job was in the government sector. I was the new, young kid working with a bunch of guys who had been at their jobs for 20-30 years. When I finished projects ahead of my allotted time, they'd chide me. They told me I needed to learn to "take it easy" and that I "wasn't in school anymore" and didn't need to complete my assignments in record time or try to impress my professors. I just needed to sit back and relax and learn to "not tax my brain". I tried (for a very short while). But, I stagnated. I couldn't stand sitting still. I couldn't stand not learning and not doing more than just what I knew. I wanted more. I wanted challenge. I wanted to expand what I knew so I could do my job better and more efficiently.

In the end, I stayed in my first job for less than a year. And, I know now as I look back, it was the simple fact that I wasn't learning that prompted me to leave. I grew bored and uninterested in my job and I just couldn't stand it.

It takes courage to be a learner. It requires effort. It requires being willing to admit there's a lot I don't know. But I know for me there's no other choice. Learning is like a drug and I'm an addict.

Monday, September 22, 2008

STAT lines in 11g

My friend and colleague, Ric Van Dyke, has been preparing a presentation for the UKOUG conference and in his work discovered an anomaly in the roll ups for the time values in the STAT lines emitted in 10046 trace data from an 11g database. I had seen this behavior as well but didn't follow up on it for myself since I wasn't using Oracle 11 for anything other than a bit of testing here and there.

A couple of weeks ago, I received an email from Rudi K. as a follow up to my post on Plan Execution Statistics. In some of Rudi's own testing, he came across this anomaly in the rowsource execution statistics as well and asked me about it. So, given Rudi's question and a recent discussion with Ric on the subject, I thought I'd blog about it. Rudi - I apologize it's taken me so long to properly respond to you, but here you go! :)

First, let's look at an example of what the problem looks like. Take a look at the rowsource execution statistics (from the trace file STAT lines) for the following very simple query in Oracle 10g:


select /*+ use_merge(e,d) */ * 
  from emp e, dept d
 where e.deptno = d.deptno ;

STAT #3 id=1 cnt=13 pid=0 pos=1 obj=0 op='MERGE JOIN  (cr=11 pr=8 pw=0 time=21096 us)'
STAT #3 id=2 cnt=4 pid=1 pos=1 obj=54389 op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=2 pw=0 time=17342 us)'
STAT #3 id=3 cnt=4 pid=2 pos=1 obj=54390 op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=1 pw=0 time=16060 us)'
STAT #3 id=4 cnt=13 pid=1 pos=2 obj=0 op='SORT JOIN (cr=7 pr=6 pw=0 time=3539 us)'
STAT #3 id=5 cnt=13 pid=4 pos=1 obj=54391 op='TABLE ACCESS FULL EMP (cr=7 pr=6 pw=0 time=3284 us)'

In order to understand the rowsource execution statistics (the information in the parentheses at the end of each line), you need to follow one simple "rule": the values listed in a child operation roll up into their parents. The bottom-line is that the top line should be the roll up of the child operations listed below it. To understand the relationships, take a look at this tree diagram of this query's execution plan.

So, you could break down the rowsource time stats as follows (I'm only showing the time stats as these are the roll ups that appear to have problems in 11g but all the rowsource statistics roll up the same way):


-------------------------------------------------------------------------------------
| Id  | Pid | Operation                    | Name           | Time  | Self  | Kids  
-------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT             |                | 21096 |       |       
|   1 |  0  |  MERGE JOIN                  |                | 21096 |   215 | 17342+3539 
|   2 |  1  |   TABLE ACCESS BY INDEX ROWID| DEPT           | 17342 |  1282 | 16060 
|   3 |  2  |    INDEX FULL SCAN           | DEPT_DEPTNO_PK | 16060 | 16060 |     0 
|   4 |  1  |   SORT JOIN                  |                |  3539 |   255 |  3284 
|   5 |  4  |    TABLE ACCESS FULL         | EMP            |  3284 |  3284 |     0 
-------------------------------------------------------------------------------------

So far, so good. Now, let's look at the same info from 11g.


STAT #3 id=1 cnt=13 pid=0 pos=1 obj=0 op='MERGE JOIN  (cr=6 pr=3 pw=3 time=166 us cost=5 size=767 card=13)'
STAT #3 id=2 cnt=4 pid=1 pos=1 obj=71193 op='TABLE ACCESS BY INDEX ROWID DEPT (cr=4 pr=2 pw=2 time=53 us cost=2 size=80 card=4)'
STAT #3 id=3 cnt=4 pid=2 pos=1 obj=71194 op='INDEX FULL SCAN DEPT_DEPTNO_PK (cr=2 pr=1 pw=1 time=36 us cost=1 size=0 card=4)'
STAT #3 id=4 cnt=13 pid=1 pos=2 obj=0 op='SORT JOIN (cr=2 pr=1 pw=1 time=25 us cost=3 size=507 card=13)'
STAT #3 id=5 cnt=13 pid=4 pos=1 obj=71195 op='TABLE ACCESS BY INDEX ROWID EMP (cr=2 pr=1 pw=1 time=84 us cost=2 size=507 card=13)'
STAT #3 id=6 cnt=13 pid=5 pos=1 obj=71200 op='INDEX FULL SCAN EMP_DEPT_IDX (cr=1 pr=1 pw=1 time=29 us cost=1 size=0 card=13)'

The plan is a bit different in that it used an INDEX FULL SCAN for the EMP table access but otherwise, it's basically the same.

Here's the break down of the rowsource time stats:


--------------------------------------------------------------------------------------
| Id  | Pid | Operation                     | Name           | Time  | Self  | Kids  
--------------------------------------------------------------------------------------
|   0 |     | SELECT STATEMENT              |                |   166 |       |       
|   1 |  0  |  MERGE JOIN                   |                |   166 |    ?? | 53+??
|   2 |  1  |   TABLE ACCESS BY INDEX ROWID | DEPT           |    53 |    17 |    36 
|   3 |  2  |    INDEX FULL SCAN            | DEPT_DEPTNO_PK |    36 |    36 |     0 
|   4 |  1  |   SORT JOIN                   |                |    25 |    ?? |    ??   
|   5 |  4  |    TABLE ACCESS BY INDEX ROWID| EMP            |    84 |    55 |    29 
|   6 |  5  |     INDEX FULL SCAN           | EMP_DEPT_IDX   |    29 |    29 |     0
--------------------------------------------------------------------------------------

Hmmmm... Something's wrong here. Notice on step 4, the SORT JOIN operation's total time is only 25 microseconds. How can that be possible if it's child operation takes 84 microseconds? The MERGE JOIN step shows a time of 166 microseconds. How can I be sure of that total since step 4 appears wrong (i.e. it doesn't appear to include it's child operation totals)?

Well, if we go with Ric's theory, the problem is with the SORT step (id = 4). It appears that Oracle is no longer rolling up totals when a SORT type of operation occurs. Instead, it maintains that operation's statistics independent of its children. If that's the case, then to get the correct timing roll up, we'd need to add the time from steps 2, 4 and 5. That would be 162 microseconds. That would mean that the MERGE step would take 4 microseconds.

I'm not sure that's really how it's working, but the math seems to work out fairly close. Ric says:

What appears to be happening is that the SORT is now done "while" the child steps are accruing. I think the best way to describe this new behavior is that sorts now happen simultaneously with the child steps. I don't believe that this is a parallel action, it is an action that happens simultaneously. There is a subtle but distinct difference.

Regardless of if there's a resolution or not (perhaps we need to change the way the roll ups are done for SORT type operations which include not only SORT-MERGE joins, but queries that have ORDER BY and HASH joins as well), we do need to take note of the change. I've depended on these rowsource stats quite frequently when optimizing SQL and would love to really understand why this behavior is occurring and how to account for it in trace data.

If anyone has anything more concrete on this, I'd love to hear from you and I'm sure Ric would too!

Miracle Open World 2008

I'll be presenting at this year's Miracle Open World in Denmark and was asked to prepare a video about my presentation 'Are you an astronaut or a monkey? The Oracle Advisors from a different perspective'. I sure had fun making it. I have a feeling MOOW will be even more fun!

Sunday, September 21, 2008

Where in the world is Karen?

Sorry to be absent for so long. I've been teaching all day for the past two weeks and have one more week to go before I catch a bit of a break. It's been great, but it doesn't leave me with much energy or brain function to post in the evenings. :)

So, in lieu of something more profound, follow the visuals to know where I've been hanging out.

Canada

Ontario

Near Toronto which is near this gorgeous landmark

In the city of Waterloo

I'm having a terrific visit, eh. :)

Friday, September 5, 2008

Teacher's Day

Today is Teachers' day in India. Every year on this day I receive an email from a student of mine who lives in India sending his thanks and also to ask for my "teacher's blessing" for him and his endeavors. This has been going on for about 4 years now and I look forward to it.

In India September 5th is celebrated as Teachers' day as a mark of tribute to the contribution made by teachers to the society. This date is the birthday of a great teacher, Dr. Sarvapalli Radhakrishnan, who was a staunch believer of education, and was the well-known diplomat, scholar, president of India (elected in 1962) and above all a teacher. You can read more about Teachers' day here.

I love to teach. It's truly a passion. I think it is true that teachers are also the most frequent and enthusiastic students. I know I am. I love to learn. And, even when I am teaching, I'm learning. My students often impart as much helpful information to me as I do to them. How great is that?!?!

In any case, I wanted to take a moment to thank my teachers...all of you. I know my life has been shaped by all the wonderful teachers I have had.

I got to speak to one of my most favorite teachers just last week after more than 25 years. It happened out of the blue when my sister, who is a physical therapist and lives and works in Tennessee, called me during the day and said "I've got someone here for you to say hi to." She then handed the phone to someone who asked if I remembered her. Obviously, I was clueless and so she gave me a hint: "I had you in class for two years in a row." Ah-ha! It was Mrs. Crossno! She was my Latin teacher during my freshman and sophomore years in high school. Wow. As it turns out she was visiting my sister's office for some physical therapy and when she asked about me, Amy just called me up and handed her the phone.

It was very, very sweet to speak to her again after all these years. She retired just a couple of years ago but had remembered me all these years and even kept a little cartoon I drew way back then posted on the bulletin board in her classroom. It was a copy of a Peanuts cartoon showing Charlie Brown with his little sister Sally. Sally was resisting the idea of going to kindergarten and the caption read 'Well, all right, I'll go - but I'm not going to learn Latin.' Mrs. Crossno had laminated it and kept it in her classroom all those years.

She made quite an impact on me that's for sure. Two years of Latin, a "dead" language, equipped me with knowledge I used through college and still find skills I learned then coming in handy now. She taught us etymology (the study of word origins) and from that, I have been able to break many words down into their Latin roots so that I could get at least some understanding of their meaning when the word was new to me. And, it wasn't just what she taught me, it was how she did it. She made it fun. She was encouraging and supportive. She knew her stuff but never made you feel stupid for stumbling. She was an inspiration and gave so much of herself that it made you want to give back all you could.

So, today on this Teachers' day, I want to thank Mrs. Crossno and all my teachers throughout the years. You have blessed me in so many ways.

Thursday, September 4, 2008

I'm truly honored

I received an email this past week with the following (partial) content:

I am pleased to inform you that you have been nominated for the Oracle ACE award. You have been chosen based on your significant contribution and activity in the Oracle technical community. Like your fellow Oracle ACEs, you have demonstrated a proficiency in Oracle technology as well as a willingness to share your knowledge and experiences with the community.

I am pleased and very honored to join the ranks of Oracle ACEs and will make it my continuing goal to be deserving of it.

Monday, September 1, 2008

Too big?

What do you think?

Did I make the wrong choice when I bought the super jumbo size tub of "I Can't Believe It's Not Butter"?

Let me help you visualize the size a bit better. Here's the tub next to a cream cheese container that is the same size (8 oz) as my previous size butter tub.

I'm thinking that my logic was a bit flawed when I decided on the 45 ounce size. I thought it was smart. The super jumbo tub was on sale and only cost 50 cents more than the small size. It seemed to me to be an easy choice. The big tub would equate to over 5 of the smaller size and cost only 50 cents more. Sounds like a great deal, right?

However, the problem with my choice became evident when I got home with my purchase. Where was this gigantic tub o' butter going to fit in my refrigerator? I ended up having to rearrange an entire shelf to get the thing to fit. Of course, the arrangement I came up with meant that if/when I wanted to get out the butter, I'd have to get out several other items just to be able to pull the big guy from its spot. Sigh...

Sometimes I suppose you just shouldn't go for what seems to be the obviously better deal, huh?

Friday, August 29, 2008

slide:ology

My latest read is a terrific book called slide:ology by Nancy Duarte. There is also a site by the same name with lots of on-going material related to the book which is filled with content from the folks at Duarte Design. This book is going to be right next to my copy of Presentation Zen by Garr Reynolds that I posted about back in May.

I'm loving this book so far. It has some great stuff that continues to feed my desire to be a better, more effective and compelling speaker. I've become fascinated by the use of design principles to guide my presentation designs. I've had an opportunity to try out my new presentation style a couple of times now and I find the audience is more engaged and asks more questions and provides more comments and feedback at the end.

So, if you're tired of the same old boring PowerPoint presentation design and delivery that we're all so used to (and tired of), give this book (and Presentation Zen) a read. It'll change you...

Wednesday, August 27, 2008

What are you doing?

One of my favorites stories goes like this:

A man came upon a construction site where three people were working. He asked the first, What are you doing? and the man answered, I am laying bricks.

He asked the second, What are you doing? and the man answered, I am building a wall.

He walked up to the third man, who was humming a tune as he worked and asked, What are you doing? and the man stood up and smiled and said, I am building a cathedral.

In my work, I'm like the third guy. I wake up every day smiling because in my own way, I get to build a cathedral. It's about the bigger picture, the higher view, the life choices that have brought me to a place where every day I get to do what I truly love to do.

How about you? What are you doing?

Tuesday, August 19, 2008

Making your case

I am surprised at how often I've been on site with a customer and find out that the recommendations I end up making were actually already suggested by the DBAs and/or developers on staff. I wonder why the suggestions of the employees aren't heeded and a consultant gets called in instead? It would seem that the company could have more effectively used the money and time spent to have a consultant tell them the same thing their employees are already saying.

So, how does the staff DBA or developer get heard? It may not even be that what they have to say is set aside in lieu of a consultant's opinion. It may be a case where a DBA or developer suggests doing something that hasn't been done before and they can't get buy in from their managers and colleagues. Why?

It's an interesting issue. My experience has been that many times I will be able to make recommendations or solve problems that on site employees hadn't been able to work out for themselves either due to lack of experience or lack of time to devote to solving the problem. Fortunately, I don't think I've ever left behind recommendations that failed and cost the company I intended to help, but I can't even count the number of times that my recommendations mirrored those that one or more employees had already made.

So, how do you make your case? How do you get heard? Do you quit and contract back in as a consultant making 4 times more money? I've seen that very thing happen, haven't you? Do you offer to quit if your recommendation fails? Hardly.

I think there are a few things that can help anyone make a case for their suggestions/recommendations:

Outline the business case or situation in detail and note the business goal. Keep in mind that a business case is really a tool that looks at financial considerations. No matter what other elements may be viewed as relevant to the decision, if they can't be expressed in terms of their financial impact they'll have no influence on the results of your case.

Provide meaningful options to accomplish the task/project. If the question to be answered can only have a "yes" or "no" option, it's easy to point to the winner. However, when you need to ask "which way" something will be accomplished, it is important to provide the options to allow the reviewers to step back and review options that reflect the most relevant means of accomplishing the business' goals.

State the full costs of the task/project and any on-going support and of transition. This isn't just about the cost of doing the project, but costs of supporting it once it's over. Things like licensing costs, staffing costs and costs of ongoing problems the project didn't intend to fix (perhaps only postpone). The full cost isn't just the cost to do the project, but is the full cost to the business of making a specific decision.

Show both the good and bad impacts of change. It is possible that along with the positive consequences of the project, there may be some negative consequences as well. This could include positions needing re-defined, transition costs and ongoing process problems that the project won't resolve.

Ensure the benefits of your recommendations are attainable...and can be measured. Be able and prepared to show before and after comparisons.

Know what metrics play a role in the decision-making process (and how). There are a host of metrics (some concrete and some fuzzy) that are considered when any decision must be made. And many times there is no collectively agreed upon way of measuring them. The key is understanding what your audience is looking for in terms of metrics and what they are going to make a decision based on. You need to know what they're looking for, what thresholds are relevant to them, and how they are used to seeing these metrics presented and lay them out that way.

Tell a story. The bottom-line is that your case is made by answering two fundamental questions: "Should we do this?" and "Which way is most effective to accomplish it?" Write up your recommendations and plans as if you are telling a story that answers those questions. Establish the background, introduce the problem and define why it is important to solve the problem. If you don't solve the problem, what is the consequence? What are the various way sthat you could solve the problem, and what will happen to the organization in each instance? And based on all of that, what is the conclusion your want your audience to draw and the action you recommend taking? Make sure you tell the story first, and then fit it into the template of your case.

Everyone can learn to make a case strongly and effectively. You can make a case that is polished and persuasive. In time, results will back up your proposals and consultants can be used as they should be and not just to confirm what full-time employees already know.

Friday, August 15, 2008

Strange parsing behavior

My friend Jan is cool. He sends me lots of neat SQL performance issues he's working on so I canto take a look at. I love it when I get to see performance problems get fixed and have the before/after data to show the differences. Like I said, Jan is cool and sends me lots of good stuff like this.

What he sent me yesterday was interesting because I'd never seen it before. The query that was having problems was a UNION of 30 or so different SELECTs. Many of the SELECTs were similar except for the WHERE clause. Here's what Jan said in his email to me:

[Jan]
Now the fun thing is to follow it during execution. This seems to be happening:

-- parse whole query => v$sql shows sql_id child_number 0
-- start parallel slaves for
-- parallel slaves do work
-- when done, parallel slaves for this part of query wait: 'PX Deq: Execution Msg'
...
-- start new set of parallel slaves for
-- slaves wait 'cursor: pin S wait on X'
-- Query coordinator shows wait: 'PX Deq: Parse Reply' => v$sql shows sql_id child_number 1
-- parallel slaves do work
-- when done, parallel slaves for this part of query wait: 'PX Deq: Execution Msg'
...
-- start new set of parallel slaves for
-- slaves wait 'cursor: pin S wait on X'
-- Query coordinator shows wait: 'PX Deq: Parse Reply' => v$sql shows sql_id child_number 2
-- parallel slaves do work
-- when done, parallel slaves for this part of query wait: 'PX Deq: Execution Msg'
etc, etc

By the time the whole query is done we have child_number 0..30, and about 360 sessions because of all the parallel slaves!

Weirdest part is the 'cursor: pin S wait on X' plus 'PX Deq: Parse Reply' waits, they take > 8 seconds every time! Why would it parse over and over again???

When I start the query a second time, it's way faster, since those parsing waits don't happen (if the plan's still there):
First time: Selected 109 records in 712.485 Secs
Second time: Selected 109 records in 142.829 Secs
[/Jan]

The two things about this that interested me where the cursor: pin S wait on X events and the parsing waits. I had just seen the 'pin S' timed event the day before while reviewing a trace file with another customer. I find it funny how I'll often see 2 or 3 of the same kind of error in a short time frame. I guess it really is true that things happen in multiples, huh? Anyway, the 'pin S' events are related to mutex pinning issues. Admittedly, I didn't know much about that until the issue I researched the day before. Here's what I learned:

The timed event for 'cursor: pin S wait on X' (I'm just going to call this pinS for short from now on) is an event that started in occurring in v10 when Oracle added mutex pins into the scheme in addition to latches. Basically, a mutex is kind of like a latch, but just a simpler, faster version. When you get the pinS wait, it's caused by a session holding an exclusive mutex pin on a cursor for which another session wishes to get a shared mutex pin on the same cursor object. From the looks of things with your big UNION query and all the parallel slave processes, that makes sense. The p1-p3 values give you some info:

P1 Hash value of cursor
P2 Mutex value (top 2 bytes contains SID holding mutex in exclusive mode, and bottom two bytes usually hold the value 0)
P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps

and there are a few dynamic system views that provide more information about this timed event: V$MUTEX_SLEEP and V$MUTEX_SLEEP_HISTORY. You just have to check those views while the problem is occurring to get the additional info.

There's not a whole lot of data on this floating around out there, but I found some interesting info from Tanel Poder and Jonathan Lewis that helped.

Anyway, the second issue was the PX Deq: Parse Reply waits. I was wondering why all the parsing was going on and guessing that if the parsing issue was resolved, that would also reduce or eliminate the pinS waits as well.

I wrote back to Jan about the small bit I knew and told him I thought that rewriting the query to combine several of the separate SELECTs together would likely help the problem. At a quick glance, there were 18 of the SELECTs that shared the same FROM clause and join conditions.

Today I did a bit more research to check out the 'PX Deq: Parse Reply' and came across a great write up at Doug Burn's blog about Parallel-specific wait events (the whole article starts here). Good stuff...check it out.

Then, just a few minutes ago, Jan sent me the update on how he was progressing.

[Jan]
After some more digging this is where we get multiple children (simplified case, I'm sure you can simulate it):

 
select /*+ test */ appln_jrnl_id, sum(payment_count) from fihub.tbl_voucher_acct_line  d
where appln_jrnl_id   = 'AP_PAYMENT'
and   fiscal_year_num = 2008
and   acct_period_cd  = 5
group by appln_jrnl_id
union
select /*+ test */ 'AP_PAYMENT', sum(payment_count) from fihub.tbl_voucher_acct_line  d
where appln_jrnl_id   = 'AP_PAYMENT'
and   fiscal_year_num = 2008
and   acct_period_cd  = 5;

BTW, this table is partitioned, so it's using parallelism.

The original query had mostly group by's. But there where a few SUMs without group by's. That's what is causing the trouble. v$sql_shared_cursor says it's because of a PQ_SLAVE_MISMATCH. When running serially it doesn't happen. After rewriting the troubling parts to do a group by, it's back to normal. After a total rewrite, the query is much faster.
[/Jan]

Again, I love stuff like this. I thought it would be an interesting case to share.

Thanks Jan!

Thursday, August 14, 2008

Mac vs. PC

I saw a post recently that read as follows:

Subject: A bit of Mac vs PC hilarity

A co-worker returned recently from a developers class for some aspect of .Net. He found it quite funny when he discovered some MS employees taking the class, running Vista in a VM.

The host machines: MacBook Pro's

My MacBook Pro is also running Vista in a VM, but I rarely touch it! But, the fact that MS employees are using Macs with VM for Vista says it all if you ask me. All hail the Mac!

Friday, August 8, 2008

Pah-tay-toe, Pah-tah-toe

After a brief hiatus, I figured I'd better blog a little something to let everyone know I was still out here. So, here you go...

I was reading something recently that caught my fancy in that "this is weird and kinda funny" way. It went something like this:

If deer meat is called venison and cow meat is called steak, why is chicken just called chicken?

How did the chicken miss out? Somehow if I went to a nice restaurant and the menu offered me Prime Cut of Cow instead of NY Strip, I don't think I'd find it quite so appealing. But, when I see Grilled Chicken on the menu, I'm completely fine with that. Hmmmm....

Hopefully next week I'll have something more interesting to say, but for now, I'll leave you to ponder. :)

Monday, July 28, 2008

Wordle

I was directed to a cool little site called Wordle that takes text, the URL of any page that has an Atom or RSS feed, or a del.icio.us user name and turns it into a cool little graphic.

I tried it twice: once with my blog and once with the text of my Method R bio page. The one from my blog needs to have it's "nerdiness factor" toned down (this feedback according to my colleague Ron...and I agree by the way). I think the one from my bio page is pretty cool. Here they both are so you can see what you think.

From my blog:

From my bio page text:

Wednesday, July 23, 2008

Just for fun

I occasionally see something called a meme on other blogs I read and recently saw one that I thought looked like fun so I thought I'd try it.

Here's how it works:

Using BigHugeLabs Mosaic maker to make a mosaic and Flickr to do your searches, type your answer to each of the questions below into Flickr Search. Then, using only the FIRST page, pick an image. Copy and paste each of the URLs for the images into the mosaic maker.

The Questions:
1. What is your first name?
2. What is your favorite food?
3. What high school did you go to?
4. What is your favorite color?
5. Who is your celebrity crush?
6. Favorite drink?
7. Dream vacation?
8. Favorite dessert?
9. What you want to be when you grow up?
10. What do you love most in life?
11. One word to describe you.
12. Your flickr name.

You never know what may turn up on that first Flickr page. Can you guess what some of my answers were? They may not be what you think. :)

Tuesday, July 22, 2008

Optimizer won't choose Index Fast Full Scan?

A while back, I was asked to take a look at a situation where a COUNT(*) query was using full table scan instead of an Index Fast Full Scan. The answer turned out to be fairly simple. Can you determine the problem?

Table with about 480K rows and about 800 Mb in size.

It has the following indexes:


SQL> @show_table_indexes
Enter value for owner: aradmin70
Enter value for table: t1478

                               Column                          Column
INDEX_NAME                     Name                           Position
------------------------------ ------------------------------ --------
I1478_3_1                      C3                                    1
I1478_600000214_1              C600000214                            1
I1478_600000215_1              C600000215                            1
I1478_740002011_1              C740002011                            1
I1478_740002012_1              C740002012                            1
I1478_770000000_1              C770000000                            1
I1478_840000044_1              C840000044                            1
IT1478                         C1                                    1

When executing:


select C600000215, count(*)
  from aradmin70.T1478
 group by C600000215;

It uses the following plan:


Execution Plan
----------------------------------------------------------
   0 SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=27954 Card=563 Bytes=11823)
   1    0  HASH (GROUP BY) (Cost=27954 Card=563 Bytes=11823)
   2    1    TABLE ACCESS (FULL) OF 'T1478' (TABLE) (Cost=27922 Card=481061 Bytes=10102281)

Even using a hint, the optimizer still goes for a full scan.


select /*+ INDEX_FFS(tab1) */ C600000215, count(*)
  from aradmin70.T1478 tab1
 group by C600000215;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=27954 Card=563 Bytes=11823)
   1    0   HASH (GROUP BY) (Cost=27954 Card=563 Bytes=11823)
   2    1     TABLE ACCESS (FULL) OF 'T1478' (TABLE) (Cost=27922 Card=481061 Bytes=10102281)

The index should obviously be a better choice:


select count(*) "Extents",
       round(sum(bytes)/(1024*1024)) "MBytes"
  from dba_extents
 where owner='ARADMIN70'
   and segment_name='T1478';

   Extents     MBytes
---------- ----------
       172        807

For the index:


   Extents     MBytes
---------- ----------
        32         17

What should be checked?

...and the answer is...

You should check to see if the indexed column allows nulls. If the column allows nulls, then the index can't be used to count the nulls since rows with null values are not stored for single column indexes.

A simple describe showed that the column C600000215 which was used as the index column was a VARCHAR2(100) and it did not have a NOT NULL constraint. So, the optimizer couldn't choose the IFFS.

Here's the fix that was implemented to allow the IFFS to be chosen:


alter table ARADMIN70.T1478
  modify (C600000215 NOT NULL);

This added the NOT NULL constraint to the column. Now, when the query is executed, the optimizer chooses the IFFS.


select C600000215, count(*)
  from aradmin70.T1478
 group by C600000215
 order by C600000215;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1781 Card=5340 Bytes=112140)
   1    0   SORT (GROUP BY) (Cost=1781 Card=5340 Bytes=112140)
   2    1     INDEX (FAST FULL SCAN) OF 'I1478_600000215_1' (INDEX) (Cost=605 Card=494913 Bytes=10393173)

Of course, if the column had to be allowed to have nulls then the solution would've required some more thought. But fortunately, this one worked fine.

Monday, July 21, 2008

Code formatting/Syntax Highlighting

I've been trying for a while to find a way to get a code formatting utility to work on my blog. I'd originally seen it in WordPress where there's a built in tag to handle code formatting. But, no such thing exists natively for Blogger.

I found a Google Code utility that I had tried to get added without much success but today, my colleague Ron got it working for us to use in our new forum (coming soon!). Now, with a couple of additions to my Blogger template, I'm able to use a slightly modified pre tag to get my code blocks formatted quite nicely.

All I have to do is enclose my code like this:


  <pre name="code" class="sql">
  </pre>

and I'm all set. To get this example to show up, I had to use "& lt" instead of the actual pre start and end tags (using <). That's good to know as it allows you to also display html as formatted code text and not be interpretted.

Cool stuff. The piece that I needed to get all this working was related to javascript, so when Ron got that part handled, I was in business!

Check it out if you're looking for a way to format your code in a nice, clean way.

Friday, July 18, 2008

Anonymous

I've been watching the continuing comments related to Tom Kyte's recent post which referenced Cary's recent post on Christian Antognini's new book. There are a couple of things that fascinate me about the whole exchange: 1) all the comments about Cary are happening over at Tom's blog and 2) the person who is firing at Cary is "Anonymous".

To the first point, it really just cracks me up that Tom's very short and simple post about liking something Cary wrote for his foreward to Chris's book has received all the comments. Why, I wonder, didn't "Anonymous" make comments on Cary's blog since it was Cary's words that he took exception to? Odd...

To the second point, I have to rant a bit on the whole idea of anonymity. First let me say that I don't like it. In the context of posting to public forums or blogs or where ever a person chooses to share their opinions and comments, it seems unnecessarily covert to not identify who you are. I'm interested to know what the fear is that makes someone not want to identify themselves. Is it fear of reprisal? Is it fear of looking "dumb"? What?

It reminds me of a couple of quotes:

"I disapprove of what you say, but I will defend to the death your right to say it."
S. G. Tallentyre

"If you aren't going to say something directly to someone's face, then don't use online as an opportunity to say it. It is this sense of bravery that people get when they are anonymous that gives the blogosphere a bad reputation."
Mena Trott

I think there is a place for anonymous commentary. Many times people won't tell you the full truth because they actually like you and think it may hurt your feelings or something like that. But they feel OK to share when they know they won't be identified.

One of the best personal review techniques I've ever heard of is the 360 degree feedback process. It is an review process that involves having co-workers, managers, and as many people as possible involved in providing feedback to a single individual. All their feedback is done anonymously. Each person that participates fills out detailed questionnaires about the person and the results are compiled to provide a very specific report back to the employee about their performance as other people see it. Think about it. If you were asked to talk about your manager (remember, this is the person who does your review and grants you raises and approves your time off requests) many people are less than totally forthcoming when answering questions if they think their feedback will be known to the boss. But, when they are assured their anonymity, they feel free to truly speak their minds.

OK. I get that. And, I'm sure I could find other instances when anonymity can serve a very good purpose. But, I don't see how remaining anonymous when you make a blog post or comment is valid. There's another quote that I heard recently but don't remember it exactly that goes something like, if you wouldn't sign your name to something you say, then don't say it. I think that is right on.

We live in a country where we are free to express our opinions. Only truly inflammatory speech is prohibited. But, even then, you've got to really be over the top for anything you say to be legally actionable.

So, why not own what you have to say? If you're not willing to own it, then just plain don't say it. Lurk on the blogs and keep your opinions to yourself. I may not like what you say. If I don't like it, I can add my voice to the conversation in opposition. But in order to engage in true, meaningful, open debate and discussion, I truly don't see how remaining anonymous serves that purpose.

From the start, an anonymous poster is (at least by me) viewed with suspicion. For the most part, I either ignore or gloss over their comments. But, as I believe so often happens, many people who post anonymously tend to have a recognizable pattern in how they write. So, after a while, the anonymous person is actually recognized...even if it's only a recognition that it's the same person's "voice". Sometimes the anonymous voice can be cross-matched to an actual person from places where they have chosen to identify themselves. Writing style is similar to a fingerprint and can often be used to identify the writer even when they post without identifying themselves.

But it is hard to completely ignore the anonymous commenter. When they have things to say that you don't understand or that you disagree with, you want to respond. But, it just feels "off" to respond to a "nobody".

I do support anyone's right to say things they think/believe even if it is in opposition to my own thoughts/beliefs. I just wish people would own their comments with their real identities. I think it speaks to strong character and intent when you are willing to identify yourself.

I recently read something about how your name is your brand. If you hide your real name behind an anonymous label or even some made up screen name (for example, JoeCoolDBA or SmartGuy), you are not presenting your brand and that creates confusion and waters down your name.

So don’t be tempted to bury your own beautiful name. Love it. Own it. Flaunt it.

With that said, this is Karen Morton, signing off for today.

Thursday, July 17, 2008

Analytics to the rescue - followup

I thought I'd do a follow-up post to yesterday's "Analytics to the rescue" post as the discussion in the comments brought up a couple of things that needed more info than could be easily included in the comment section. So...

What you'll see below is a more complete text of the tests I executed that were discussed in yesterday's post. First, the original query, then for the rewritten query. What I show for each is the actual STAT lines from the 10046 trace data, the EXPLAIN PLAN and some additional stats (collected with a SQL test harness set of scripts I use).

Test for original query


select utrsotp_desc
  from utrsotp, ucbsvco
 where ucbsvco_sotp_code = utrsotp_code
   and ucbsvco_dispatch_date =
                                (select max(ucbsvco_dispatch_date)
                                   from ucbsvco
                                  where ucbsvco_cust_code = 1320908
                                    and ucbsvco_prem_code = '507601'
                                    and ucbsvco_stus_code = 'C'
                                    and ucbsvco_dispatch_ind is not null)
   and ucbsvco_stus_code = 'C'
   and ucbsvco_dispatch_ind is not null


STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=106693 r=59924 w=0 time=19518419 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=74130 op='TABLE ACCESS FULL UCBSVCO (cr=106691 r=59924 w=0 time=19518381 us)'
STAT #3 id=3 cnt=1 pid=2 pos=2 obj=0 op='SORT AGGREGATE (cr=13 r=0 w=0 time=145 us)'
STAT #3 id=4 cnt=2 pid=3 pos=1 obj=74130 op='TABLE ACCESS BY INDEX ROWID UCBSVCO (cr=13 r=0 w=0 time=138 us)'
STAT #3 id=5 cnt=9 pid=4 pos=1 obj=82187 op='INDEX RANGE SCAN UCBSVCO_CUST_INDEX (cr=3 r=0 w=0 time=45 us)'
STAT #3 id=6 cnt=1 pid=1 pos=2 obj=74368 op='TABLE ACCESS BY INDEX ROWID UTRSOTP (cr=2 r=0 w=0 time=22 us)'
STAT #3 id=7 cnt=1 pid=6 pos=1 obj=81903 op='INDEX UNIQUE SCAN UTRSOTP_KEY_INDEX (cr=1 r=0 w=0 time=12 us)'


Explain Plan
--------------------------------------------------------------------------------------
| Id  | Operation                      |  Name               | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |     1 |    38 |  1768 |
|   1 |  NESTED LOOPS                  |                     |     1 |    38 |  1768 |
|*  2 |   TABLE ACCESS FULL            | UCBSVCO             |     1 |    12 |  1767 |
|   3 |    SORT AGGREGATE              |                     |     1 |    20 |       |
|*  4 |     TABLE ACCESS BY INDEX ROWID| UCBSVCO             |     1 |    20 |     1 |
|*  5 |      INDEX RANGE SCAN          | UCBSVCO_CUST_INDEX  |     8 |       |     1 |
|   6 |   TABLE ACCESS BY INDEX ROWID  | UTRSOTP             |     1 |    26 |     1 |
|*  7 |    INDEX UNIQUE SCAN           | UTRSOTP_KEY_INDEX   |     1 |       |       |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND"
              IS NOT NULL AND "UCBSVCO"."UCBSVCO_DISPATCH_DATE"= (SELECT /*+ */
              MAX("UCBSVCO"."UCBSVCO_DISPATCH_DATE") FROM "UIMSMGR"."UCBSVCO" "UCBSVCO" WHERE
              "UCBSVCO"."UCBSVCO_CUST_CODE"=1320908 AND "UCBSVCO"."UCBSVCO_PREM_CODE"='507601' AND
              "UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL))
   4 - filter("UCBSVCO"."UCBSVCO_PREM_CODE"='507601' AND
              "UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL)
   5 - access("UCBSVCO"."UCBSVCO_CUST_CODE"=1320908)
   7 - access("UCBSVCO"."UCBSVCO_SOTP_CODE"="UTRSOTP"."UTRSOTP_CODE")


Statistics Snapshot

Type  Statistic Name                                    Value
----- ---------------------------------------- --------------
Latch cache buffers chains                          2,142,103
      library cache                                   116,725
      row cache objects                                19,592

Stats buffer is pinned count                                8
      consistent gets                                 106,693
      db block gets                                        56
      execute count                                        16
      parse count (hard)                                    1
      physical reads                                   59,924
      physical writes                                       0
      redo size                                         9,524
      session logical reads                           106,749
      session pga memory                              594,008
      session pga memory max                                0
      sorts (disk)                                          0
      sorts (memory)                                        0
      sorts (rows)                                          0
      table fetch by rowid                                 10
      table scan blocks gotten                        106,669

Time  elapsed time (centiseconds)                       2,016

Test for rewritten query


select utrsotp_desc
  from utrsotp,
                (
                select ucbsvco_sotp_code sotp_cd, ucbsvco_dispatch_date dispatch_dt,
                           max(ucbsvco_dispatch_date) over (partition by ucbsvco_cust_code, ucbsvco_prem_code) max_dispatch_dt
                  from ucbsvco
                 where ucbsvco_cust_code = 1320908
                   and ucbsvco_prem_code = '507601'
                   and ucbsvco_stus_code = 'C'
                   and ucbsvco_dispatch_ind is not null
                ) svco
 where svco.dispatch_dt = svco.max_dispatch_dt
   and svco.sotp_cd = utrsotp_code


STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=15 r=0 w=0 time=249 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=0 op='VIEW  (cr=13 r=0 w=0 time=228 us)'
STAT #3 id=3 cnt=2 pid=2 pos=1 obj=0 op='WINDOW BUFFER (cr=13 r=0 w=0 time=218 us)'
STAT #3 id=4 cnt=2 pid=3 pos=1 obj=74130 op='TABLE ACCESS BY INDEX ROWID UCBSVCO (cr=13 r=0 w=0 time=145 us)'
STAT #3 id=5 cnt=9 pid=4 pos=1 obj=82187 op='INDEX RANGE SCAN UCBSVCO_CUST_INDEX (cr=3 r=0 w=0 time=37 us)'
STAT #3 id=6 cnt=1 pid=1 pos=2 obj=74368 op='TABLE ACCESS BY INDEX ROWID UTRSOTP (cr=2 r=0 w=0 time=16 us)'
STAT #3 id=7 cnt=1 pid=6 pos=1 obj=81903 op='INDEX UNIQUE SCAN UTRSOTP_KEY_INDEX (cr=1 r=0 w=0 time=9 us)'


Explain Plan
--------------------------------------------------------------------------------------
| Id  | Operation                      |  Name               | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |     1 |    48 |     4 |
|   1 |  NESTED LOOPS                  |                     |     1 |    48 |     4 |
|*  2 |   VIEW                         |                     |     1 |    22 |     3 |
|   3 |    WINDOW BUFFER               |                     |     1 |    25 |     3 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| UCBSVCO             |     1 |    25 |     1 |
|*  5 |      INDEX RANGE SCAN          | UCBSVCO_CUST_INDEX  |     8 |       |     1 |
|   6 |   TABLE ACCESS BY INDEX ROWID  | UTRSOTP             |     1 |    26 |     1 |
|*  7 |    INDEX UNIQUE SCAN           | UTRSOTP_KEY_INDEX   |     1 |       |       |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("SVCO"."DISPATCH_DT"="SVCO"."MAX_DISPATCH_DT")
   4 - filter("UCBSVCO"."UCBSVCO_PREM_CODE"='507601' AND
              "UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL)
   5 - access("UCBSVCO"."UCBSVCO_CUST_CODE"=1320908)
   7 - access("SVCO"."SOTP_CD"="UTRSOTP"."UTRSOTP_CODE")


Statistics Snapshot

Type  Statistic Name                                    Value
----- ---------------------------------------- --------------
Latch cache buffers chains                              9,098
      library cache                                     3,608
      row cache objects                                   370

Stats buffer is pinned count                                8
      consistent gets                                      15
      db block gets                                        53
      execute count                                        15
      parse count (hard)                                    1
      physical reads                                        0
      physical writes                                       0
      redo size                                         9,008
      session logical reads                                68
      session pga memory                                    0
      session pga memory max                                0
      sorts (disk)                                          0
      sorts (memory)                                        1
      sorts (rows)                                          2
      table fetch by rowid                                 10
      table scan blocks gotten                              0

Time  elapsed time (centiseconds)                          18

Difference Report


TYPE  NAME                                           original         rewrite      DIFFERENCE
----- --------------------------------------- --------------- --------------- ---------------
Latch cache buffers chains                            2142103            9098         2133005
      library cache                                    116725            3608          113117
      row cache objects                                 19592             370           19222

Stats buffer is pinned count                                8               8               0
      consistent gets                                  106693              15          106678
      db block gets                                        56              53               3
      execute count                                        16              15               1
      parse count (hard)                                    1               1               0
      physical reads                                    59924               0           59924
      physical writes                                       0               0               0
      redo size                                          9524            9008             516
      session logical reads                            106749              68          106681
      session pga memory                               594008               0          594008
      session pga memory max                                0               0               0
      sorts (disk)                                          0               0               0
      sorts (memory)                                        0               1              -1
      sorts (rows)                                          0               2              -2
      table fetch by rowid                                 10              10               0
      table scan blocks gotten                         106669               0          106669


Time  elapsed time (centiseconds)                        2016              18            1998

Original


STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=106693 r=59924 w=0 time=19518419 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=74130 op='TABLE ACCESS FULL UCBSVCO (cr=106691 r=59924 w=0 time=19518381 us)'
STAT #3 id=3 cnt=1 pid=2 pos=2 obj=0 op='SORT AGGREGATE (cr=13 r=0 w=0 time=145 us)'
STAT #3 id=4 cnt=2 pid=3 pos=1 obj=74130 op='TABLE ACCESS BY INDEX ROWID UCBSVCO (cr=13 r=0 w=0 time=138 us)'
STAT #3 id=5 cnt=9 pid=4 pos=1 obj=82187 op='INDEX RANGE SCAN UCBSVCO_CUST_INDEX (cr=3 r=0 w=0 time=45 us)'
STAT #3 id=6 cnt=1 pid=1 pos=2 obj=74368 op='TABLE ACCESS BY INDEX ROWID UTRSOTP (cr=2 r=0 w=0 time=22 us)'
STAT #3 id=7 cnt=1 pid=6 pos=1 obj=81903 op='INDEX UNIQUE SCAN UTRSOTP_KEY_INDEX (cr=1 r=0 w=0 time=12 us)'

Rewrite


STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=15 r=0 w=0 time=249 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=0 op='VIEW  (cr=13 r=0 w=0 time=228 us)'
STAT #3 id=3 cnt=2 pid=2 pos=1 obj=0 op='WINDOW BUFFER (cr=13 r=0 w=0 time=218 us)'
STAT #3 id=4 cnt=2 pid=3 pos=1 obj=74130 op='TABLE ACCESS BY INDEX ROWID UCBSVCO (cr=13 r=0 w=0 time=145 us)'
STAT #3 id=5 cnt=9 pid=4 pos=1 obj=82187 op='INDEX RANGE SCAN UCBSVCO_CUST_INDEX (cr=3 r=0 w=0 time=37 us)'
STAT #3 id=6 cnt=1 pid=1 pos=2 obj=74368 op='TABLE ACCESS BY INDEX ROWID UTRSOTP (cr=2 r=0 w=0 time=16 us)'
STAT #3 id=7 cnt=1 pid=6 pos=1 obj=81903 op='INDEX UNIQUE SCAN UTRSOTP_KEY_INDEX (cr=1 r=0 w=0 time=9 us)'

In the original query, there was some question as to whether or not the SELECT MAX() was happening for each row. I had not included the predicate information section for the EXPLAIN PLAN which would have cleared this up. Note the filter on line 2 of the plan for the full table scan.


2 - filter("UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND 
           "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL AND
           "UCBSVCO"."UCBSVCO_DISPATCH_DATE"= 
           (SELECT /*+ */ MAX("UCBSVCO"."UCBSVCO_DISPATCH_DATE") 
              FROM "UIMSMGR"."UCBSVCO" "UCBSVCO" 
             WHERE "UCBSVCO"."UCBSVCO_CUST_CODE"=1320908 
               AND "UCBSVCO"."UCBSVCO_PREM_CODE"='507601' 
               AND "UCBSVCO"."UCBSVCO_STUS_CODE"='C' 
               AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL))

The SELECT MAX query is part of the filter on this step. That means that it will get executed once for every row that is examined. One thing to note is that it is most likely that there will be subquery caching that occurs since the MAX query remains the same. In other words, the MAX query will be executed once and the answer stored in a cached array so that it really won't get executed every time, only when the inputs are new. And, in this case, the inputs are always the same so the answer will be retrieved from cache on every check except for the first one. I've got a really good example of how subquery caching works that I'll post on another day. :)

So, the plan chosen does show that the SELECT MAX would occur as a filter for the full table scan and that means it could possibly execute once for each row. It was just "lucky" that the MAX query always used the same inputs and therefore didn't really require an actual execution of the MAX query for each row evaluated by the WHERE clause due to subquery caching.

But, the thing to notice about the original plan is that it does require 2 access operations against the UCBSVCO table where one should suffice. The new analytic syntax does only one access operation on that table. That's good! If you can do something once instead of twice to get the same end result, once is better. It's like needing to go to the store to get bread and milk and you make one trip and get the bread and bring it home, then you make a second trip to get the milk. You could've gotten both items in a single trip and saved yourself a lot of time and effort. It works the same way with Oracle. Don't send Oracle to the store twice if once will do.

There were a couple of other comments I wanted to discuss as well.
From Bundit: Is there any reason to avoid using analytic function due to performance issue?

I'd say that analytic functions aren't a fix for everything. They're just another tool in the toolbox to choose from. I like the saying "just because you have a hammer doesn't mean every problem is a nail". You can't use analytic functions to fix everything, but you can use them to help reduce the resource requirements for certain classes of problems like the one I've demonstrated. They won't always be the answer but they should be one of the syntax options to test when/where appropriate. Only through proper testing would you be able to prove which way of writing a particular query would be most efficient.

From Craig M: Are the columns in the partition by necessary? It seems like it isn't adding anything since you are filtering on those columns.

In this case, you're right. Since the WHERE clause is specifying only a single cust_code and prem_code, the partition by isn't really necessary. For me, it's just a habit I'm in of always clearly indicating on which columns I want grouping to occur.

Also from Craig M: Also, I use analytic functions frequently, however I am not sure I would have ever thought to this problem in this way. Would it provide an advantage over something like


with max_ucbsvco as
(select sotp_cd
   from
    (   select ucbsvco_sotp_code sotp_cd
          from ucbsvco
         where ucbsvco_cust_code = :b1
           and ucbsvco_prem_code = :b2
           and ucbsvco_stus_code = 'C'
           and ucbsvco_dispatch_ind is not null
           and ucbsvco_dispatch_date is not null
         order by ucbsvco_dispatch_date desc
    )
  where rownum = 1
)
select utrsotp_desc
  from utrsotp
 where utrsotp_code = 
       ( select sotp_cd
           from max_ucbsvco);

The only way I can answer this now is to say we'd have to test it and see. But, I think the analytic version would still be better if this version did what I think you're expecting it to do and materialize a temp table with the result of the query against UCBSVCO. First, the plan for this version would require an "almost" sort. The use of ORDER BY to put the rows in descending order would mean the rows output from that step would need to come out in sorted order and Oracle would have to do that operation. Now, because of ROWNUM, it should do an in-line sort vs. a normal sort operation. In other words, as the rows are read, Oracle will check to see if the ucbsvco_dispatch_date column value is greater than the previous row's value. After all the rows are read, then it will have the one row which has the most current (MAX) value. So, I'd want to see how the bit of extra ordering work would be handled as compared to the analytic version.

The other thing is that if the result set is materialized for that query, there will be some small overhead to create that temp table and then read from it. That overhead doesn't exist in the analytic version.

So, my instinct on this one is that the analytic version would still win. I mean 15 LIOs and .18 seconds is likely going to be pretty hard to beat. But, it's just speculation unless I get a chance to visit that client again and test it out. :)

Thanks to everyone who commented on the original post and I hope this follow-up provides more info to clarify issues raised.