
Monday, July 28, 2008

Wordle

I was directed to a cool little site called Wordle that takes text, the URL of any page with an Atom or RSS feed, or a del.icio.us user name and turns it into a word-cloud graphic.

I tried it twice: once with my blog and once with the text of my Method R bio page. The one from my blog needs to have its "nerdiness factor" toned down (that feedback courtesy of my colleague Ron, and I agree, by the way). I think the one from my bio page is pretty cool. Here they both are so you can see what you think.

From my blog:


From my bio page text:

Wednesday, July 23, 2008

Just for fun

I occasionally see memes on other blogs I read, and I recently saw one that looked like fun, so I thought I'd try it.

Here's how it works:

Using the BigHugeLabs Mosaic Maker to build the mosaic and Flickr to do your searches, type your answer to each of the questions below into Flickr search. Then, using only the FIRST page of results, pick an image. Copy and paste each image's URL into the mosaic maker.

The Questions:
1. What is your first name?
2. What is your favorite food?
3. What high school did you go to?
4. What is your favorite color?
5. Who is your celebrity crush?
6. Favorite drink?
7. Dream vacation?
8. Favorite dessert?
9. What you want to be when you grow up?
10. What do you love most in life?
11. One word to describe you.
12. Your flickr name.

You never know what may turn up on that first Flickr page. Can you guess what some of my answers were? They may not be what you think. :)

Tuesday, July 22, 2008

Optimizer won't choose Index Fast Full Scan?

A while back, I was asked to take a look at a situation where a COUNT(*) query was using a full table scan instead of an Index Fast Full Scan. The answer turned out to be fairly simple. Can you determine the problem?


The table has about 480K rows and is about 800 MB in size.

It has the following indexes:

SQL> @show_table_indexes
Enter value for owner: aradmin70
Enter value for table: t1478

                               Column                         Column
INDEX_NAME                     Name                           Position
------------------------------ ------------------------------ --------
I1478_3_1                      C3                                    1
I1478_600000214_1              C600000214                            1
I1478_600000215_1              C600000215                            1
I1478_740002011_1              C740002011                            1
I1478_740002012_1              C740002012                            1
I1478_770000000_1              C770000000                            1
I1478_840000044_1              C840000044                            1
IT1478                         C1                                    1

When executing:

select C600000215, count(*)
from aradmin70.T1478
group by C600000215;

It uses the following plan:

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=27954 Card=563 Bytes=11823)
   1    0   HASH (GROUP BY) (Cost=27954 Card=563 Bytes=11823)
   2    1     TABLE ACCESS (FULL) OF 'T1478' (TABLE) (Cost=27922 Card=481061 Bytes=10102281)

Even using a hint, the optimizer still goes for a full scan.

select /*+ INDEX_FFS(tab1) */ C600000215, count(*)
from aradmin70.T1478 tab1
group by C600000215;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=27954 Card=563 Bytes=11823)
   1    0   HASH (GROUP BY) (Cost=27954 Card=563 Bytes=11823)
   2    1     TABLE ACCESS (FULL) OF 'T1478' (TABLE) (Cost=27922 Card=481061 Bytes=10102281)

The index should obviously be a better choice:

select count(*) "Extents",
round(sum(bytes)/(1024*1024)) "MBytes"
from dba_extents
where owner='ARADMIN70'
and segment_name='T1478';

Extents MBytes
---------- ----------
172 807

For the index:
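(This is the same dba_extents query, just pointed at the index segment; I'm assuming the index of interest is I1478_600000215_1, the one on C600000215.)

select count(*) "Extents",
       round(sum(bytes)/(1024*1024)) "MBytes"
from dba_extents
where owner='ARADMIN70'
and segment_name='I1478_600000215_1';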

Extents MBytes
---------- ----------
32 17

What should be checked?




...and the answer is...

You should check whether the indexed column allows nulls. If it does, the index can't be used to produce the counts, because rows where the column is NULL have no entry in a single-column index, so scanning only the index would miss them.

A simple DESCRIBE showed that the indexed column, C600000215, was a VARCHAR2(100) and did not have a NOT NULL constraint. So, the optimizer couldn't choose the IFFS.
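The data dictionary shows the same thing; a quick check like this (using the owner, table, and column from above) displays the NULLABLE flag:

select column_name, data_type, nullable
from dba_tab_columns
where owner = 'ARADMIN70'
and table_name = 'T1478'
and column_name = 'C600000215';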

Here's the fix that was implemented to allow the IFFS to be chosen:

alter table ARADMIN70.T1478
modify (C600000215 NOT NULL);

This added the NOT NULL constraint to the column. Now, when the query is executed, the optimizer chooses the IFFS.

select C600000215, count(*)
from aradmin70.T1478
group by C600000215
order by C600000215;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1781 Card=5340 Bytes=112140)
   1    0   SORT (GROUP BY) (Cost=1781 Card=5340 Bytes=112140)
   2    1     INDEX (FAST FULL SCAN) OF 'I1478_600000215_1' (INDEX) (Cost=605 Card=494913 Bytes=10393173)


Of course, if the column had needed to allow nulls, the solution would've required some more thought. But fortunately, this one worked fine.
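For the record, one common workaround when the column must stay nullable (not something we needed here, and the index name below is made up) is to add a constant as a second index expression so that every row, including those with a NULL in the column, gets an index entry:

create index aradmin70.i1478_600000215_nn
on aradmin70.t1478 (c600000215, 0);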

Monday, July 21, 2008

Code formatting/Syntax Highlighting

I've been trying for a while to find a way to get a code formatting utility to work on my blog. I'd originally seen it in WordPress, where there's a built-in tag to handle code formatting. But no such thing exists natively for Blogger.

I had found a Google Code utility and tried to get it added without much success, but today my colleague Ron got it working for us to use in our new forum (coming soon!). Now, with a couple of additions to my Blogger template, I can use a slightly modified pre tag to get my code blocks formatted quite nicely.

All I have to do is enclose my code like this:

<pre name="code" class="sql">
</pre>
and I'm all set. To get this example to show up, I had to use the "&lt;" entity instead of the actual < character for the pre start and end tags. That's good to know, as it also lets you display HTML as formatted code text without having it interpreted.
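Putting it together, a code block in a post ends up looking something like this (just a made-up snippet to show the idea):

<pre name="code" class="sql">
select sysdate from dual;
</pre>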

Cool stuff. The piece that I needed to get all this working was related to JavaScript, so when Ron got that part handled, I was in business!

Check it out if you're looking for a way to format your code in a nice, clean way.

Friday, July 18, 2008

Anonymous

I've been watching the continuing comments related to Tom Kyte's recent post which referenced Cary's recent post on Christian Antognini's new book. There are a couple of things that fascinate me about the whole exchange: 1) all the comments about Cary are happening over at Tom's blog and 2) the person who is firing at Cary is "Anonymous".

To the first point, it really just cracks me up that Tom's very short and simple post about liking something Cary wrote for his foreword to Chris's book has received all the comments. Why, I wonder, didn't "Anonymous" comment on Cary's blog, since it was Cary's words he took exception to? Odd...

To the second point, I have to rant a bit on the whole idea of anonymity. First let me say that I don't like it. In the context of posting to public forums or blogs or wherever a person chooses to share their opinions and comments, it seems unnecessarily covert not to identify who you are. I'm interested to know what the fear is that makes someone not want to identify themselves. Is it fear of reprisal? Is it fear of looking "dumb"? What?

It reminds me of a couple of quotes:
"I disapprove of what you say, but I will defend to the death your right to say it."
S. G. Tallentyre


"If you aren't going to say something directly to someone's face, then don't use online as an opportunity to say it. It is this sense of bravery that people get when they are anonymous that gives the blogosphere a bad reputation."
Mena Trott

I think there is a place for anonymous commentary. Many times people won't tell you the full truth because they actually like you and think it may hurt your feelings or something like that. But they feel OK to share when they know they won't be identified.

One of the best personal review techniques I've ever heard of is the 360 degree feedback process. It is a review process that involves co-workers, managers, and as many other people as possible in providing feedback to a single individual, and all of the feedback is given anonymously. Each person who participates fills out detailed questionnaires about the person, and the results are compiled into a very specific report back to the employee about their performance as other people see it. Think about it: if you were asked to talk about your manager (remember, this is the person who does your review, grants your raises, and approves your time off requests), you might be less than totally forthcoming if you thought your feedback would be known to the boss. But when people are assured of anonymity, they feel free to truly speak their minds.

OK. I get that. And I'm sure I could find other instances where anonymity can serve a very good purpose. But I don't see a valid reason for remaining anonymous when you make a blog post or comment. There's another quote I heard recently that I don't remember exactly, but it goes something like this: if you wouldn't sign your name to something, don't say it. I think that is right on.

We live in a country where we are free to express our opinions. Only truly inflammatory speech is prohibited. But, even then, you've got to really be over the top for anything you say to be legally actionable.

So, why not own what you have to say? If you're not willing to own it, then just plain don't say it. Lurk on the blogs and keep your opinions to yourself. I may not like what you say. If I don't like it, I can add my voice to the conversation in opposition. But in order to engage in true, meaningful, open debate and discussion, I truly don't see how remaining anonymous serves that purpose.

From the start, an anonymous poster is (at least by me) viewed with suspicion. For the most part, I either ignore or gloss over their comments. But, as I believe so often happens, many people who post anonymously tend to have a recognizable pattern in how they write. So, after a while, the anonymous person is actually recognized...even if it's only a recognition that it's the same person's "voice". Sometimes the anonymous voice can be cross-matched to an actual person from places where they have chosen to identify themselves. Writing style is similar to a fingerprint and can often be used to identify the writer even when they post without identifying themselves.

But it is hard to completely ignore the anonymous commenter. When they have things to say that you don't understand or that you disagree with, you want to respond. But, it just feels "off" to respond to a "nobody".

I do support anyone's right to say things they think/believe even if it is in opposition to my own thoughts/beliefs. I just wish people would own their comments with their real identities. I think it speaks to strong character and intent when you are willing to identify yourself.

I recently read something about how your name is your brand. If you hide your real name behind an anonymous label or even some made up screen name (for example, JoeCoolDBA or SmartGuy), you are not presenting your brand and that creates confusion and waters down your name.

So don’t be tempted to bury your own beautiful name. Love it. Own it. Flaunt it.

With that said, this is Karen Morton, signing off for today.

Thursday, July 17, 2008

Analytics to the rescue - followup

I thought I'd do a follow-up post to yesterday's "Analytics to the rescue" post as the discussion in the comments brought up a couple of things that needed more info than could be easily included in the comment section. So...

What you'll see below is a more complete record of the tests I executed that were discussed in yesterday's post: first the original query, then the rewritten query. For each, I show the actual STAT lines from the 10046 trace data, the EXPLAIN PLAN, and some additional statistics (collected with a set of SQL test harness scripts I use).

Test for original query

select utrsotp_desc
from utrsotp, ucbsvco
where ucbsvco_sotp_code = utrsotp_code
and ucbsvco_dispatch_date =
(select max(ucbsvco_dispatch_date)
from ucbsvco
where ucbsvco_cust_code = 1320908
and ucbsvco_prem_code = '507601'
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null)
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null


STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS (cr=106693 r=59924 w=0 time=19518419 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=74130 op='TABLE ACCESS FULL UCBSVCO (cr=106691 r=59924 w=0 time=19518381 us)'
STAT #3 id=3 cnt=1 pid=2 pos=2 obj=0 op='SORT AGGREGATE (cr=13 r=0 w=0 time=145 us)'
STAT #3 id=4 cnt=2 pid=3 pos=1 obj=74130 op='TABLE ACCESS BY INDEX ROWID UCBSVCO (cr=13 r=0 w=0 time=138 us)'
STAT #3 id=5 cnt=9 pid=4 pos=1 obj=82187 op='INDEX RANGE SCAN UCBSVCO_CUST_INDEX (cr=3 r=0 w=0 time=45 us)'
STAT #3 id=6 cnt=1 pid=1 pos=2 obj=74368 op='TABLE ACCESS BY INDEX ROWID UTRSOTP (cr=2 r=0 w=0 time=22 us)'
STAT #3 id=7 cnt=1 pid=6 pos=1 obj=81903 op='INDEX UNIQUE SCAN UTRSOTP_KEY_INDEX (cr=1 r=0 w=0 time=12 us)'


Explain Plan
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 1768 |
| 1 | NESTED LOOPS | | 1 | 38 | 1768 |
|* 2 | TABLE ACCESS FULL | UCBSVCO | 1 | 12 | 1767 |
| 3 | SORT AGGREGATE | | 1 | 20 | |
|* 4 | TABLE ACCESS BY INDEX ROWID| UCBSVCO | 1 | 20 | 1 |
|* 5 | INDEX RANGE SCAN | UCBSVCO_CUST_INDEX | 8 | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID | UTRSOTP | 1 | 26 | 1 |
|* 7 | INDEX UNIQUE SCAN | UTRSOTP_KEY_INDEX | 1 | | |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND"
IS NOT NULL AND "UCBSVCO"."UCBSVCO_DISPATCH_DATE"= (SELECT /*+ */
MAX("UCBSVCO"."UCBSVCO_DISPATCH_DATE") FROM "UIMSMGR"."UCBSVCO" "UCBSVCO" WHERE
"UCBSVCO"."UCBSVCO_CUST_CODE"=1320908 AND "UCBSVCO"."UCBSVCO_PREM_CODE"='507601' AND
"UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL))
4 - filter("UCBSVCO"."UCBSVCO_PREM_CODE"='507601' AND
"UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL)
5 - access("UCBSVCO"."UCBSVCO_CUST_CODE"=1320908)
7 - access("UCBSVCO"."UCBSVCO_SOTP_CODE"="UTRSOTP"."UTRSOTP_CODE")


Statistics Snapshot

Type Statistic Name Value
----- ---------------------------------------- --------------
Latch cache buffers chains 2,142,103
library cache 116,725
row cache objects 19,592

Stats buffer is pinned count 8
consistent gets 106,693
db block gets 56
execute count 16
parse count (hard) 1
physical reads 59,924
physical writes 0
redo size 9,524
session logical reads 106,749
session pga memory 594,008
session pga memory max 0
sorts (disk) 0
sorts (memory) 0
sorts (rows) 0
table fetch by rowid 10
table scan blocks gotten 106,669

Time elapsed time (centiseconds) 2,016



Test for rewritten query

select utrsotp_desc
from utrsotp,
(
select ucbsvco_sotp_code sotp_cd, ucbsvco_dispatch_date dispatch_dt,
max(ucbsvco_dispatch_date) over (partition by ucbsvco_cust_code, ucbsvco_prem_code) max_dispatch_dt
from ucbsvco
where ucbsvco_cust_code = 1320908
and ucbsvco_prem_code = '507601'
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null
) svco
where svco.dispatch_dt = svco.max_dispatch_dt
and svco.sotp_cd = utrsotp_code


STAT #3 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS (cr=15 r=0 w=0 time=249 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=0 op='VIEW (cr=13 r=0 w=0 time=228 us)'
STAT #3 id=3 cnt=2 pid=2 pos=1 obj=0 op='WINDOW BUFFER (cr=13 r=0 w=0 time=218 us)'
STAT #3 id=4 cnt=2 pid=3 pos=1 obj=74130 op='TABLE ACCESS BY INDEX ROWID UCBSVCO (cr=13 r=0 w=0 time=145 us)'
STAT #3 id=5 cnt=9 pid=4 pos=1 obj=82187 op='INDEX RANGE SCAN UCBSVCO_CUST_INDEX (cr=3 r=0 w=0 time=37 us)'
STAT #3 id=6 cnt=1 pid=1 pos=2 obj=74368 op='TABLE ACCESS BY INDEX ROWID UTRSOTP (cr=2 r=0 w=0 time=16 us)'
STAT #3 id=7 cnt=1 pid=6 pos=1 obj=81903 op='INDEX UNIQUE SCAN UTRSOTP_KEY_INDEX (cr=1 r=0 w=0 time=9 us)'


Explain Plan
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 48 | 4 |
| 1 | NESTED LOOPS | | 1 | 48 | 4 |
|* 2 | VIEW | | 1 | 22 | 3 |
| 3 | WINDOW BUFFER | | 1 | 25 | 3 |
|* 4 | TABLE ACCESS BY INDEX ROWID| UCBSVCO | 1 | 25 | 1 |
|* 5 | INDEX RANGE SCAN | UCBSVCO_CUST_INDEX | 8 | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID | UTRSOTP | 1 | 26 | 1 |
|* 7 | INDEX UNIQUE SCAN | UTRSOTP_KEY_INDEX | 1 | | |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("SVCO"."DISPATCH_DT"="SVCO"."MAX_DISPATCH_DT")
4 - filter("UCBSVCO"."UCBSVCO_PREM_CODE"='507601' AND
"UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL)
5 - access("UCBSVCO"."UCBSVCO_CUST_CODE"=1320908)
7 - access("SVCO"."SOTP_CD"="UTRSOTP"."UTRSOTP_CODE")


Statistics Snapshot

Type Statistic Name Value
----- ---------------------------------------- --------------
Latch cache buffers chains 9,098
library cache 3,608
row cache objects 370

Stats buffer is pinned count 8
consistent gets 15
db block gets 53
execute count 15
parse count (hard) 1
physical reads 0
physical writes 0
redo size 9,008
session logical reads 68
session pga memory 0
session pga memory max 0
sorts (disk) 0
sorts (memory) 1
sorts (rows) 2
table fetch by rowid 10
table scan blocks gotten 0

Time elapsed time (centiseconds) 18


Difference Report

TYPE NAME original rewrite DIFFERENCE
----- --------------------------------------- --------------- --------------- ---------------
Latch cache buffers chains 2142103 9098 2133005
library cache 116725 3608 113117
row cache objects 19592 370 19222

Stats buffer is pinned count 8 8 0
consistent gets 106693 15 106678
db block gets 56 53 3
execute count 16 15 1
parse count (hard) 1 1 0
physical reads 59924 0 59924
physical writes 0 0 0
redo size 9524 9008 516
session logical reads 106749 68 106681
session pga memory 594008 0 594008
session pga memory max 0 0 0
sorts (disk) 0 0 0
sorts (memory) 0 1 -1
sorts (rows) 0 2 -2
table fetch by rowid 10 10 0
table scan blocks gotten 106669 0 106669


Time elapsed time (centiseconds) 2016 18 1998



In the original query, there was some question as to whether or not the SELECT MAX() was happening for each row. I had not included the predicate information section of the EXPLAIN PLAN, which would have cleared this up. Note the filter on step 2 of the plan (the full table scan):

2 - filter("UCBSVCO"."UCBSVCO_STUS_CODE"='C' AND
"UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL AND
"UCBSVCO"."UCBSVCO_DISPATCH_DATE"=
(SELECT /*+ */ MAX("UCBSVCO"."UCBSVCO_DISPATCH_DATE")
FROM "UIMSMGR"."UCBSVCO" "UCBSVCO"
WHERE "UCBSVCO"."UCBSVCO_CUST_CODE"=1320908
AND "UCBSVCO"."UCBSVCO_PREM_CODE"='507601'
AND "UCBSVCO"."UCBSVCO_STUS_CODE"='C'
AND "UCBSVCO"."UCBSVCO_DISPATCH_IND" IS NOT NULL))


The SELECT MAX query is part of the filter on this step. That means it could get executed once for every row that is examined. One thing to note, though, is that subquery caching will almost certainly kick in, since the MAX query's inputs remain the same. In other words, the MAX query will be executed once and the answer stored in a cache, so it really won't get executed every time, only when the inputs are new. And in this case, the inputs are always the same, so the answer will be retrieved from the cache on every check except the first one. I've got a really good example of how subquery caching works that I'll post another day. :)
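Ahead of that post, here's a minimal sketch of how you could watch scalar subquery caching happen yourself (all of the object names below are made up for the demo, and EMP is just the usual demo table):

create or replace package demo_counter as
  g_calls number := 0;
end demo_counter;
/
create or replace function demo_f (p_val number) return number as
begin
  -- count how many times the function is really invoked
  demo_counter.g_calls := demo_counter.g_calls + 1;
  return p_val;
end demo_f;
/
-- The scalar subquery's only input is deptno, so its result can be cached
-- per distinct deptno value instead of being re-executed for every row.
select empno, (select demo_f(deptno) from dual) cached_val
from emp;

-- If caching kicked in, this count is far smaller than the number of rows in EMP.
select demo_counter.g_calls calls from dual;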

So, the plan chosen does show that the SELECT MAX would occur as a filter for the full table scan, which means it could potentially execute once for each row. It was just "lucky" that the MAX query always used the same inputs, so subquery caching meant it didn't actually have to be re-executed for each row evaluated by the WHERE clause.

But the thing to notice about the original plan is that it requires two access operations against the UCBSVCO table where one should suffice. The new analytic syntax does only one access operation on that table. That's good! If you can do something once instead of twice and get the same end result, once is better. It's like needing bread and milk from the store and making one trip to get the bread, bringing it home, and then making a second trip to get the milk. You could've gotten both items in a single trip and saved yourself a lot of time and effort. It works the same way with Oracle. Don't send Oracle to the store twice if once will do.

There were a couple of other comments I wanted to discuss as well.
From Bundit: Is there any reason to avoid using analytic function due to performance issue?

I'd say that analytic functions aren't a fix for everything. They're just another tool in the toolbox to choose from. I like the saying "just because you have a hammer doesn't mean every problem is a nail". You can't use analytic functions to fix everything, but you can use them to help reduce the resource requirements for certain classes of problems like the one I've demonstrated. They won't always be the answer but they should be one of the syntax options to test when/where appropriate. Only through proper testing would you be able to prove which way of writing a particular query would be most efficient.

From Craig M: Are the columns in the partition by necessary? It seems like it isn't adding anything since you are filtering on those columns.

In this case, you're right. Since the WHERE clause specifies only a single cust_code and prem_code, the partition by isn't really necessary (see the sketch below). For me, it's just a habit to always indicate clearly which columns I want the grouping done on.
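Just to illustrate, in this particular case the inline view could use an empty OVER clause and produce the same result. This is only a sketch; I haven't re-run it against the original data:

select utrsotp_desc
from utrsotp,
     (select ucbsvco_sotp_code sotp_cd, ucbsvco_dispatch_date dispatch_dt,
             max(ucbsvco_dispatch_date) over () max_dispatch_dt
      from ucbsvco
      where ucbsvco_cust_code = 1320908
      and ucbsvco_prem_code = '507601'
      and ucbsvco_stus_code = 'C'
      and ucbsvco_dispatch_ind is not null) svco
where svco.dispatch_dt = svco.max_dispatch_dt
and svco.sotp_cd = utrsotp_code;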

Also from Craig M: Also, I use analytic functions frequently, however I am not sure I would have ever thought to this problem in this way. Would it provide an advantage over something like

with max_ucbsvco as
(select sotp_cd
from
( select ucbsvco_sotp_code sotp_cd
from ucbsvco
where ucbsvco_cust_code = :b1
and ucbsvco_prem_code = :b2
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null
and ucbsvco_dispatch_date is not null
order by ucbsvco_dispatch_date desc
)
where rownum = 1
)
select utrsotp_desc
from utrsotp
where utrsotp_code =
( select sotp_cd
from max_ucbsvco);


The only way I can answer this now is to say we'd have to test it and see. But I think the analytic version would still be better, even if this version does what I think you're expecting and materializes a temp table with the result of the query against UCBSVCO. First, the plan for this version would require an "almost" sort. The ORDER BY that puts the rows in descending order means the rows output from that step have to come out sorted, and Oracle has to do that work. Because of the ROWNUM, though, it should do an in-line, top-n style sort rather than a full sort: as the rows are read, Oracle just keeps track of whichever ucbsvco_dispatch_date value is currently the greatest, and after all the rows are read it has the one row with the most current (MAX) value. So I'd want to see how that bit of extra ordering work compares to the analytic version.

The other thing is that if the result set is materialized for that query, there will be some small overhead to create that temp table and then read from it. That overhead doesn't exist in the analytic version.

So, my instinct on this one is that the analytic version would still win. I mean 15 LIOs and .18 seconds is likely going to be pretty hard to beat. But, it's just speculation unless I get a chance to visit that client again and test it out. :)


Thanks to everyone who commented on the original post and I hope this follow-up provides more info to clarify issues raised.

Wednesday, July 16, 2008

Analytics to the rescue

I see a lot of SQL. Actually, I see a lot of bad SQL. I guess that's because my job is mainly to help people with performance problems and the majority of the time, the problems are rooted in one or more inefficient SQL statements. Yesterday I was doing a demo of our Profiler software and showed one case that I see over and over, so I thought I'd share it here.

I have a profile for a 4 hour and 44 minute trace of an application process that was taking too long. The top 3 events in the profile (accounting for 82% of the total overall response time) were:
CPU service, FETCH calls    8,357.40s   (48.9%)
db file scattered read      3,075.95s   (18.0%)
db file sequential read     2,577.67s   (15.1%)
The SQL statement that topped the list, taking 12,695.90s (74.3% of the total response time), was:
select utrsotp_desc
from utrsotp, ucbsvco
where ucbsvco_sotp_code = utrsotp_code
and ucbsvco_dispatch_date =
(select max(ucbsvco_dispatch_date)
from ucbsvco
where ucbsvco_cust_code = :b1
and ucbsvco_prem_code = :b2
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null)
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null;
Given that this one statement accounted for 74% of the overall response time, a quick look at the rest of the profile showed that it was responsible for almost all of the time consumed by the top 3 events. So the cool thing about this profile was that it made it easy to see that by fixing one SQL statement, we'd basically get rid of our dominant problem.

Now, a bit more info. This statement was executed 1,332 times during the almost 5 hour period captured by the trace. The query returned only 587 rows and did a total of 141,084,499 block accesses (LIOs). That means that, on average, each execution did over 100,000 LIOs. Ummm...that's not good. It should be using something like 20 LIOs or less per row (based on a maximum tolerance of no more than 10 LIOs per joined table per row).

It kinda seems obvious that there's a big problem with this query, doesn't it? So why had this process been running for so long without someone catching it? I mean, if we could fix this one statement, we could get as much as a 74% reduction in response time. That means the 4 hour and 44 minute process could drop down to about 1 hour and 15 minutes.

But the problem was in the way the code had been reviewed on the test instance. First (mistake 1), only EXPLAIN PLAN was used to review the anticipated plan for the SQL, and (mistake 2) the SQL was executed just to make sure the response time for a single execution seemed reasonable. Also worth noting (mistake 3) is that the test instance had only about 10% of the data volume of production, so virtually anything that was executed took only a second or so due to the lack of volume. So the response time was never really flagged as an issue because it always ran fast. The EXPLAIN PLAN didn't really raise any eyebrows either, because it looked like this:

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 1768 |
| 1 | NESTED LOOPS | | 1 | 38 | 1768 |
| 2 | TABLE ACCESS FULL | UCBSVCO | 1 | 12 | 1767 |
| 3 | SORT AGGREGATE | | 1 | 20 | |
| 4 | TABLE ACCESS BY INDEX ROWID| UCBSVCO | 1 | 20 | 1 |
| 5 | INDEX RANGE SCAN | UCBSVCO_CUST_INDEX | 8 | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID | UTRSOTP | 1 | 26 | 1 |
| 7 | INDEX UNIQUE SCAN | UTRSOTP_KEY_INDEX | 1 | | |
--------------------------------------------------------------------------------------
Now, the TABLE ACCESS FULL step might have raised a few eyebrows, but since they saw Rows = 1 at that step, they figured it shouldn't be a problem. I mean, hey, if there's only going to be one row pulled from the table, they thought, how bad could it be? (Remember that they were seeing sub-second response times in test.)

But, once I was able to point out the combined effect of executing that SQL over 1300 times in their production, full volume environment, the problem was (all of a sudden) easy to see. Something needed to be done to make this SQL statement more efficient.

As I mentioned when I first started this post, I see SQL statements similar to this one over and over again. They are very typical, because lots of apps use tables that store both historical and current data. For instance, imagine you want to get the latest price from a product table that stores each price in a separate row, with an effective-date column indicating when that price went into effect. To do that, you have to make sure you retrieve the row with the MAX effective date for that product. The way the query is most commonly formulated to do this is with a WHERE clause like this:
WHERE price_eff_dt = (SELECT MAX(price_eff_dt) FROM product_tab WHERE product_id = outer.product_id)
This is exactly what was going on in the problem query in this process.

Analytics to the rescue! By using an analytic MAX function instead, Oracle is able to make a single pass over the data to get the answer it needs. Here's the query rewritten to use an analytic MAX function instead:
select utrsotp_desc
from utrsotp,
(select ucbsvco_sotp_code sotp_cd, ucbsvco_dispatch_date dispatch_dt,
max(ucbsvco_dispatch_date) over (partition by ucbsvco_cust_code,
ucbsvco_prem_code) max_dispatch_dt
from ucbsvco
where ucbsvco_cust_code = :b1
and ucbsvco_prem_code = :b2
and ucbsvco_stus_code = 'C'
and ucbsvco_dispatch_ind is not null) svco
where svco.dispatch_dt = svco.max_dispatch_dt
and svco.sotp_cd = utrsotp_code;
The plan for this query was:

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 48 | 4 |
| 1 | NESTED LOOPS | | 1 | 48 | 4 |
|* 2 | VIEW | | 1 | 22 | 3 |
| 3 | WINDOW BUFFER | | 1 | 25 | 3 |
|* 4 | TABLE ACCESS BY INDEX ROWID| UCBSVCO | 1 | 25 | 1 |
|* 5 | INDEX RANGE SCAN | UCBSVCO_CUST_INDEX | 8 | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID | UTRSOTP | 1 | 26 | 1 |
|* 7 | INDEX UNIQUE SCAN | UTRSOTP_KEY_INDEX | 1 | | |
--------------------------------------------------------------------------------------

Tests of the two queries against a full volume data source (for a single set of bind values expected to return 1 row) had the following results:
Original query
160,693 LIOs
20.16s

New query
15 LIOs
.18s

That's a 99%+ reduction in LIOs and response time. Not bad, huh? :)

Simple change, massive difference. I didn't get to stay around long enough on the client site to see the fix implemented in production, but the expected benefit would mean, as mentioned above, that the response time should drop down from nearly 5 hours to just over 1 hour. Nice.

So, I'm always glad to see examples like this one pop up when I'm helping a customer with performance issues. The analytic function is a nice alternative to the traditional way of writing these types of queries; it works extremely well and provides the performance needed for high numbers of executions against high data volumes.

Thursday, July 10, 2008

MIN/MAX and Index Full Scans

We got a question at work today from a blogger in regard to his blog post entitled MIN and MAX Functions in a Single Query are Disastrous. If you check out his post, you'll see the details of his tests. Basically, he was wondering why the optimizer would choose to do a full table scan instead of using an INDEX FULL SCAN (MIN/MAX) operation for a query that contained both a MIN and a MAX function call on the same indexed column. I thought my answer to his question to us, about why the optimizer makes such a "bad" decision about the execution plan, would be a good topic for my post today.

Normally, if you write a query that uses only one of the calls, either MIN or MAX, the optimizer will use the INDEX FULL SCAN (MIN/MAX) optimization to read down the index to just the first leaf block (for MIN) or the last leaf block (for MAX). This is a great optimization that can save hundreds or thousands of block accesses compared to either a full table scan or even an INDEX FAST FULL SCAN. But, as Momen's tests show, if you execute a query that has both MIN and MAX in it, the optimizer refuses to use this optimization, although it would seem to us (the feeble human brain) that the optimizer should know it could do the operation twice, once to get the first leaf block and once to get the last, right?

But the optimizer can't do this. The INDEX FULL SCAN (MIN/MAX) operation will do either an index scan for the first leaf block or an index scan for the last leaf block. It cannot do both at the same time. Think about it...the code for the operation would have to contain some fairly specific logic to handle the dual calls.

The question to then ask is why wouldn't Oracle choose to add that logic? Well, since there's actually an easy way to get the result you want using the current operation's code path, I'd say Oracle didn't want to invest development time into writing specific case code when the simple branch of MIN or MAX can work properly as long as the developer knows how to write their SQL to invoke it.

Here's what has to happen.
This query:
select min(empno) min_empno, max(empno) max_empno
from emp;

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 442 (5)| 00:00:06 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX FAST FULL SCAN| PK_EMP | 917K| 4480K| 442 (5)| 00:00:06 |
---------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
196 recursive calls
0 db block gets
1959 consistent gets
1 physical reads
0 redo size
476 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed

becomes this:
select a.min_empno, b.max_empno
from
(select min(empno) min_empno from emp) a,
(select max(empno) max_empno from emp) b;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 26 | 6 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 3 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 5 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| PK_EMP | 917K| 4480K| 3 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 13 | 3 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 5 | | |
| 7 | INDEX FULL SCAN (MIN/MAX)| PK_EMP | 917K| 4480K| 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
476 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


The same results could also be achieved by doing a UNION of the two separate statements (sketched below). Either way, what's required is that the developer understands how the optimizer uses the INDEX FULL SCAN (MIN/MAX) optimization and writes their SQL to take full advantage of it.
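For example, something like this lets each branch use the MIN/MAX optimization on its own (just a sketch; note that it returns the two values as two rows instead of two columns):

select 'MIN' min_or_max, min(empno) empno from emp
union all
select 'MAX', max(empno) from emp;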

Perhaps at some point (I haven't tested this in 11g yet to see if the optimizer behaves any differently) the optimizer will have a code path to handle both calls simultaneously, but for now, the developer must provide syntax that allows the optimizer to formulate the optimal overall execution plan.

To me, this is just one more really good example why the database should not be "just a black box" to developers. Writing high performance SQL requires understanding how the Oracle optimizer works, not just an ability to put together a statement that gets the correct answer.

Thanks to Momen for his post and his question to Method R on this. It was good food for thought and nice blog fodder! :)

Addendum - July 11, 2008, 8:57am
Thanks to my colleague Ron, who did the 11g test on this and confirmed that the optimizer behavior has not changed. So, even in 11g, the SQL needs to be written to submit the MIN and MAX calls separately in order to get the optimal execution plan.

Tuesday, July 8, 2008

Interesting survey results

I recently read about a study conducted by ExpoTV, an online product review video site, which found that many people actually do want to "have a conversation" with a brand. Brand can mean many different things. For example, Mr. Whipple (Charmin) and Tony the Tiger (Frosted Flakes) are spokescharacters who became part of the brand. Logos or fonts can also represent brands; think of the Disney font or the font used for the Harry Potter movies. Branding is also slogans, like Nike's "Just do it". And sometimes the brand is just the company name, like Apple or Starbucks.

The study found:
- Consumers not only want to talk to brands, they want to establish a conversation. 55% of consumers want an ongoing dialogue with brands.

- Learning about new products in the pipeline is a top priority. Respondents were most anxious to talk to the product design department (49%), followed by customer support (14%), marketing (14%) and pricing (13%).

- Positive brand experiences can generate word-of-mouth buzz. More than 60% of those polled said they tell 10 or more people about the products they like while a third tell 20 or more people.

- Listening leads to loyalty. 89% of respondents would feel more loyal to brands which invited them to participate in a feedback group, and 92% of those who have a positive experience communicating with a brand will recommend purchasing a product from that brand to someone they know.

- Consumers are open to engaging with the competition. 93% of consumers surveyed would be interested or very interested in communicating with competing brands that expressed interest in their feedback if their first choice is not interested in hearing what they have to say.


I know from my own experience with my iPhone and Kindle for example, that I'm more than happy to share my experiences with others and to recommend the products based on my good experiences. I really like being able to check forums for questions and reviews and I think more highly of those companies whose products I buy that provide them.

I think this is important for any business, or even people in the public eye, to know. People want the opportunity to interact and to share. I think that's one of the reasons why blogging is so popular. Making your brand hard to interact with makes people think you've got something to hide and makes them wary.

Opening up a discussion that allows people to share anything and everything about your brand can be a bit scary. I mean, what if someone says something "bad"? In the end, I think an open dialogue is what matters most; if something "bad" does come up, it gives the brand owner a chance to respond and make corrections, and to have that response be seen as well. I like hearing stories where a company "made it right" with a customer. It makes you believe there is real concern for the consumer involved, and that it's not just a big machine cranking out product with the only goal being revenue generation. Revenue is certainly important, but if you treat your customers in a top-notch way, the revenue will come.

Monday, July 7, 2008

Unbreakable?

Oracle has had the "unbreakable" marketing spiel going for some time now. I don't know that I buy their bit, but today, I found the ultimate in unbreakable and it has nothing to do with Oracle. Move over Oracle. Meet Fisher Price.



Today I bought my daughter a Fisher Price Little People: Noah's Ark with Ark Animals set. She has the farm set and loves it, so I thought she'd really like this new one too, since it has so many cool animals. But when I got the thing home and started trying to get it out of the box (note that the photo above shows the item completely unboxed, and there is no warning on the packaging about the arduous task you face to get it that way!), it was like trying to break out of a maximum security prison. There were more twisty ties wrapped around plastic anchors bound to cardboard fortified with packing tape than you can imagine. It took me the better part of an hour just to get everything detached! I'm talking secure!

Oracle could take a lesson from Fisher Price I think.... If Oracle had half the "unbreakable-ness" that Fisher Price employs in their packaging, Oracle would be unbreakable indeed!