Monday, January 19, 2009

"Dynamic" WHERE clauses that aren't

What does the optimizer do with "dynamic" WHERE clauses? What I'm talking about is when you have a SELECT statement that has a "simple" WHERE clause like this:

WHERE column1 = NVL(:B1, column1)
AND column2 = NVL(:B2, column2)


The intention of this WHERE clause is to allow 4 different scenarios to be covered.
1) If :B1 is null and :B2 is null, then retrieve all rows.
2) If :B1 is null and :B2 is not null, then retrieve only rows where column2 = :B2.
3) If :B2 is null and :B1 is not null, then retrieve only rows where column1 = :B1.
4) If :B1 is not null and :B2 is not null, then retrieve only rows where column1 = :B1 and column2 = :B2.

The logic is that since the two conditions are AND'ed, that both have to evaluate to true in order to produce any rows. If, however, you wish to be flexible and allow for all 4 cases to be possible with a single query, then you have to find a way to ensure that both sides of the AND evaluate TRUE regardless of what inputs you receive. By using the NVL, you can accomplish that. You either pass in a null value or a not null value and the NVL check will make sure that the condition always evaluates to true. You get to either qualify which rows are returned by passing in a not null bind variable or you can leave the selection of rows unqualified by passing in a null value. Simple, huh?

Well, it may be simple in terms of the way the code looks, but what it does to the optimizer isn't that simple. The optimizer has to come up with a plan that can best serve all 4 possibilities equally. It can't/won't come up with 4 different plans.

Let me show you a few tests. I'll be using a test table I had in place from some previous tests for another post. It's a simple 2500 row table with 2 indexes (T2_I3 on the hair column and T2_I6 on the work column). Take a look at what happens when I execute my query 4 times using the 4 different scenarios I noted above.

SQL>set serveroutput off
SQL>variable b1 number
SQL>variable b2 number
SQL>exec :b1 := null;

PL/SQL procedure successfully completed.

SQL>exec :b2 := null ;

PL/SQL procedure successfully completed.

SQL>
SQL>select /*+ gather_plan_statistics */ count(*)
2 from bitmap_t2
3 where hair = NVL(:b1, hair)
4 and work = NVL(:b2, work) ;

COUNT(*)
---------------
2500

1 row selected.

SQL>
SQL>@allstats_last

PLAN_TABLE_OUTPUT
------------------
SQL_ID 16jagac5xq0r4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from bitmap_t2 where hair = NVL(:b1, hair) and work
= NVL(:b2, work)

Plan hash value: 2153464131
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08 | 2504 | 342 |
| 2 | CONCATENATION | | 1 | | 2500 |00:00:00.07 | 2504 | 342 |
|* 3 | FILTER | | 1 | | 2500 |00:00:00.06 | 2504 | 342 |
|* 4 | TABLE ACCESS FULL | BITMAP_T2 | 1 | 357 | 2500 |00:00:00.03 | 2504 | 342 |
|* 5 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 |
|* 6 | TABLE ACCESS BY INDEX ROWID| BITMAP_T2 | 0 | 5 | 0 |00:00:00.01 | 0 | 0 |
|* 7 | INDEX RANGE SCAN | T2_I6 | 0 | 32 | 0 |00:00:00.01 | 0 | 0 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:B2 IS NULL)
4 - filter(("HAIR"=NVL(:B1,"HAIR") AND "WORK" IS NOT NULL))
5 - filter(:B2 IS NOT NULL)
6 - filter("HAIR"=NVL(:B1,"HAIR"))
7 - access("WORK"=:B2)

SQL>
SQL>exec :b2 := 45 ;

PL/SQL procedure successfully completed.

SQL>
SQL>select /*+ gather_plan_statistics */ count(*)
2 from bitmap_t2
3 where hair = NVL(:b1, hair)
4 and work = NVL(:b2, work) ;

COUNT(*)
---------------
32

1 row selected.

SQL>
SQL>@allstats_last

PLAN_TABLE_OUTPUT
------------------
SQL_ID 16jagac5xq0r4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from bitmap_t2 where hair = NVL(:b1, hair) and work
= NVL(:b2, work)

Plan hash value: 2153464131
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 67 | 4 |
| 2 | CONCATENATION | | 1 | | 32 |00:00:00.01 | 67 | 4 |
|* 3 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 |
|* 4 | TABLE ACCESS FULL | BITMAP_T2 | 0 | 357 | 0 |00:00:00.01 | 0 | 0 |
|* 5 | FILTER | | 1 | | 32 |00:00:00.01 | 67 | 4 |
|* 6 | TABLE ACCESS BY INDEX ROWID| BITMAP_T2 | 1 | 5 | 32 |00:00:00.01 | 67 | 4 |
|* 7 | INDEX RANGE SCAN | T2_I6 | 1 | 32 | 32 |00:00:00.01 | 35 | 0 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:B2 IS NULL)
4 - filter(("HAIR"=NVL(:B1,"HAIR") AND "WORK" IS NOT NULL))
5 - filter(:B2 IS NOT NULL)
6 - filter("HAIR"=NVL(:B1,"HAIR"))
7 - access("WORK"=:B2)

SQL>
SQL>exec :b1 := 5;

PL/SQL procedure successfully completed.

SQL>exec :b2 := null ;

PL/SQL procedure successfully completed.

SQL>
SQL>select /*+ gather_plan_statistics */ count(*)
2 from bitmap_t2
3 where hair = NVL(:b1, hair)
4 and work = NVL(:b2, work) ;

COUNT(*)
---------------
357

1 row selected.

SQL>
SQL>@allstats_last

PLAN_TABLE_OUTPUT
------------------
SQL_ID 16jagac5xq0r4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from bitmap_t2 where hair = NVL(:b1, hair) and work
= NVL(:b2, work)

Plan hash value: 2153464131
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.04 | 2504 | 342 |
| 2 | CONCATENATION | | 1 | | 357 |00:00:00.04 | 2504 | 342 |
|* 3 | FILTER | | 1 | | 357 |00:00:00.03 | 2504 | 342 |
|* 4 | TABLE ACCESS FULL | BITMAP_T2 | 1 | 357 | 357 |00:00:00.03 | 2504 | 342 |
|* 5 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 |
|* 6 | TABLE ACCESS BY INDEX ROWID| BITMAP_T2 | 0 | 5 | 0 |00:00:00.01 | 0 | 0 |
|* 7 | INDEX RANGE SCAN | T2_I6 | 0 | 32 | 0 |00:00:00.01 | 0 | 0 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:B2 IS NULL)
4 - filter(("HAIR"=NVL(:B1,"HAIR") AND "WORK" IS NOT NULL))
5 - filter(:B2 IS NOT NULL)
6 - filter("HAIR"=NVL(:B1,"HAIR"))
7 - access("WORK"=:B2)

SQL>
SQL>exec :b1 := 5;

PL/SQL procedure successfully completed.

SQL>exec :b2 := 45 ;

PL/SQL procedure successfully completed.

SQL>
SQL>select /*+ gather_plan_statistics */ count(*)
2 from bitmap_t2
3 where hair = NVL(:b1, hair)
4 and work = NVL(:b2, work) ;

COUNT(*)
---------------
5

1 row selected.

SQL>
SQL>@allstats_last

PLAN_TABLE_OUTPUT
------------------
SQL_ID 16jagac5xq0r4, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from bitmap_t2 where hair = NVL(:b1, hair) and work
= NVL(:b2, work)

Plan hash value: 2153464131
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.02 | 67 | 5 |
| 2 | CONCATENATION | | 1 | | 5 |00:00:00.02 | 67 | 5 |
|* 3 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 |
|* 4 | TABLE ACCESS FULL | BITMAP_T2 | 0 | 357 | 0 |00:00:00.01 | 0 | 0 |
|* 5 | FILTER | | 1 | | 5 |00:00:00.02 | 67 | 5 |
|* 6 | TABLE ACCESS BY INDEX ROWID| BITMAP_T2 | 1 | 5 | 5 |00:00:00.02 | 67 | 5 |
|* 7 | INDEX RANGE SCAN | T2_I6 | 1 | 32 | 32 |00:00:00.01 | 35 | 0 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:B2 IS NULL)
4 - filter(("HAIR"=NVL(:B1,"HAIR") AND "WORK" IS NOT NULL))
5 - filter(:B2 IS NOT NULL)
6 - filter("HAIR"=NVL(:B1,"HAIR"))
7 - access("WORK"=:B2)


Note how with each different set of bind variable values, the query produces a different result. But the plan remains the same (check the SQL_ID, child_number and plan hash values...they all match). You can also see how one set of the plan operations doesn't execute (note the 0's in the A-Rows, Buffers and Reads columns).

What this tells you is that the optimizer created a plan that would work dynamically. Well, sorta. Basically, the checks (step 3 and step 5) are made to see if :B2 IS NULL. If it is null, then step 4 executes. If it is not null, then steps 6-7 execute. This works fine for when you want the whole table (both bind values are null) or when you want to retrieve rows only where the work column equals some value. But, when you want to retrieve rows where you need to qualify both the work and hair columns or when you want to access rows for a particular hair value, the plan is sub-optimal.

So, OK...you're batting .500. Not bad I suppose. ;) But what if your SLA says you have to bat .900 or better? This plan just won't do. You need each different scenario to produce as optimal a plan as possible. How can you do that?

One way is to write your code to build the WHERE clause on the fly and use EXECUTE IMMEDIATE. The pseudo-code to do this might go something like this:

full_string := 'SELECT (rest of stuff).....';
if :b1 is null and :b2 is null then
where_string := null;
elsif :b1 is not null and :b2 is not null then
where_string := ' WHERE hair = :B1 AND work = :B2';
elsif :b1 is null and :b2 is not null then
where_string := ' WHERE work = :B2';
elsif :b1 is not null and :b2 is null then
where_string := ' WHERE hair = :B1';

full_string := full_string || where_string;
EXECUTE IMMEDIATE...


This way, you have 4 different statements and therefore 4 different plans. Each one (hopefully) optimal.

Another option would be to change your indexing strategy. Perhaps you could create a multi-column index using both columns. But, even with this option, you're not going to get "perfect" results. Depending on which column you decide to lead the index with, one of the index scans would likely have to do a skip scan or a full scan which wouldn't be as effective.

So, while the idea to use a single statement with conditional logic (in this case, NVL) and binds looks appealing, just remember that the plan you end up with may not provide the performance you want/need for all possible scenarios. If you use this approach, be prepared to create indexes properly so as to get the best possible coverage for all scenarios. I think the bottom line is that if you truly want dynamic SQL, then really use dynamic SQL. The attempt to simplify or to have a single statement that covers many different possible needs can cause performance to suffer for a some percentage (perhaps a significant %) of the scenarios.

11 comments:

Narendra said...

Karen,

Nice one. Jonathan Lewis has also explained this scenario in detail http://jonathanlewis.wordpress.com/2007/02/14/conditional-sql-2/

Karen said...

Good to know I"m only 2 years behind Jonathan! At least I know my thoughts/conclusions are supported by another's findings. :)

Bob said...

JL also mentions one variant:

"If you really want to stick to the single statement, then you could simply concatenate (union all)the various bits of SQL, remembering to include the relevant is null and is not null predicates to get complete non-overlapping coverage. But that’s not so easy as nulls are counter-intuitive".

Mike

Kerry Osborne said...

Hi Karen,

Good post. It did make me wonder what got you thinking about this particular issue. I use this technique all the time in little scripts for finding stuff in the data dictionary. Like I've got a script I call obj.sql that looks at dba_objects and the where clause looks something like this:

where owner like nvl('&owner',owner)
and object_name like nvl('&name',object_name)
and object_type like ('&type',object_type)

It's a handy way to write a script where part of the where clause is basically ignored if you just hit return instead of typing something, but it is definitely not the absolute best performing option (as you and Jonathan have pointed out). On the other hand, it's easy to write and easy to understand what it's doing, and for these kinds of scripts, performance is not usually much of an issue.

I have to admit though, I have never seen this technique actually being used in an application. So that's why I was wondering what got you thinking about it in the first place.

Keep up the good work.

Kerry

Karen said...

Kerry,

The reason I posted on this was that I was sent a SQL statement and its associated plan that used this syntax. I've seen it before at a couple of engagements and the queries that used this method all had performance issues. So, it struck me as an interesting thing to post about.

It is handy if you don't care about performance, like in the lookup scripts you mention (I do the same thing myself for a few scripts I use to query the data dictionary). But, in every instance where I've seen it used in an application, it has been a poor performer.

qube said...

Thanks for article, Karen. There is also nice article about bind peaking and histograms: http://www.pythian.com/blogs/867/stabilize-oracle-10gs-bind-peeking-behaviour-by-cutting-histograms

Randolf said...

Karen,

regarding your initial description:

"The intention of this WHERE clause is to allow 4 different scenarios to be covered.
1) If :B1 is null and :B2 is null, then retrieve all rows."

I think it's worth to note that above statement is not entirely true, since in this case not all rows will be retrieved, but only those where "column1 is not null and column2 is not null".

So in case you have NULLs in any the columns these rows will be filtered out.

Carefully checking the predicate information section confirms this.

It's also mentioned by Jonathan in the "Conditional SQL" blog post.

Best Regards,
Randolf Geist

Oracle related stuff blog:
http://oracle-randolf.blogspot.com/

SQLTools++ for Oracle (Open source Oracle GUI for Windows):
http://www.sqltools-plusplus.org:7676/
http://sourceforge.net/projects/sqlt-pp/

Karen said...

Randolf,

Good notice. Yes, if there were null values in either column for a row, the row wouldn't be retrieved. The real case that my example was based on had not null constraints on both involved columns, so that issue wasn't a concern. I didn't even think to mention it, so thanks for the catch!

Tony said...

This is very interesting post. I find this technique is used more often in web applications for the sake of writing one query that serves many purposes. It is a somewhat tough trade-off I think.

In my situation, I have a report that has about 15 different parameters, many of which are multi-select values. What I do now is do true dynamic SQL and build my where clause on the fly, only including those parameters with selected values. One thing that I fear with this approach is that there are some many different combinations available that my shared pool is getting trashed with new parses. Sometimes I default back to this nvl technique to reduce the number of hard parses. I feel that it is a loose loose scenario.

Radoslav Golian said...

How do you bind variables with
EXECUTE IMMEDIATE ... USING..

when you have different count of parameters?

I use application context for this purpose:

...
IF b1 IS NOT NULL THEN
dbms_session.set_context('ctx','b1', b1);
where_string := where_string || ' hair = sys_context(''ctx'',''b1'') ';

END IF;
...

Radoslav Golian said...

select replace(my_previous_post,'different','variable')
from dual;