Thursday, July 10, 2008

MIN/MAX and Index Full Scans

We got a question at work today from a blogger in regard to his blog post entitled MIN and MAX Functions in a Single Query are Disastrous. If you check out his post, you'll see the details of his tests. Basically, he was wondering why the optimizer would choose to do a full table scan instead of using an INDEX FULL SCAN (MIN/MAX) operation for a query that contained both a MIN and a MAX function call on the same indexed column. I thought my answer to his question to us, about why the optimizer makes such a "bad" decision about the execution plan, would be a good topic for my post today.

Normally, if you write a query that uses only one call, either MIN or MAX, the optimizer will use the INDEX FULL SCAN (MIN/MAX) optimization to read only through to the first leaf block (for MIN) and to the last leaf block (for MAX). This is a great optimization that can save hundreds/thousands of block accesses over either a full table scan or even an INDEX FAST FULL SCAN. But, as Momen's tests show, if you execute a query that has both MIN and MAX in it, the optimizer refuses to use this optimization although it would seem to us (the feeble human brain) that the optimizer should know that it could do the operation twice, once to get the first block and once to get the last block, right?

But, the optimizer can't do this. The INDEX FULL SCAN (MIN/MAX) operation will do either an index scan for the first leaf block or an index scan for the last leaf block. It can not do both at the same time. Think about it... The code to handle the operation would have to have some fairly specific logic in it to handle the dual calls.

The question to then ask is why wouldn't Oracle choose to add that logic? Well, since there's actually an easy way to get the result you want using the current operation's code path, I'd say Oracle didn't want to invest development time into writing specific case code when the simple branch of MIN or MAX can work properly as long as the developer knows how to write their SQL to invoke it.

Here's what has to happen.
This query:
select min(empno) min_empno, max(empno) max_empno
from emp;

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 442 (5)| 00:00:06 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX FAST FULL SCAN| PK_EMP | 917K| 4480K| 442 (5)| 00:00:06 |
---------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
196 recursive calls
0 db block gets
1959 consistent gets
1 physical reads
0 redo size
476 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed

becomes this:
select a.min_empno, b.max_empno
from
(select min(empno) min_empno from emp) a,
(select max(empno) max_empno from emp) b;

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 26 | 6 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 3 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 5 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| PK_EMP | 917K| 4480K| 3 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 13 | 3 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 5 | | |
| 7 | INDEX FULL SCAN (MIN/MAX)| PK_EMP | 917K| 4480K| 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
476 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


The same results could also be achieved by doing a UNION of the two separate statements. Either way, what is required is that the developer must understand how the optimizer utilizes the INDEX FULL SCAN (MIN/MAX) optimization and write their SQL to take full advantage of it.

Perhaps at some point (I haven't tested this in 11g yet to see if the optimizer behaves any differently) the optimizer will have code path to handle both calls simultaneously, but for now, the developer must provide the optimizer with syntax to allow the optimizer to formulate the optimal overall execution plan.

To me, this is just one more really good example why the database should not be "just a black box" to developers. Writing high performance SQL requires understanding how the Oracle optimizer works, not just an ability to put together a statement that gets the correct answer.

Thanks to Momen for his post and his question to Method R on this. It was good food for thought and nice blog fodder! :)

Addendum - July 11, 2008, 8:57am
Thanks to my colleague Ron who did the 11g test on this and he confirmed that the optimizer behavior has not changed. So, even in 11g, the SQL would need to be written to submit the MIN and MAX calls separately in order to get the optimal execution plan.

8 comments:

Mael said...

Excellent! Thanks to you. I'm dealing with this issue for weeks. I will change a bit all my queries.

Brian Tkatch said...

Adding WHERE Rownum = 1 lowers the consistent get metrics, though it still shows an FTS.

Why is that?

SQL> CREATE TABLE a(A UNIQUE, B) AS SELECT Level, RPAD('A', 1000, 'A') FROM Dual CONNECT BY Level <= 10000;

Table created.

SQL> EXEC DBMS_STATS.Gather_Table_Stats(NULL, 'A');

PL/SQL procedure successfully completed.

SQL> SELECT MIN(A), MAX(A) FROM A WHERE RowNum = 1;

MIN(A) MAX(A)
---------- ----------
1 1


Execution Plan
----------------------------------------------------------
Plan hash value: 364713085

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 400 (1)| 00:00:05 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| A | 10000 | 40000 | 400 (1)| 00:00:05 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(ROWNUM=1)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
468 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> DROP TABLE A;

Table dropped.

Karen said...

Brian,

Including ROWNUM in the WHERE clause allows the optimizer to add the COUNT STOPKEY operation to the plan. What this does is "short circuit" the full table scan so that it only does the minimum number of read calls to get the number of rows specified by the ROWNUM predicate (in your example, just 1 row).

But...did you notice your answer for the MIN and MAX? Both the MIN and the MAX give you the answer of 1. That's because you only read 1 row (due to ROWNUM) and so therefore, the MIN and MAX must be based on that 1 row. The correct answer should be MIN=1 and MAX=10000 (given the data in your test table).

So, while the addition of ROWNUM did short circuit the full table scan, it did so at the cost of giving you the wrong answer.

Laurent Schneider said...

hi karen,
I just discovered a bug with min/max:

SQL>
SQL>
SQL>
SQL>
SQL> select /*+FULL(DEPT)*/min(deptno)keep(dense_rank first order by 1)over()from dept;
MIN(DEPTNO)KEEP(DENSE_RANKFIRSTORDERBY1)OVER()
----------------------------------------------
10
10
10
10


Execution Plan
----------------------------------------------------------
Plan hash value: 2041383057

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 12 | 3 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 4 | 12 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPT | 4 | 12 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------

SQL> select min(deptno)keep(dense_rank first order by 1)over()from dept;
MIN(DEPTNO)KEEP(DENSE_RANKFIRSTORDERBY1)OVER()
----------------------------------------------
10


Execution Plan
----------------------------------------------------------
Plan hash value: 13114323

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 12 | 1 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 4 | 12 | 1 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN (MIN/MAX)| PK_DEPT | 4 | 12 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------


only one row is returned !

Karen said...

Laurent,

I tried one other test that slightly modified your SQL statement to include an additional column as follows:

SQL> select deptno, min(deptno) keep (dense_rank first order by 1) over() from dept;

DEPTNO MIN(DEPTNO)KEEP(DENSE_RANKFIRSTORDERBY1)OVER()
---------- ----------------------------------------------
10 10
20 10
30 10
40 10


Execution Plan
----------------------------------------------------------
Plan hash value: 4041760511

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 12 | 1 (0)| 00:00:01 |
| 1 | WINDOW BUFFER | | 4 | 12 | 1 (0)| 00:00:01 |
| 2 | INDEX FULL SCAN| PK_DEPT | 4 | 12 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
563 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
4 rows processed


So, it looks like the bug is that when a MIN function is used in an analytic, the optimizer is treating it as if it were written as a GROUP BY and provides just a single answer.

I suppose it could be argued that this is a "feature" and not a bug, as MIN should return the same answer for every row. Therefore, it's unnecessary to retrieve every row when you don't ask for any additional columns and the answer would be the same. So, why get lots of rows when one would suffice?

Interesting...

Laurent Schneider said...

SCOTT@lsc01> select min(deptno)keep(dense_rank first order by 1)over(),sysdate from dept;
MIN(DEPTNO)KEEP(DENSE_RANKFIRSTORDERBY1)OVER() SYSDATE
---------------------------------------------- ---------
10 11-JUL-08

SCOTT@lsc01> select min(deptno)keep(dense_rank first order by 1)over(),current_timestamp from dept;
MIN(DEPTNO)KEEP(DENSE_RANKFIRSTORDERBY1)OVER() CURRENT_TIMESTAMP
---------------------------------------------- ---------------------------------------------------------
10 11-JUL-08 06.45.29.949000 PM +02:00
10 11-JUL-08 06.45.29.949000 PM +02:00
10 11-JUL-08 06.45.29.949000 PM +02:00
10 11-JUL-08 06.45.29.949000 PM +02:00



Is it a feature?

Brian Tkatch said...

Karen,

Nope, i did not look at the answer. I was too wrapped up in playing with it and i forgot to check.

Thank you for pointing it out. I did not realize the WHERE was handled before the aggregation.

Tony said...

Looks like 11g behaves the same.

tony> select min(empno) min_empno,
max(empno) max_empno
from emp;

MIN_EMPNO MAX_EMPNO
---------- ----------
7369 7934

1 row selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 4146668512

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INDEX FULL SCAN| SYS_C0013122 | 14 | 56 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
476 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

tony> select a.min_empno,
b.max_empno
from
(select min(empno) min_empno from emp) a,
(select max(empno) max_empno from emp) b;

MIN_EMPNO MAX_EMPNO
---------- ----------
7369 7934

1 row selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3646153433

---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 26 | 2 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 4 | | |
| 4 | INDEX FULL SCAN (MIN/MAX)| SYS_C0013122 | 14 | 56 | 1 (0)| 00:00:01 |
| 5 | VIEW | | 1 | 13 | 1 (0)| 00:00:01 |
| 6 | SORT AGGREGATE | | 1 | 4 | | |
| 7 | INDEX FULL SCAN (MIN/MAX)| SYS_C0013122 | 14 | 56 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
476 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

tony>