Wednesday, September 9, 2009

Parallel Query question

I received a question in the comment section on an older post from Arun that I wanted to address directly. Here's his question:

I have one query regarding parallel execution. I have seen one of your post related to parallel execution of queries involving unions.

My question is: If you have query something like this

select appln_jrnl_id, sum(payment_count)
from fihub.tbl_voucher_acct_line d
where appln_jrnl_id = 'AP_PAYMENT'
and fiscal_year_num = 2008
and acct_period_cd = 5
group by appln_jrnl_id
union
select 'AP_PAYMENT', sum(payment_count) from fihub.tbl_voucher_acct_line d
where appln_jrnl_id = 'AP_PAYMENT'
and fiscal_year_num = 2008
and acct_period_cd = 5;

How you will make this query to work in parallel mode? Do you prefer me to include parallel hint in that Query? If so, union queries may have different tables lists in the from clauses right?

Please throw some light on this!!

-Arun

Well, first let me point you to Doug Burns site for access to great paper he wrote on tuning parallel execution. There's so much excellent info in that paper that I'll defer the in-depth coverage of the topic to him rather than try to review it here.

However, I'll add a couple of things for you to think about.

First, why do you want to force the query to execute in parallel? If the optimizer isn't already choosing parallel operations for the execution plan, it must think it is unnecessary. Have you executed the query in both a serial and parallel manner and compared the results to prove which is better? From your question about using a hint to get parallel behavior, I'm thinking that you likely have not been able to get the query to run in parallel yet. So, if the optimizer doesn't choose to use parallel execution on it's own and you want to test the parallel version, the answer is yes, you can use a hint to attempt to force the optimizer towards parallel execution. At least as of v9.2 (and perhaps prior, but I'm not sure), Oracle supports parallel execution of UNION queries (and bunches of other stuff). So, as long as the proper instance parameters are set to allow parallel execution to occur, the optimizer should either choose a parallel plan (if it's "the best") or you can use the PARALLEL hint to force it.

Second, I'm having a hard time understanding your query. The way it reads to me is that both sides of the UNION are giving you the same answer. Therefore, the final answer to the query would be 1 row like: AP_PAYMENT 12345. The answer to each query will be the same. Since the query is using UNION, the duplicate will be removed and you'll end up with just the one row. So, you could remove the UNION and just execute

select 'AP_PAYMENT', sum(payment_count)
from fihub.tbl_voucher_acct_line d
where appln_jrnl_id = 'AP_PAYMENT'
and fiscal_year_num = 2008
and acct_period_cd = 5;

Unless you've got a boatload of data that matches your conditions and due to that the query runs for several seconds or minutes or more, it looks to me like the query should not need parallel execution. If you have a decent index in place, say on appln_jrnl_id, fiscal_year_num, acct_period_cd (and you could even throw in payment_count to keep you from having to do any table accesses at all), your query should execute quite well.

But, as I see it, if you want to test this particular query for how it would perform using parallel execution, rewrite it as I discussed above and then, if needed, add a PARALLEL hint. And do make sure to read through Doug's paper and also check out the links he provides at the end to other great resources on the topic.

Hope that helps!

2 comments:

Nigel said...

Karen

Looks like Arun's SQL may have intentionally included both summary and detail (one query block has GROUP BY, one doesn't). Second block has SUM(payment_count) but no GROUP BY - is that a transcription error?

Regards Nigel

Karen said...

That's what I thought at first, but since the query is asking for only WHERE appln_jrnl_id = 'AP_PAYMENT', the answer for both SELECT statements is the same and the GROUP BY really isn't necessary at all if you hard code the AP_PAYMENT value.

If it was meant to show detail and totals, the query was formulated incorrectly. In that case, it could be written to use GROUP BY ROLLUP or another analytic syntax that could provide the data that way.

Maybe Arun will see this and chime in and clear it up!