JL Computer Consultancy
Using in-line views for speed. |
March 1999 |
Since Version 7.3 (or possibly a bit earlier) Oracle enhanced its SQL parser to interpret in-line views in many of the places where previously only a table, or real view, could be used. For example we can write:-
select
list of t1 columns, list of v1 columns
from
tableA t1,
(complex select statement) v1
where
t1.colX = v1.colz
;
This has several benefits, the most obvious being that the number of static views that might be needed in an application for 'once-only' reports can be reduced significantly. A less obvious benefit is that such views can be used to make even fairly simple SQL statements run much more efficiently. In particular there are classes of query that result in large amounts of sorting which can improve dramatically when rewritten with an in-line view.
Case Study:
Consider a retail organisation which sells lots of different products from lots of different locations. Different stores stock different ranges of items, and on any one day any given store may have some products in stock that achieve no sales. A data warehouse is built which records for each store/product/date a record of total sales details for that combination. The table might look something like:
(product, store, sales_date, quantity_sold, input_cost, sales_value, discounted_value)
A sample query might be: "For a given date, for all products, report sales value, number of stores involved, and average sales per store"; this could be satisfied by a query of the form:
select
product,
count(distinct store),
sum(sales_value),
sum(sales_value)/count(distinct store)
from
sales_summary ss
where
ss.sales_date = to_date('12-Jan-1999','dd-mon-yyyy')
group by
product
;
The performance problem with this extremely simple and obvious query is the count(distinct) - even though the human eye can spot the repeated occurence of identical expressions, the parser cannot. It is easy to demonstrate (using event 10032, which will be described in a future document) that Oracle performs a sort to evaluate the group by (aggregate), but also performs a further sort for EVERY occurrence of the count(distinct).
Strategy:
The sample query can be converted to use an in-line view, and improve CPU costs quite dramatically, as follows:
select
product, store_ct,
sales_value,
sales_value/store_ct
from
(
select
product,
count(distinct store) store_ct,
sum(sales_value) sales_value
from
sales_summary ss
where
ss.sales_date = to_date('12-Jan-1999','dd-mon-yyyy')
group by
product
)
;
Results:
On a test case of the above, with a target of only 90,000 rows for the given date, the CPU cost of the query dropped from 16 CPU seconds for the standard approach to 11 CPU seconds for the in-line view approach.
Triggering the trace to dump the Sort Statistics it was easy to show that the difference in CPU costs was down to Oracle performing 3 sort runs and sorting 270,000 rows in the first case, but only performing 2 sort runs for 180,000 in the second case.
Imagine the difference if the report had covered a much larger data set, and had to produce all the different sales figures, and also had to include a decode() of the store_count to avoid any risk of 'divide by zero' errors. My record to date using this strategy cut one query from 7 minutes to 23 seconds, reducing several million sorted rows to 300,000 sorted row; unfortunately the offending code was generated in real-time by a front-end query tool and the fix could not be imposed backwards into the tool.