Nested Loops

2011-01-01

584

JL Computer Consultancy

Cost Based Oracle: Fundamentals

Chapter 11: Nested Loop Joins

Addenda

A New Sanity Check (10^th Jan 2006)

Errata

Addenda

A New Sanity Check (10^th Jan 2006)

Oracle 10.2 has introduced yet another sanity check for joins. In this case for nested loop joins specifically. The execution plan below came from running scripts view_merge_01.sql (from chapter 1) on a 10.2 system, which (although it’s not relevant to the sanity check) happened to be running with CPU costing enabled.

Note particularly lines 1, 2, and 5 – which are the lines for the nested loop, the first “table”, and the second table.

----------------------------------------------------+-----------------------------------+

----------------------------------------------------+-----------------------------------+

| 0 | SELECT STATEMENT | | | | 8 | |

| 1 | NESTED LOOPS | | 1 | 95 | 8 | 00:00:01 |

| 2 | VIEW | AVG_VAL_VIEW| 32 | 832 | 7 | 00:00:01 |

| 3 | HASH GROUP BY | | 32 | 224 | 7 | 00:00:01 |

| 4 | TABLE ACCESS FULL | T2 | 1024 | 7168 | 6 | 00:00:01 |

| 5 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 69 | 1 | 00:00:01 |

| 6 | INDEX UNIQUE SCAN | T1_PK | 1 | | 0 | |

----------------------------------------------------+-----------------------------------+

According to the standard formula for the cost of nested loop joins, the cost of this join should be 39 (and in 9i and 10.1 would be):

Cost of acquiring data from first table (7 in line two) +

Cardinality of result from first table (32 in line 2) * Cost of single visit to second table (1 in line 5)

39 = 7 + (32 * 1)

So why is Oracle 10.2 reporting a cost of 8?

Think about the total cost of acquiring data from the second table as given by the standard formula. In this case it is 32. But if you check the statistics about the table itself, its size is just one block – so the traditional costing formula, which is supposed to be accounting for physical I/O times is allowing 32 physical read requests to read a single block of data. Clearly, this is not sensible, and in 10.2 the optimizer addresses this issue by changing the formula to cater for this anomaly.

My first assumption was that the formula becomes:

Cost of acquiring data from first table +

smaller of (

cardinality of result from first table * cost of single visit to second table,

number of blocks below high water mark in second table

)

This seems reasonable, and I sent this conclusion back to Joseph Amalraj, who first alerted me to the anomaly in 10.2. He replied with a very simple question, which I can paraphrase as:

How does the optimizer figure out the number of blocks in the table ? In my test case the number of blocks actually used was just one, but the statistics (and the 10053 trace file) had blocks =5.

Good question – I’d forgotten about ASSM (automatic segment space management), and when I repeated the test case in a tablespace using ASSM with a 1MB extent size my one block of data was at block 60 in the extent; so the 10053 said “table blocks = 60” – but the arithmetic seemed to say “and I know I’ll only be visiting one block”.

After further experimentation (with my hack_stats.sql script) I decided that Oracle was probably picking up the clustering_factor for the target index as the limiting factor for the cost of the inner loop, giving the (current) version of the nested loop cost formula for 10.2 as:

Cost of acquiring data from first table +

smaller of (

cardinality of result from first table * cost of single visit to second table,

clustering_factor of index used to access the second table

)

As with so many of the changes that have appeared in newer versions of the optimizer, this is clearly a sensible strategy. Unfortunately, it means that a few people are likely to discover a few problem queries when they upgrade to 10.2. Keep an eye on any queries which join a mixture of large and small tables – it is possible that a query that is currently doing a hash join by hashing a small table (data set) and probing it with a large table (data set) will switch to a nested loop driving from the large table (data set) and using an index into the small table (data set). In theory the change in performance should not be significant if this happens – in practice there may be a massed difference in CPU, latching, and random I/O.

Thanks to Joseph Amalraj for spotting the anomaly and following up when my initial assumption showed a fatal flaw.

Back to Book Index

Top of page.