JL Computer Consultancy
Index Efficiency (again)
Sep 2004
How much work does it take to decide if an index is a lot bigger than it ought to be? For many people the answer will be “not a lot – you just run validate index, or the equivalent analyze command and look at index_stats for the answer”.
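For example (a sketch, using the t1_i1 index that the script below creates; on a production system you would substitute your own index name):

validate index t1_i1;

select
        height, lf_rows, lf_blks, del_lf_rows, pct_used
from    index_stats
;

Remember that index_stats holds at most one row, and that row is visible only to the session that ran the validation.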
That’s good enough for some systems: systems that don’t have many indexes, or that have plenty of down-time every night or weekend and don’t mind that the validate action locks the underlying table while the walk through the index is running. On bigger, busier systems you may wish to be more selective and less aggressive.
So, if you do have a very large, busy, system with a few important indexes that you suspect might be candidates for re-organising, how do you decide whether the benefit is likely to be worth the cost – and how do you make that decision without crippling the system to find out?
Here’s one example of how you can get a rough idea of how big an index “ought” to be without doing nasty things to the system. It’s a technique that I’ve been using (on paper, at least) since the early days of Oracle Version 6 when Oracle introduced the current structure of their B-tree indexes. It’s not something that I’ve had to do very often, so I’ve never got around to writing a script to do it – especially since there are so many little variations on a theme. But since I was asked about this topic at a recent presentation, here’s an outline.
Warning – the purpose of this script is to tell you roughly how big an index would be if it were a bland, boring index that had grown over time as data has arrived, been updated, and been deleted in random fashion. There are classes of indexes – dictated by business usage – that do not follow that behaviour pattern. This script is not for the amateur “tuning-twiddler”; it is for the thoughtful professional who understands the systems they have to care for. The notes at the start of the script list (I hope comprehensively) the limitations that apply. (See also the ideas and code for another index analysis mechanism.)
Updated 16th May 2005:
Someone raised a question about the second query which reports lf_actual, and adjusted_70. This made me realize that I needed to explain why that query was there – which I have done in the comments in the script.
I also re-ran the script against 10.1.0.4 and 9.2.0.6, and found that the final query against user_indexes reported the wrong values for the number of leaf blocks. For some unknown reason, the dbms_stats call had apparently not updated the statistics in the data dictionary – although a block dump showed that the index had been rebuilt as required.
One last point – the “analyze compute statistics” command counts the number of leaf blocks in the index structure; the dbms_stats.gather() procedures count the number of index leaf blocks currently holding rows. In special cases, these numbers may be very different from each other. Some indexes may have lots of empty leaf blocks waiting on the freelist to be re-used. One mechanism counts them as leaf blocks, the other does not.
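If you want to see the difference on a test case, a quick sketch (using the t1_i1 index from the script below) is to gather the figure both ways and compare:

analyze index t1_i1 compute statistics;
select leaf_blocks from user_indexes where index_name = 'T1_I1';

execute dbms_stats.gather_index_stats(user, 't1_i1')
select leaf_blocks from user_indexes where index_name = 'T1_I1';

On a typical index the two figures will match; on an index with many empty leaf blocks sitting on the freelist the analyze figure will be noticeably larger.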
Updated 25th April 2009:
Whilst I’ve often done the type of calculation suggested in the script below, it’s only been against one or two critical indexes, and I’ve done the entire approximation in my head. Consequently I had never tested the script against a large batch of indexes – until a client recently asked me to check every single index in their system. So I took the basic script, switched it to the dba_* views, used dba_tab_cols instead of dba_tab_columns (thus addressing the “function-based indexes” issue), and wrapped the whole thing in a pl/sql block – and discovered two errors in the script that I hadn’t previously noticed.
First – the avg_col_len already allows for the number of nulls in a column, so the script doesn’t need the (num_rows - num_nulls) factor. (As a side effect, this means that the rounding of avg_col_len, when multiplied up by num_rows, can introduce a very significant error in the estimate.)
Secondly – if a column in an index is defined as descending it does take one extra byte of storage per row in the index, but the column name that appears in user_ind_columns is not the original column name; it is a name like sys_nc000013$, which appears in user_tab_cols but not in user_tab_columns, and the value for avg_col_len in user_tab_cols allows for the one extra byte. So the code, as it stands, loses descending columns, and once corrected to capture descending columns it doesn’t need the desc_ind calculation.
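To make the two corrections concrete, here is a sketch (an illustrative fragment only, not the full corrected code) of how the column-data term and the final inline view of the query below would change:

sum(tc.avg_col_len * tab.num_rows)      -- drop desc_ind and the num_nulls adjustment

and

(
select  /*+ no_merge */
        column_name,
        avg_col_len             -- already allows for nulls, and for the extra DESC byte
from    user_tab_cols           -- not user_tab_columns, so the sys_nc...$ columns are visible
where   table_name = 'T1'
)       tc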
I hope to publish a corrected and more helpful version of the code (including the full pl/sql wrapper) on my blog some time fairly soon.
Updated 28th Feb 2010:
The corrected version of the code including the pl/sql wrapper is now on my blog, along with a few supporting notes.
rem
rem
rem Script: index_estimate.sql
rem Author: Jonathan Lewis
rem Dated: Aug 2004
rem Purpose: How big should an index be.
rem
rem Notes:
rem Last tested 10.1.0.4 with 8K blocksize
rem Last tested 9.2.0.6 with 8K blocksize
rem Last tested 8.1.7.4 with 8K blocksize
rem
rem A quick and dirty script to work out roughly
rem how big a simple b-tree index should be.
rem
rem Based on a manual method I first used with v6 when
rem I found the odd index that looked suspiciously large
rem
rem The concept is simple -
rem Number of entries in index * entry overhead plus
rem data content of index entries
rem Assume 100 bytes per block overhead
rem Assume 70% packing (effectively pctfree 30)
rem Allow 1% for branch blocks
rem
rem It's not accurate, but it doesn't need to be if
rem all you want to know is whether the index is about
rem the right size or way off.
rem
rem Room for error:
rem Can't cope with rounding errors on avg_col_len
rem Can't cope with generic function-based indexes
rem Doesn't allow for compressed indexes
rem Doesn't allow for large ITLs
rem Doesn't allow for 'long' columns (>127 bytes)
rem Doesn't allow for side-effects of null columns
rem
rem Needs a recent stats collection on the table and index
rem If you use analyze, then add one to avg_col_len
rem If you use dbms_stats, don't.
rem
rem In this example, the block size is hard coded at 8K
rem
rem If you want to modify this code to produce a version that
rem handles partitioning, the rowid for a global index or
rem globally partitioned index is 10 bytes, not 6.
rem
rem There are a few other considerations for dealing with IOTs
rem and secondary indexes on IOTs.
rem
drop table t1;
create table t1
nologging       -- adjust as necessary
pctfree 10      -- adjust as necessary
as
select
        owner,
        object_type,
        object_name,
        object_id,
        last_ddl_time
from
        all_objects
where
        rownum <= 10000
;
create index t1_i1 on t1(owner, object_type, object_name);
begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                cascade          => true,
                estimate_percent => null,
                method_opt       => 'for all columns size 1'
        );
end;
/
rem
rem To start with, we want:
rem     index_entries (user_indexes.num_rows) * (
rem             rowid entry (6 or 7 depending on uniqueness)
rem             +
rem             4 (row index entry + 'row' overhead)
rem     )
rem     +
rem     sum(avg_col_len * (user_tables.num_rows - num_nulls))
rem
rem Note: for descending columns, add one per column
rem Note: if you use ANALYZE, you need avg_col_len + 1
rem Note: if applied to global (partitioned) indexes, the rowid is 10 bytes
rem
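rem Worked example (made-up numbers): a non-unique index with
rem 10,000 entries and a summed avg_col_len of 20 bytes needs
rem roughly 10,000 * (6 + 1 + 4 + 20) = 310,000 bytes of entry
rem content; divide by the (8192 - 100) usable bytes per block
rem and scale by 100/70 and 1.01 to get an estimate of about
rem 55 leaf blocks.
rem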
prompt
prompt From a purely arithmetic approach, this is
prompt an estimate of how big the index is likely
prompt to be if rebuilt with pctfree 30.
prompt
select
        round(
                100/70 *                -- assumed packing efficiency
                1.01 *                  -- allow for branch blocks
                (
                        ind.num_rows * (6 + uniq_ind + 4) +     -- fixed row costs
                        sum(
                                (avg_col_len + desc_ind) *
                                (tab.num_rows - num_nulls)
                        )                                       -- column data bytes
                ) / (8192 - 100)        -- block size - block overhead
        )       index_block_estimate_70
from
        (
        select  /*+ no_merge */
                num_rows,
                decode(uniqueness,'UNIQUE',0,1) uniq_ind
        from    user_indexes
        where   index_name = 'T1_I1'
        )       ind,
        (
        select  /*+ no_merge */
                num_rows
        from    user_tables
        where   table_name = 'T1'
        )       tab,
        (
        select  /*+ no_merge */
                column_name,
                decode(descend,'ASC',0,1)       desc_ind
        from    user_ind_columns
        where   index_name = 'T1_I1'
        )       ic,
        (
        select  /*+ no_merge */
                column_name,
                avg_col_len,
                num_nulls
        from    user_tab_columns
        where   table_name = 'T1'
        )       tc
where
        tc.column_name = ic.column_name
group by
        ind.num_rows,
        ind.uniq_ind
;
rem
rem We know that we have just built this index at
rem 90% efficiency (pctfree 10), so let's get an
rem estimate of the probable size it would be at
rem 70% efficiency (pctfree 30) using a quick and
rem dirty query. Note: on a production system the
rem adjusted_70 value from this query means nothing,
rem because the index will not have been freshly
rem rebuilt at a known 90% packing.
rem
prompt
prompt Comparison figure - known leaf_blocks
prompt at a known 90%, scaled to see what the
prompt index would look like at 70%
prompt
select
        leaf_blocks                     lf_actual,
        round(leaf_blocks * 90/70)      adjusted_70
from    user_indexes
where   index_name = 'T1_I1'
;
alter index t1_i1 rebuild pctfree 30;
begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                cascade          => true,
                estimate_percent => null,
                method_opt       => 'for all columns size 1'
        );
end;
/
prompt
prompt Leaf blocks when rebuilt at 70% packing
prompt
select
        leaf_blocks
from    user_indexes
where   index_name = 'T1_I1'
;