Index – Ordering of Columns

Post my previous Blog on Consistent Gets for an Index Scan, there were few other queries that I received over the chat. The queries were related to Clustering Factor of an Index and therefore, the recommendation or comment was that the Columns should be ordered to keep the value of a Clustering Factor as low as possible. The other comment was that High Cardinality or Low Cardinality plays a critical role with an exception for Unique or Primary Key Indexes. Again, Clustering Factor was cited as the reason behind this comment. Thought of clarifying this as well. In my opinion, this is again a MYTH.

The myth that I am going to address here is – Indexes and Column Ordering should be designed such that the Clustering Factor is as low as possible. This means, an Index on (B,A) should be preferred than (A,B) if the clustering factor of (A,B) is higher than (B,A).

Assuming, I have application queries, like :

select C, D from TABLE_A where A=:b1;

select C, D from TABLE_A where B=:b1 and A=:b2;

If I go by the above myth, shall I then create an Index on (B,A) ? Will this index be used for Query#1 ? Remember, Index Skip Scan is an Optimizer decision and is chosen if the leading column is a low cardinality column. The definition of low cardinality would be an Optimizer decision based on the costing. If an index on (B,A) is not used by Query#1, then shall I go ahead and create another Index on A ? Indexes improve query performance but are overhead for DML’s and therefore, less the number of Indexes, better the DML performance would be. Further, if a column is badly clustered, whether it is a leading column or a trailing column, it doesn’t matter and we will see this with an example. Therefore, as I mentioned in earlier and in my previous blog, Column Ordering of an Index should be basis your application queries and should not be designed basis any other table or column level statistics.

In this case, I am creating a table VIVEK_TEST with 100K rows. Column BC stands for BAD_CLUSTERED and will dominate the clustering factor. There are other 2 columns of interest – Object_ID which is a non unique column and RNUM_UNQ which is a Unique Column.


exec dbms_random.seed(0);

create table vivek_test as
with test as
(select * from all_objects where rownum between 1 and 10000)
select  trunc((rownum-1)/1000) object_id,
	rownum	rnum_unq,
        round(dbms_random.value(1,100),0) bc,
        a.object_name,
        a.temporary,
        a.created
from    test a,
        test b
where  rownum between 1 and 10000;

exec dbms_stats.gather_table_stats(user,'VIVEK_TEST');

column owner for a30
select owner, num_rows, blocks from dba_tables where table_name='VIVEK_TEST';

OWNER                  NUM_ROWS     BLOCKS
-------------------- ---------- ----------
SCOTT                     10000         62

First, I will create 4 Indexes – 2 of these are Single Column Indexes and 2 are Multi-Column.

create index vivek_test_oid on vivek_test(object_id);
create index vivek_test_oid_bc on vivek_test(object_id, bc);
create index vivek_test_bc on vivek_test(bc);
create index vivek_test_bc_oid on vivek_test(bc,object_id);

column index_name for a30
select index_name, num_rows, blevel, leaf_blocks, distinct_keys, clustering_factor
from    dba_indexes
where   table_name='VIVEK_TEST'
order by 1;

INDEX_NAME                       NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- ------------- -----------------
VIVEK_TEST_BC                       10000          1          20           100              4642 <-- Single Column on BC (Bad_Clustered)
VIVEK_TEST_BC_OID                   10000          1          24          1000              4642 <-- BC is the Leading Column
VIVEK_TEST_OID                      10000          1          20            10                56
VIVEK_TEST_OID_BC                   10000          1          24          1000              4856 <-- BC is the Trailing Column

As seen in the example above, column ordering doesn’t matter when it comes to the Clustering Factor. My choice of Indexing would be :

VIVEK_TEST_BC_OID if most of my queries are on 
	BC=:b1;
	BC =:b1 and object_id=:b2;

VIVEK_TEST_OID_BC if most of my queries are on 
	object_id=:b1;
	BC=:b1 and object_id=:b2;

Now, I will drop these indexes and create new indexes on the combination of BC and RNUM_UNQ. RNUM_UNQ is 100% Unique Column.

create index vivek_test_rnum on vivek_test(rnum_unq);
create index vivek_test_rnum_bc on vivek_test(rnum_unq, bc);
create index vivek_test_bc on vivek_test(bc);
create index vivek_test_bc_rnum on vivek_test(bc, rnum_unq);


column index_name for a30
select index_name, num_rows, blevel, leaf_blocks, distinct_keys, clustering_factor
from    dba_indexes
where   table_name='VIVEK_TEST'
order by 1;

INDEX_NAME                       NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- ------------- -----------------
VIVEK_TEST_BC                       10000          1          20           100              4642 <-- Single Column on BC (Bad_Clustered)
VIVEK_TEST_BC_RNUM                  10000          1          26         10000              4642 <-- Leading Column dominates CF
VIVEK_TEST_RNUM                     10000          1          21         10000                56
VIVEK_TEST_RNUM_BC                  10000          1          26         10000                56


For an Index with a Column, either leading or trailing, with 100% Unique values, the leading column would dictate the calculation of Clustering Factor. This is obvious as for the Clustering factor calculation, the values are sorted on the first_column then on Second_Column. For a better idea of how is it calculated, you may run the following queries :

## CF Calculation for BC, RNUM_UNQ (it matches the value as per DBA_INDEXES)

select  sum(block_change) from (
select  block_fno, bc, RNUM_UNQ, prev_bfno,
        (case when nvl(prev_bfno,0)!=block_fno then 1 else 0 end) block_change from (
        select  block_fno, bc, RNUM_UNQ, lag(block_fno) over (order by bc, rnum_unq) prev_bfno from (
                select  dbms_rowid.rowid_block_number(rowid)||'.'||
                        dbms_rowid.ROWID_TO_ABSOLUTE_FNO(rowid,'&Schema','&table_name') block_fno,
                        bc, RNUM_UNQ
                from  VIVEK_TEST
                order by bc, RNUM_UNQ, block_fno)
        )
);

SUM(BLOCK_CHANGE)
-----------------
             4642


## CF Calculation for RNUM_UNQ, BC (it matches the value as per DBA_INDEXES)

select  sum(block_change) from (
select  block_fno, RNUM_UNQ, bc, prev_bfno,
        (case when nvl(prev_bfno,0)!=block_fno then 1 else 0 end) block_change from (
        select  block_fno, RNUM_UNQ, bc, lag(block_fno) over (order by rnum_unq, bc) prev_bfno from (
                select  dbms_rowid.rowid_block_number(rowid)||'.'||
                        dbms_rowid.ROWID_TO_ABSOLUTE_FNO(rowid,'&Schema','&table_name') block_fno,
                        bc, RNUM_UNQ
                from  VIVEK_TEST
                order by RNUM_UNQ, bc, block_fno)
        )
);

SUM(BLOCK_CHANGE)
-----------------
               56

This query can be further modified to get the calculation. Remove the SUM(BLOCK_CHANGE) SELECT from the query and you could generate the output with the calculation. For every Index Entry, if the table block is changed, the BLOCK_CHANGE value is set to 1. This query will help you understand the calculation. This holds true for a combination of Non-Unique columns as well. It is not that for column involving Unique Column, the calculation is different and this will be clear if you deep dive into the queries pasted above.

During my session, I also mentioned a FLAW in Clustering Factor. It is that it doesn’t take into consideration the caching effect and to address this, Oracle introduced TABLE_CACHED_BLOCKS preference. I am against setting manually setting clustering_factor to a lower value as it will be overwritten by the next statistics gathering process. Also, I am STRONGLY against CTAS using ORDER BY to improve the clustering_factor. By using CTAS, you can improve the clustering_factor of one index, but it will impact this for other indexes. The best way is to use the TABLE_CACHED_BLOCKS preference as depicted below.

I will use the queries pasted above to come out with the actual clustering_factor and will then set the TABLE_CACHED_BLOCKS preference. For demonstration, I will work on an Index on BC, RNUM_UNQ. This process can be used for any of the indexes.

## I will create a temporary table to hold the clustering_factor calculation


create table cluf_factor as
select bc, blkno,
        lag(blkno,1,blkno) over(order by bc) prev_blkno,
        case when blkno!=lag(blkno,1,blkno) over(order by bc) or rownum=1
           then 1 else null end cluf_ft from
(select bc, rnum_unq, dbms_rowid.rowid_block_number(rowid) blkno
from    VIVEK_TEST
where   BC is not null
order by bc);

compute sum of cnt on report
break on report
select blkno, count(*) cnt from cluf_factor group by blkno order by 1;

     BLKNO        CNT
---------- ----------
    255883        188
    255884        185
    255885        185
    255886        185
    255887        185
    255936        182
    255937        180
    255938        180
    255939        180
    255940        180
    255941        180
    255942        180
    255943        180
    255945        180
    255946        180
    255947        180
    255948        180
    255949        180
    255950        180
    255951        180
    258896        180
    258897        180
    258898        180
    258899        180
    258900        180
    258901        180
    258902        180
    258903        180
    258905        180
    258906        180
    258907        180
    258908        180
    258909        180
    258910        180
    258911        180
    260480        180
    260481        180
    260482        180
    260483        180
    260484        180
    260485        180
    260486        180
    260487        180
    260489        180
    260490        180
    260491        180
    260492        180
    260493        180
    260494        180
    260495        180
    260496        180
    260497        180
    260498        180
    260499        180
    260500        180
    260501         70
           ----------
sum             10000

56 rows selected.

As per the output, each of the block was touched around 180 times. Each of these touch were considered as it these were read from the Disk. However, Oracle purely works in Memory (Buffer_Cache, Shared_Pool etc) and once the block is read from disk, it is cached in the Buffer_cache thus avoiding the disk i/o. This flaw causes unwanted high clustering_factor and therefore, is addressed by way of TABLE_CACHED_BLOCKS.

TABLE_CACHED_BLOCKS value defaults to 1. Increasing this value (to, say 250) means, clustering_factor for an Index will not be incremented, if the table block being referenced by the current Index entry has been referenced by any of the previous 250 Index Entries. In our case, each of the table blocks were referenced by previous 180 index entries. Therefore, I will change this value to 180.

select index_name, leaf_blocks, clustering_factor from dba_indexes
where table_name = 'VIVEK_TEST';

INDEX_NAME                     LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ----------- -----------------
VIVEK_TEST_RNUM                         21                56
VIVEK_TEST_RNUM_BC                      26                56
VIVEK_TEST_BC                           20              4642
VIVEK_TEST_BC_RNUM                      26              4642

select dbms_stats.get_prefs(pname=>'TABLE_CACHED_BLOCKS',ownname=>'SCOTT',tabname=>'VIVEK_TEST') preference from dual;

PREFERENCE
-----------------------------------------------------------
1

## Setting TABLE_CACHED_BLOCKS to 180
exec dbms_stats.set_table_prefs(ownname=>'SCOTT',tabname=>'VIVEK_TEST',pname=>'TABLE_CACHED_BLOCKS',PVALUE=>180);

select dbms_stats.get_prefs(pname=>'TABLE_CACHED_BLOCKS',ownname=>'SCOTT',tabname=>'VIVEK_TEST') preference from dual;

PREFERENCE
------------------------------------------------------------
180

exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',method_opt=>'for all columns size 1', no_invalidate=>false);

select index_name, leaf_blocks, clustering_factor from dba_indexes
where table_name = 'VIVEK_TEST';

INDEX_NAME                     LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ----------- -----------------
VIVEK_TEST_RNUM                         21                56
VIVEK_TEST_RNUM_BC                      26                56
VIVEK_TEST_BC                           20                56
VIVEK_TEST_BC_RNUM                      26                56

Clustering Factor is calculated for Unique Indexes as well and is critical for costing. The only time when clustering_factos is not considered or not relevant for Unique or Primary Keys is when querying for all the columns part of the Unique or Primary Key i.e. INDEX UNIQUE SCAN. However, clustering_factor will play a critical role when accessing the Unique or Primary Key using INDEX RANGE SCAN.

With this blog, I tried addressing following Myth’s –

- 	Indexes and Column Ordering should be designed such that the Clustering Factor is as low as possible. 
- 	Clustering_Factor is irrelevant for Unique and Primary Key Indexes.
-	High Cardinality or Low Cardinality play important roles with an exception for Unique & Primary Key.

Be it a Unique Index / Primary Key Index or any other Index, column ordering has to be basis your application queries.

One interesting fact on TABLE_CACHED_BLOCKS : While testing this, I actually created a table with 100k rows. The number of blocks in the table were around 469 and the table block were referenced by around 230 previous index entries. Therefore, I modified this preference to 230 and regathered the statistics. It didn’t work. The same table creation script with 10k rows, everything worked as expected. I am investigating this and will update the blog, once I have something to share.

Advertisements

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on the issues that I feel is Interesting for my readers. Some of these are my real life examples, which I hope, you would find interesting. Comments are always a welcome. The Technical Observations & Views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understandings, learnings and resolutions of various customer issues.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s