Predicting Performance Impact after adding a New Column to an existing Index

This blog featured on my previous blog URL (still unaccessible) and was published on 7th September 2008. Thought of reposting this, as a similar performance issue was recently reported by one of my customer, where a query had to be fixed by way of an Index Hint. This blog would also help you understanding the way optimizer_index_cost_adj is to be calculated for optimizing a query performance. This calculation was requested by one of a regular reader of my blogs.

Modifying an existing index to optimize multiple queries is a common practice, as this eliminates the overheads of maintaning muliple indexes, by way of INSERTS / UPDATES / DELETES. This modification of an existing index includes, re-arranging column ordering, adding a new column etc. This blog focuses on an issue that may trigger and impact the performance of your production system due to addition of a new column in an existing index.

The scenario that I am going to discuss in this blog is, “Assuming one day you are at a customer site investigating a performance issue or doing a proactive health check of a production system. During investigation, you come across two queries that are almost similar and use same Index. The only difference between the two queries is an additional Filter Predicate.” The two queries may look like :

select * from emp where sal=:b2;
select * from emp where sal=:b2 and deptno=:b1;

Both the queries use an Index on SAL column, which is a Single Column Index. We all know that the logical I/O’s of a query can be reduced if a query can make best use of an Index and scans the table, only when required. Which means, a composite index on (SAL, DEPTNO) would be efficient and should reduce the Logical I/O’s for the second query, without impacting the first query. Therefore, instead of creating a new composite index, the suggestion is put forward to modify an existing index on SAL and add DEPTNO as a second column. This will ensure that both the queries use the same index. After the alteration of this Index on production, won’t you be stunned if the customer comes back and says that the performance is still not OK, and infact, is worst than it was a day before ? This is because, while the alteration improved the performance of the second query, it impacted the performance of the first with the change in optimizer plan as it calculates the cost of an Altered Index to be costlier than Full Table Scan. Let us walk through a test case to understand the issue and the rationale behind optimizer estimation.

TEST CASE

drop table vivek_test;

exec dbms_random.seed(0);

create table vivek_test as
with test as
(select * from all_objects where rownum between 1 and 10000)
select	trunc((rownum-1)/1000) object_id,
	round(dbms_random.value(1,100),0) bad_clustered,
	a.object_name,
	a.temporary,
	a.created
from	test a,
	test b
where  rownum between 1 and 100000; 

create index vivek_test_idx on vivek_test(object_id);

exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true);

Below are the statistics for the table and its index (in my test database) :

select owner, table_name, num_rows, blocks from dba_tables where table_name='VIVEK_TEST';

OWNER                          TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ------------------------------ ---------- ----------
VIVEK                          VIVEK_TEST                          99889        420

select index_name, num_rows, blevel, leaf_blocks, distinct_keys, clustering_factor
from	dba_indexes
where	table_name='VIVEK_TEST';

INDEX_NAME                       NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- ------------- -----------------
VIVEK_TEST_IDX                     100000          1         195           100               420

Now, I will run two queries (assume these to be application queries) against the table and both of these queries use the Index created on Object_ID. The queries and their Runtime Execution Plans are pasted below :

variable b1 number;
variable b2 number;
exec :b1:=0;
exec :b2:=57;
select /*+ QUERY1 */ * from vivek_test where object_id=:b1;

SQL_ID  99ss8hxuax3q2, child number 0
-------------------------------------
select /*+ QUERY1 */ * from vivek_test where object_id=:b1

Plan hash value: 3696163677

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |   981 | 24525 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | VIVEK_TEST_IDX |   990 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:B1)

select /*+ QUERY2 */ * from vivek_test where object_id=:b1 and bad_clustered=:b2;

Plan hash value: 3696163677

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |     7 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |    10 |   250 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | VIVEK_TEST_IDX |   990 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("BAD_CLUSTERED"=:B2)
   2 - access("OBJECT_ID"=:B1)

Based on the queries executed against this table, it is decided to drop the exisiting index on OBJECT_ID and recreate it on OBJECT_ID, BAD_CLUSTERED. This is to optimize the second query (eliminate unwanted logical I/O’s) and with an assumption that this change will not impact the first query, with QUERY1 hint.

drop index vivek_test_idx;
create index vivek_test_idx on vivek_test(object_id, bad_clustered);
exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true);
SQL> select owner, table_name, num_rows, blocks from dba_tables where table_name='VIVEK_TEST';

OWNER                          TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ------------------------------ ---------- ----------
VIVEK                          VIVEK_TEST                         102477        420

Elapsed: 00:00:00.11
SQL> select index_name, num_rows, blevel, leaf_blocks, distinct_keys, clustering_factor
  2  from dba_indexes 
  3  where table_name='VIVEK_TEST';

INDEX_NAME                       NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- ------------- -----------------
VIVEK_TEST_IDX                     100000          1         237          9998             41509

On a production system, these changes are normally done during the downtime and the system is released for the users. The impact is usually visible when the users start executing these queries concurrently. Let us re-execute the queries with the change in the index implemented. We will execute the second query first (with QUERY2 hint) and then the first query and compare the runtime execution plan.

variable b1 number;
variable b2 number;
exec :b1:=0;
exec :b2:=57;

select /*+ QUERY2 */ * from vivek_test where object_id=:b1 and bad_clustered=:b2;

Plan hash value: 3696163677

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |     6 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |    10 |   250 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | VIVEK_TEST_IDX |    10 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:B1 AND "BAD_CLUSTERED"=:B2)

select /*+ QUERY1 */ * from vivek_test where object_id=:b1;

Plan hash value: 3275957777

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |    97 (100)|          |
|*  1 |  TABLE ACCESS FULL| VIVEK_TEST |  1015 | 25375 |    97   (4)| 00:00:02 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=:B1)

While the cost of the second query, and the logical I/O’s as well, has gone down, the change has impacted the first query, that was working absolutely fine before the change. This can lead to a greater impact when these are run concurrently. To check for the impact, I simulated concurrency by way of spawning 20 sessions, each running the impacted query. With these 20 concurrent sessions, my 2 CPU laptop choked up with 100% utilization and 95% waits on CBC Latches. The code to simulate the concurrency is pasted below with an explanation :

  1. Create a Staging table and populate the object id’s in random order to be queried.
  2. The staging table holds 20000 object id’s with 10 parallel streams (PSTREAMS).
  3. The procedure will be scheduled from dbms_job, with pstreams as an Input parameter (10 Streams).
drop table staging;

create table staging as
select object_id, mod(rownum,10) pstreams  
from vivek_test where rownum between 1 and 20000
order by dbms_random.random;

create or replace procedure process_job(ps in number) as
l_cnt	number;
begin
for i in (select object_id from staging where pstreams=ps)
loop
select count(distinct temporary) into l_cnt from vivek_test
where	object_id=i.object_id;
end loop;
end;
/


variable jobno number;
variable hrs number;
variable mns number;
exec :hrs:=14;
exec :mns:=35;

begin
for i in 1..10
loop
BEGIN
     DBMS_JOB.SUBMIT(:jobno,'process_job('||i||');', 
         trunc(sysdate)+:hrs/24+:mns/(24*60));
     COMMIT;
END;
end loop;
end;
/

What could have triggered the change in plan ?

Before getting into the cause of this change, let us execute the query with an Index Hint. This will help us investigate the issue.

select /*+ INDEX(VIVEK_TEST,VIVEK_TEST_IDX) */ * from vivek_test where object_id=:b1;

Plan hash value: 3696163677

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |   420 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |  1015 | 25375 |   420   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | VIVEK_TEST_IDX |  1015 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:B1)

With an Index Hint, the cost of step TABLE ACCESS BY INDEX ROWID step is 420 as against 97 for the unhinted query (TABLE SCAN FULL). Optimizer, while generating an optimal execution plan, compares cost of each available access path and selects the one with least cost. In this case, the cost of a TABLE SCAN FULL was assumed to be cheaper than the cost of an INDEX SCAN. The Index can be considered costlier due to two reasons : by way of Cost and / or, due to huge number of Logical I/O’s. TABLE ACCESS BY INDEX ROWID is always followed by INDEX RANGE SCAN, and as can be seen from the plan, the cost of an INDEX SCAN is mere 4, while the cost of TABLE ACCESS step is 420. While the former is calculated from BLEVEL and LEAF_BLOCKS, the later is calculated based on CLUSTERING_FACTOR (CF).

Compare the CF of the Index, before and after change. Before the change, CF was 420 (near to the num_blocks of the table), after the change it changed drastically to 41509. This was due to the bad clustering of the additional column bad_clustered and this change was enough to blow up the cost. The cost calculation before and after the change is as under :

BEFORE THE CHANGE
Index Cost = BLEVEL + CEIL(LEAF_BLOCKS * SELECTIVITY)
Table Scan = CEIL(CF * SELECTIVITY)
Total Cost of TABLE ACCESS BY INDEX ROWID = Index Cost + Table Scan

While BLEVEL is considered for Index Cost Calculation, optimizer ignores this if the BLEVEL of an Index is 1 and all the columns of an Index are used as an EQUALITY Predicate in the query. Therefore, in our case too, since the columns are used as an Equality predicate and the blevel is 1, this value will be ignored.

SQL> select ceil(195*1/100), ceil(420*1/100) from dual;

CEIL(195*1/100) CEIL(420*1/100)
--------------- ---------------
              2               5

The cost of the first query, before the change is 7. Now, after the change :

SQL> select 1+ceil(237*1/100), ceil(41509*1/100) from dual;

1+CEIL(237*1/100) CEIL(41509*1/100)
----------------- -----------------
                4               416

With Index Hint, the cost is 420 and this is costlier than 97. There are two solutions for this issue. These are :

  1. Chnage the value for OPTIMIZER_INDEX_COST_ADJ (OICA) and set this globally.
  2. Fix the Query with an Index Hint.

The first change (i.e.OICA), would require application testing as setting this globally can impact other queries. The second change was a feasible solution and could be applied without impacting other queries. Let us visit the formula to calculate the value of OICA.

OPTIMIZER_INDEX_COST_ADJ

This is calculated as COST OF A FTS divided by TABLE ACCESS BY INDEX ROWID. In our case, the cost of FTS is 97 and Table Access by Index Rowid is 420. Therefore 97/420*100 is 23.09. This means, if the value of OICA is set to less than 23, the query should use an Index, without modification.

SQL> alter session set optimizer_index_cost_adj=22;

Session altered.
select /*+ OICA_22 */ * from vivek_test where object_id=:b1

Plan hash value: 3696163677

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |       |       |    92 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |  1015 | 25375 |    92   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | VIVEK_TEST_IDX |  1015 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:B1)

As mentioned earlier, a better option would be to hint the query. Once the query is modified with an Index Hint, I spawned the Concurrency jobs and could see that gain, with less than 5% waits on CBC latches, as against, more than 95% and 100% CPU Utilization.

How do we predict the impact of this change before implementing it ?

Before making these changes, the impact can be predicated. This prediction should help estimate the cost change, before and after. Following query can be modified and run (for a bigger table, should be run in parallel during less load period). The only change in the query will be the ordering of the columns in an Index. For this, I will recreate the Index on a Single Column and then execute the query to check for the Change in the CF after adding another column.

SQL> drop index vivek_test_idx;

Index dropped.

Elapsed: 00:00:00.07
SQL> create index vivek_test_idx on vivek_test(object_id);

SQL> select  sum(block_change) from (
  2  select  block_fno, object_id, bad_clustered, prev_bfno,        
  3   (case when nvl(prev_bfno,0)!=block_fno then 1 else 0 end) block_change from (
  4  select  block_fno, object_id, bad_clustered, lag(block_fno) over (order by object_id) prev_bfno from (
  5  select    dbms_rowid.rowid_block_number(rowid)||'.'||
  6    dbms_rowid.ROWID_TO_ABSOLUTE_FNO(rowid,'VIVEK','VIVEK_TEST') block_fno, 
  7    object_id, bad_clustered
  8  from  VIVEK_TEST
  9  order by object_id, bad_clustered, block_fno)));

SUM(BLOCK_CHANGE)
-----------------
            41509

The output of this query should approximately match the CF calculated by the Optimizer and should be enough to proactively arrive at the impact due to the change before making the change.

ppi.sql

Advertisements
%d bloggers like this: