First K Rows Optimization : Query Performance issue with ROWNUM predicate

Post 10g Upgrade, have you ever come across of a Performance Issue of a Query with ROWNUM predicate ? Last few weeks, I was busy with two back to back issues, reported by two different customers, of a Query Performance Issue due to ROWNUM predicates. Since this was reported by two different customers, the problem reported were different, but ended up as what seemed to be a First K Rows Optimization Issue. The issue reported were :

  1. Performance of a Query, on UAT, degraded after creating a new Index. This new Index is created as a part of new deployment and is impacting a query already deployed sucessfully on Production.
  2. Performance of a Query, on production, degraded after upgrading to 10.2.0.3 from 9.2.0.8. Further, upgrading to 10.2.0.4 has also not helped as yet and it continues to consume most of the resources.

Prior to Oracle 9i, the only two cost-based optimizer modes available were either ALL_ROWS and FIRST_ROWS. The former was either explicitly specified, by way of initialization parameter or hint, or was triggered when the following two conditions were met: optimizer_mode=CHOOSE and statistics on any of the tables in the query were gathered. FIRST_ROWS optimization was triggered, either by way of parameter setting or Hint.

The Optimization Rules and Calculations of “First K Rows Optimization” is analogous to First_Rows_n, but does not require any specific parameter settings or Hints and is automatically invoked when some of the conditions are met. One such condition is a ROWNUM predicate. If a query block contains a ROWNUM predicate, then First K Rows optimization will be automatically invoked, where the value of K is set to the value specified to a ROWNUM predicate. For example, if a query block contains a predicate ROWNUM=1 or ROWNUM=10, then the value of K, for First K Rows Optimization, will be evaluated to 1 or 10, as a result, First_rows_n optimization will be triggered.

First K Rows Optimization working is similar to First_Rows_n. With this, the optimizer first estimates the number of rows that will be returned by completely analyzing the first Join Order. Once this is done, and the estimated rows to be processed is computed, the entire optimization is redone to find the execution plan that minimizes the resource consumption to process the query. It is the first step of calculatng estimated number of rows that drives whether the final execution plan generated will be optimal or sub-optimal.

ISSUE 1

In this case, the query has been deployed sucessfully on production and is running perfectly fine. On the UAT, a New Index was created for deployment of new module. While the creation of this new index optimized the to-be-deployed module, it impacted another module and this concern was raised by the customer. In this case, the better option was to drop the new index and re-optimize the to-be-deployed module, in such a way, that this new Index is not required.

Query and Analysis

SELECT 'X'   FROM NBFC_PMNT_DTL
WHERE	TXNADVICEID IN
	(SELECT (TXNADVICEID + 0 ) FROM NBFC_TXN_ADVICE_DTL
	WHERE	CASEID = :b1
	AND	STATUS = 'A'
	AND	TXNTYPE IN ( 'A','P'  )
	AND	CHARGEID IN
		(SELECT CHARGEID   FROM NBFC_CHARGES_M  WHERE CHARGECODEID = 2 ))
AND	STATUS IS NULL
AND	ROWNUM = 1

FROM UAT
========
-------------------------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                              | Name                    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
-------------------------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                       |                         |       |       |     4 |           |       |       |
| 1   |  COUNT STOPKEY                         |                         |       |       |       |           |       |       |
| 2   |   FILTER                               |                         |       |       |       |           |       |       |
| 3   |    PARTITION RANGE ALL                 |                         |     1 |     9 |     2 |  00:00:01 | 1     | 21    |
| 4   |     TABLE ACCESS FULL                  | NBFC_PMNT_DTL           |     1 |     9 |     2 |  00:00:01 | 1     | 21    |
| 5   |    NESTED LOOPS                        |                         |     1 |    32 |     2 |  00:00:01 |       |       |
| 6   |     PARTITION RANGE SINGLE             |                         |     1 |    21 |     1 |  00:00:01 | KEY   | KEY   |
| 7   |      TABLE ACCESS BY LOCAL INDEX ROWID | NBFC_TXN_ADVICE_DTL     |     1 |    21 |     1 |  00:00:01 | KEY   | KEY   |
| 8   |       INDEX RANGE SCAN                 | NBFC_TXN_ADVICE_DTL_IDX5|     1 |       |     1 |  00:00:01 | KEY   | KEY   |
| 9   |     TABLE ACCESS BY INDEX ROWID        | NBFC_CHARGES_M          |     1 |    11 |     1 |  00:00:01 |       |       |
| 10  |      INDEX UNIQUE SCAN                 | NBFC_CHARGES_PK         |     1 |       |     1 |  00:00:01 |       |       |
-------------------------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
1 - filter(ROWNUM=1)
2 - filter( IS NOT NULL)
4 - filter("STATUS" IS NULL)
7 - filter(("TXNTYPE"='A' OR "TXNTYPE"='P'))
8 - access("STATUS"='A' AND "CASEID"=:B1)
8 - filter("TXNADVICEID"+0=:B1)
9 - filter("CHARGECODEID"=2)
10 - access("CHARGEID"="CHARGEID")

FROM PRODUCTION
===============
--------------------------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                               | Name                    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
--------------------------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                        |                         |       |       |     7 |           |       |       |
| 1   |  COUNT STOPKEY                          |                         |       |       |       |           |       |       |
| 2   |   TABLE ACCESS BY LOCAL INDEX ROWID     | NBFC_PMNT_DTL           |     1 |     9 |     1 |  00:00:01 |       |       |
| 3   |    NESTED LOOPS                         |                         |     1 |    22 |     7 |  00:00:01 |       |       |
| 4   |     VIEW                                | VW_NSO_1                |     1 |    13 |     4 |  00:00:01 |       |       |
| 5   |      HASH UNIQUE                        |                         |     1 |    31 |       |           |       |       |
| 6   |       TABLE ACCESS BY LOCAL INDEX ROWID | NBFC_TXN_ADVICE_DTL     |     1 |    21 |     3 |  00:00:01 |       |       |
| 7   |        NESTED LOOPS                     |                         |     1 |    31 |     4 |  00:00:01 |       |       |
| 8   |         TABLE ACCESS BY INDEX ROWID     | NBFC_CHARGES_M          |     1 |    10 |     1 |  00:00:01 |       |       |
| 9   |          INDEX RANGE SCAN               | NBFC_CHARGES_M_IDX1     |     1 |       |     1 |  00:00:01 |       |       |
| 10  |         PARTITION RANGE SINGLE          |                         |    39 |       |     1 |  00:00:01 | KEY   | KEY   |
| 11  |          INDEX RANGE SCAN               | NBFC_TXN_ADVICE_DTL_IDX3|    39 |       |     1 |  00:00:01 | KEY   | KEY   |
| 12  |     PARTITION RANGE ITERATOR            |                         |     1 |       |     1 |  00:00:01 | KEY   | KEY   |
| 13  |      INDEX RANGE SCAN                   | NBFC_PMNT_DTL_IDX1      |     1 |       |     1 |  00:00:01 | KEY   | KEY   |
--------------------------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
1 - filter(ROWNUM=1)
2 - filter("STATUS" IS NULL)
6 - filter(("STATUS"='A' AND INTERNAL_FUNCTION("TXNTYPE")))
9 - access("CHARGECODEID"=2)
11 - access("CASEID"=:B1 AND "CHARGEID"="CHARGEID")
13 - access("TXNADVICEID"="$nso_col_1")

NBFC_TXN_ADVICE_DTL_IDX5 index, used on UAT, is a new Index. The query on production uses another index i.e.NBFC_TXN_ADVICE_DTL_IDX3. Though the statistics on production and uat are different, these were enough to get to the root cause of the issue. With ROWNUM=1 predicate, both the queries were optimized using First K Rows optimization, but the estimated number of rows, at the very first step played a major role in generating the final execution plan. On UAT, the New index is on STATUS,CASEID columns, while on production, the index IDX3 is only on CASEID. We all know that First_Row_n optimization means, fetch the Initial N rows as fast as possible. On UAT, since the Index is on two columns, based on the Distinct Keys (#DK), the optimizer estimates only 1 row (#Rows/#DK), while on Production, with IDX3, it estimates 39 Rows. As soon as the New Index (IDX5) on UAT was dropped, the estimated cardinality increased to 9 and this restored the optimized plan similar to production.

PRODUCTION Stats for NBFC_TXN_ADVICE_DTL
========================================
Table Stats::
  Table: NBFC_TXN_ADVICE_DTL  Alias: NBFC_TXN_ADVICE_DTL  (Using composite stats)
    #Rows: 893378705  #Blks:  985102  AvgRowLen:  162.00
Index Stats::
  Index: NBFC_TXN_ADVICE_DTL_IDX3  Col#: 28 3
    USING COMPOSITE STATS
    LVLS: 3  #LB: 3341430  #DK: 23223154  LB/K: 1.00  DB/K: 22.00  CLUF: 531943755.00

UAT Stats for NBFC_TXN_ADVICE_DTL
==================================
Table Stats::
  Table: NBFC_TXN_ADVICE_DTL  Alias: NBFC_TXN_ADVICE_DTL  (Using composite stats)
    #Rows: 20847998  #Blks:  8756  AvgRowLen:  164.00
Index Stats::
  Index: NBFC_TXN_ADVICE_DTL_IDX3  Col#: 28 3
    USING COMPOSITE STATS
    LVLS: 2  #LB: 52180  #DK: 2236889  LB/K: 1.00  DB/K: 5.00  CLUF: 12030805.00
  Index: NBFC_TXN_ADVICE_DTL_IDX5  Col#: 17 28 27 1
    USING COMPOSITE STATS
    LVLS: 2  #LB: 78305  #DK: 16932120  LB/K: 1.00  DB/K: 1.00  CLUF: 11248060.00

ISSUE 2

Fortunately, I could spend more time on this issue and the analysis in this case, I feel, should be of interest to you. This was an Apps seeded query and the performance of this query degraded post 10g Upgrade (10.2.0.3). The issue again was due to ROWNUM=1 predicate, and I could confirm this by comparing the response time and consistent gets of the query after and before removing ROWNUM=1 predicate and the difference was huge. This query is a part of a critical transactions, that use to take 4-5 seconds in 9208, but takes 20-30 minutes after the upgrade. On 10.2.0.3, the query performance improves after setting Optimizer_features_enable (OFE) to 9.2.0.8 or 10.1.0, but this would not be a feasible solution as it required testing of the entire application. The Query and the plan with OFE as default and set to 9.2.0.8 is pasted below :

SQL_ID        SQL_TEXT                                                     BUFFER_GETS EXECUTIONS
------------- ------------------------------------------------------------ ----------- ----------
a15w9fc43c6gr SELECT UOM.UOM_CODE FROM MTL_UNITS_OF_MEASURE_VL UOM WHERE U  2352410716         55
              OM.UNIT_OF_MEASURE = ( SELECT PL.UNIT_MEAS_LOOKUP_CODE FROM
              PO_LINES_ALL PL WHERE (PL.PO_HEADER_ID,PL.PO_LINE_ID) = ( SE
              LECT PD.PO_HEADER_ID, PD.PO_LINE_ID FROM PO_DISTRIBUTIONS_AL
              L PD WHERE PD.PO_DISTRIBUTION_ID IN ( SELECT DISTINCT RT.PO_
              DISTRIBUTION_ID FROM RCV_TRANSACTIONS RT WHERE RT.WIP_ENTITY
              _ID = :B4 AND NVL(RT.WIP_REPETITIVE_SCHEDULE_ID, -1) = NVL(:
              B3 , -1) AND RT.ORGANIZATION_ID = :B2 AND RT.WIP_OPERATION_S
              EQ_NUM = :B1 AND RT.PO_DISTRIBUTION_ID IS NOT NULL ) AND ROW
              NUM = 1 ) )

## Run time BAD Plan
--------------------
Plan hash value: 455173600
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                            |       |       |     8 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID      | MTL_UNITS_OF_MEASURE_TL    |     1 |    16 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN               | MTL_UNITS_OF_MEASURE_TL_U1 |     1 |       |     1   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID    | PO_LINES_ALL               |     1 |    16 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN             | PO_LINES_U1                |     1 |       |     1   (0)| 00:00:01 |
|*  5 |      COUNT STOPKEY                |                            |       |       |            |          |
|   6 |       NESTED LOOPS SEMI           |                            |     2 |    80 |     3   (0)| 00:00:01 |
|   7 |        TABLE ACCESS FULL          | PO_DISTRIBUTIONS_ALL       |  1275K|    19M|     2   (0)| 00:00:01 |
|*  8 |        TABLE ACCESS BY INDEX ROWID| RCV_TRANSACTIONS           |     1 |    24 |     1   (0)| 00:00:01 |
|*  9 |         INDEX RANGE SCAN          | RCV_TRANSACTIONS_N10       |     1 |       |     1   (0)| 00:00:01 |
|* 10 |     COUNT STOPKEY                 |                            |       |       |            |          |
|  11 |      NESTED LOOPS SEMI            |                            |     2 |    80 |     3   (0)| 00:00:01 |
|  12 |       TABLE ACCESS FULL           | PO_DISTRIBUTIONS_ALL       |  1275K|    19M|     2   (0)| 00:00:01 |
|* 13 |       TABLE ACCESS BY INDEX ROWID | RCV_TRANSACTIONS           |     1 |    24 |     1   (0)| 00:00:01 |
|* 14 |        INDEX RANGE SCAN           | RCV_TRANSACTIONS_N10       |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("UNIT_OF_MEASURE"= AND "LANGUAGE"=USERENV('LANG'))
   3 - filter("PL"."PO_HEADER_ID"=)
   4 - access("PL"."PO_LINE_ID"=)
   5 - filter(ROWNUM=1)
   8 - filter(("RT"."PO_DISTRIBUTION_ID" IS NOT NULL AND "RT"."ORGANIZATION_ID"=:B2 AND
              NVL("RT"."WIP_REPETITIVE_SCHEDULE_ID",(-1))=NVL(:B3,(-1)) AND
              "PD"."PO_DISTRIBUTION_ID"="RT"."PO_DISTRIBUTION_ID"))
   9 - access("RT"."WIP_ENTITY_ID"=:B4 AND "RT"."WIP_OPERATION_SEQ_NUM"=:B1)
       filter("RT"."WIP_OPERATION_SEQ_NUM"=:B1)
  10 - filter(ROWNUM=1)
  13 - filter(("RT"."PO_DISTRIBUTION_ID" IS NOT NULL AND "RT"."ORGANIZATION_ID"=:B2 AND
              NVL("RT"."WIP_REPETITIVE_SCHEDULE_ID",(-1))=NVL(:B3,(-1)) AND
              "PD"."PO_DISTRIBUTION_ID"="RT"."PO_DISTRIBUTION_ID"))
  14 - access("RT"."WIP_ENTITY_ID"=:B4 AND "RT"."WIP_OPERATION_SEQ_NUM"=:B1)
       filter("RT"."WIP_OPERATION_SEQ_NUM"=:B1)

## With OFE set to 9208
------------------------
------------------------------------------------------------------------+-----------------------------------+
| Id  | Operation                           | Name                      | Rows  | Bytes | Cost  | Time      |
------------------------------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT                    |                           |       |       |     9 |           |
| 1   |  TABLE ACCESS BY INDEX ROWID        | MTL_UNITS_OF_MEASURE_TL   |     1 |    16 |     2 |  00:00:01 |
| 2   |   INDEX UNIQUE SCAN                 | MTL_UNITS_OF_MEASURE_TL_U1|     1 |       |     2 |  00:00:01 |
| 3   |    TABLE ACCESS BY INDEX ROWID      | PO_LINES_ALL              |     1 |    16 |     2 |  00:00:01 |
| 4   |     INDEX UNIQUE SCAN               | PO_LINES_U1               |     1 |       |     2 |  00:00:01 |
| 5   |      COUNT STOPKEY                  |                           |       |       |       |           |
| 6   |       NESTED LOOPS                  |                           |     1 |    40 |     4 |  00:00:01 |
| 7   |        SORT UNIQUE                  |                           |     1 |    24 |     2 |  00:00:01 |
| 8   |         TABLE ACCESS BY INDEX ROWID | RCV_TRANSACTIONS          |     1 |    24 |     2 |  00:00:01 |
| 9   |          INDEX RANGE SCAN           | RCV_TRANSACTIONS_N10      |     1 |       |     2 |  00:00:01 |
| 10  |        TABLE ACCESS BY INDEX ROWID  | PO_DISTRIBUTIONS_ALL      |     1 |    16 |     2 |  00:00:01 |
| 11  |         INDEX UNIQUE SCAN           | PO_DISTRIBUTIONS_U1       |     1 |       |     2 |  00:00:01 |
| 12  |     COUNT STOPKEY                   |                           |       |       |       |           |
| 13  |      NESTED LOOPS                   |                           |     1 |    40 |     4 |  00:00:01 |
| 14  |       SORT UNIQUE                   |                           |     1 |    24 |     2 |  00:00:01 |
| 15  |        TABLE ACCESS BY INDEX ROWID  | RCV_TRANSACTIONS          |     1 |    24 |     2 |  00:00:01 |
| 16  |         INDEX RANGE SCAN            | RCV_TRANSACTIONS_N10      |     1 |       |     2 |  00:00:01 |
| 17  |       TABLE ACCESS BY INDEX ROWID   | PO_DISTRIBUTIONS_ALL      |     1 |    16 |     2 |  00:00:01 |
| 18  |        INDEX UNIQUE SCAN            | PO_DISTRIBUTIONS_U1       |     1 |       |     2 |  00:00:01 |
------------------------------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("UNIT_OF_MEASURE"= AND "LANGUAGE"=USERENV('LANG'))
3 - filter("PL"."PO_HEADER_ID"=)
4 - access("PL"."PO_LINE_ID"=)
5 - filter(ROWNUM=1)
8 - filter(("RT"."PO_DISTRIBUTION_ID" IS NOT NULL AND "RT"."ORGANIZATION_ID"=:B2 AND NVL("RT"."WIP_REPETITIVE_SCHEDULE_ID",(-1))=NVL(:B3,(-1))))
9 - access("RT"."WIP_ENTITY_ID"=:B4 AND "RT"."WIP_OPERATION_SEQ_NUM"=:B1)
9 - filter("RT"."WIP_OPERATION_SEQ_NUM"=:B1)
11 - access("PD"."PO_DISTRIBUTION_ID"="RT"."PO_DISTRIBUTION_ID")
12 - filter(ROWNUM=1)
15 - filter(("RT"."PO_DISTRIBUTION_ID" IS NOT NULL AND "RT"."ORGANIZATION_ID"=:B2 AND NVL("RT"."WIP_REPETITIVE_SCHEDULE_ID",(-1))=NVL(:B3,(-1))))
16 - access("RT"."WIP_ENTITY_ID"=:B4 AND "RT"."WIP_OPERATION_SEQ_NUM"=:B1)
16 - filter("RT"."WIP_OPERATION_SEQ_NUM"=:B1)
18 - access("PD"."PO_DISTRIBUTION_ID"="RT"."PO_DISTRIBUTION_ID")

The BAD plan does Full Table Scan of PO_DISTRIBUTIONS_ALL, while the Good Plan, does an Unique Index Scan of this table. The Difference in I/O is 32 Million Logical Reads as against 22 Logical Reads. For my analysis, as I know the issue was with ROWNUM predicate, I removed the outer query block and retained only the relevant portion. The final query, that I worked on is pasted below :

SELECT	PD.PO_HEADER_ID, PD.PO_LINE_ID
FROM	apps.PO_DISTRIBUTIONS_ALL PD
WHERE	PD.PO_DISTRIBUTION_ID IN
	(SELECT	 DISTINCT RT.PO_DISTRIBUTION_ID
	FROM	apps.RCV_TRANSACTIONS RT
	WHERE	RT.WIP_ENTITY_ID = :B4
	AND	NVL(RT.WIP_REPETITIVE_SCHEDULE_ID, -1) = NVL(:B3 , -1)
	AND	RT.ORGANIZATION_ID = :B2
	AND	RT.WIP_OPERATION_SEQ_NUM = :B1
	AND	RT.PO_DISTRIBUTION_ID IS NOT NULL )
AND	ROWNUM = 1 ;

First K Row Optimization is not invoked, if the query block does not contain ROWNUM predicate, or, if you provide ALL_ROWS hint to the query, or if you provide PARALLEL hint into the query. Therefore, to make sure that this is a First K Row Optimization Issue, I executed the queries without ROWNUM predicate, with ALL_ROWS hint and with PARALLEL hint, and all these ran perfectly fine. With this analysis, I was sure that this seems to be an Issue with First K Rows Optimization. Unfortunately, there is no way to disable first k row optimization for this particular query, because of rownum=1. 10053 trace of the query reveals that the computed cardinality of the inner query on RCV_TRANSACTIONS is 1. See the bold portion below :

SINGLE TABLE ACCESS PATH
  -----------------------------------------
  BEGIN Single Table Cardinality Estimation
  -----------------------------------------
  Column (#49): WIP_ENTITY_ID(NUMBER)
    AvgLen: 2.00 NDV: 100442 Nulls: 8235076 Density: 9.9560e-06 Min: 20 Max: 1901958
  Column (#51): WIP_REPETITIVE_SCHEDULE_ID(NUMBER)
    AvgLen: 22.00 NDV: 0 Nulls: 8996812 Density: 0.0000e+00 Min: 0 Max: 0
  Column (#46): ORGANIZATION_ID(NUMBER)
    AvgLen: 4.00 NDV: 72 Nulls: 0 Density: 0.013889 Min: 10 Max: 703
  Column (#52): WIP_OPERATION_SEQ_NUM(NUMBER)
    AvgLen: 2.00 NDV: 169 Nulls: 8235076 Density: 0.0059172 Min: 1 Max: 2055
  Table: RCV_TRANSACTIONS  Alias: RT
    Card: Original: 8996812  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00

The computed cardinality of 1, alongwith the ROWNUM=1 predicate in the outer query block seems to be an issue. To get back to the 9i plan, I had to fake the statistics (NUM_ROWS) of RCV_TRANSACTIONS table, to a value, such that the computed cardinality increases from 1 to 65. Once done, the query performance improved and the transaction response time was restored from 20-30 minutes to 4-5 seconds.

In my opinion, the generation of sub-optimal execution plan with First K Rows Optimization seems to be a BUG. Therefore, for this issue (Issue no.2), the customer has raised an SR and I am currently working closed with the SR Analysts to get a fix of this issue. To reproduce this issue inhouse, I prepared a test case and with default optimizer settings, I could reproduce it on my local database. The test case is pasted below. If you are not able to reproduce the issue, do send a comment and I shall take this up for further investigation.

drop table po_distributions_all ;
drop table RCV_TRANSACTIONS;

exec dbms_random.seed(0);

create table po_distributions_all as
select rownum po_distribution_id, round(dbms_random.value(1,100000),0) po_header_id,
round(dbms_random.value(100001,1000000),0)  po_line_id,
rpad(dbms_random.string('A',20),100,'*') po_name
from	all_objects
where rownum<=10000;

create unique index po_distributions_u1 on po_distributions_all(po_distribution_id);

drop table rcv_transactions;

create table rcv_transactions as
select round(dbms_random.value(1,10000),0) po_distribution_id, round(dbms_random.value(1000,10000000),0) wip_entity_id,
100000 wip_repetitive_schedule_id, 1 wip_line_id, mod(rownum,72) organization_id,
mod(rownum,169) wip_operation_seq_num
from all_objects
where rownum<=10000;

update rcv_transactions set wip_repetitive_schedule_id=null;

create index rcv_transactions_n10 on rcv_transactions(wip_entity_id, wip_line_id, wip_operation_seq_num);

exec dbms_stats.gather_table_stats(user,'rcv_transactions',cascade=>true,method_opt=>'for all columns size 1');

exec dbms_stats.set_table_stats(user,'po_distributions_all',numrows=>1394862,numblks=>42951);
exec dbms_stats.set_column_stats(user,'po_distributions_all',colname=>'po_distribution_id',DISTCNT=>1394862,density=>1/1394862,nullcnt=>0);

exec dbms_stats.set_table_stats(user,'rcv_transactions',numrows=>8996812,numblks=>452191);
exec dbms_stats.set_column_stats(user,'rcv_transactions',colname=>'wip_entity_id',DISTCNT=>100442,density=>1/100442,nullcnt=>8235076);
exec dbms_stats.set_column_stats(user,'rcv_transactions',colname=>'wip_line_id',DISTCNT=>1,density=>1,nullcnt=>round(8996812*95/100,0));
exec dbms_stats.set_column_stats(user,'rcv_transactions',colname=>'wip_operation_seq_num',DISTCNT=>169,density=>1/169,nullcnt=>8235076);
exec dbms_stats.set_column_stats(user,'rcv_transactions',colname=>'po_distribution_id',DISTCNT=>1057573,density=>1/1057573,nullcnt=>6792851);

variable b1 number;
variable b2 number;
variable b3 number;
variable b4 number;
exec :b1:=5;
exec :b2:=5;
exec :b3:=null;
exec :b4:=4449100;

SELECT /*+ VIVEK_ORIGINAL */ PD.PO_HEADER_ID, PD.PO_LINE_ID
FROM PO_DISTRIBUTIONS_ALL PD
WHERE PD.PO_DISTRIBUTION_ID IN
 ( SELECT DISTINCT RT.PO_DISTRIBUTION_ID
 FROM RCV_TRANSACTIONS RT
 WHERE RT.WIP_ENTITY_ID = :B4
 AND NVL(RT.WIP_REPETITIVE_SCHEDULE_ID, -1) = NVL(:B3 , -1)
 AND RT.ORGANIZATION_ID = :B2
 AND RT.WIP_OPERATION_SEQ_NUM = :B1
 AND RT.PO_DISTRIBUTION_ID IS NOT NULL )
AND ROWNUM = 1;

## Fake Statistics
exec dbms_stats.set_table_stats(user,'rcv_transactions',numrows=>1118996812,numblks=>452191);

SELECT /*+ VIVEK_FAKE */ PD.PO_HEADER_ID, PD.PO_LINE_ID
FROM PO_DISTRIBUTIONS_ALL PD
WHERE PD.PO_DISTRIBUTION_ID IN
 ( SELECT DISTINCT RT.PO_DISTRIBUTION_ID
 FROM RCV_TRANSACTIONS RT
 WHERE RT.WIP_ENTITY_ID = :B4
 AND NVL(RT.WIP_REPETITIVE_SCHEDULE_ID, -1) = NVL(:B3 , -1)
 AND RT.ORGANIZATION_ID = :B2
 AND RT.WIP_OPERATION_SEQ_NUM = :B1
 AND RT.PO_DISTRIBUTION_ID IS NOT NULL )
AND ROWNUM = 1;

 

 To Summarize, First K Rows Optimization, invoked automatically with ROWNUM predicate, can generate a sub-optimal execution plan, if the statistics of the inner query computes a cardinality of less number of rows or 1. In this case, you may need to either re-write a query or fake the statistics. There are few bugs reported with this optimization and, if relevant for your case, the fixes provided as a BUG fix can be safely applied. In the case above, the fix is still awaited and I will keep you posted on this.

Advertisements

Row Cache Object Latch Contention : Improper Implementation of Oracle Feature – Day 2 of AIOUG

On Day 2 of AIOUG, I discussed on Library Cache related latches and some causes of the contention on these latches. One such contention is for Row Cache Object latches that protects the Data Dictionary loaded in the Shared Pool. Dictionary Cache is one of the component of a Shared Pool and all the definitions, required to parse a Cursor, is stored in this cache, in the form of a ROW, thus is protected by Row Cache Object Latches.

The Issue

Almost a year back, I was involved in a major performance issue caused due to “Latch: row cache objects” contention. This was a 2 Node RAC setup running 10.2.0.3 database. The issue was normally reported during the peak hour with CPU Utilization above 95%. During this slowness window, there was a major drip in Transaction per Second and this resulted in an increase in pending volume to be entered into the system.

As mentioned earlier, as “Row Cache Object” latches protect Dictionary Cache information, contention on these latches needs investigating the sub-component of the Dictionary Cache and this can be done thru AWR reports. “Dictionary Cache Stats” section in AWR gives the Get Requests, PCT Miss and other required information for each of the sub-component of Dictionary Cache. This section revealed that the GET REQUESTS on DC_USERS was very high. Another alarming statistics was huge version_counts for the application queries. Since, the number of get requests on DC_Users seemed to be abnormally high, the focus was to know for the potential causes of these high numbers and the one related to this application or database. Further investigation drilled down the issue to Virtual Private Database (VPD). The customer here had implemented VPD and this implementation caused the contention on these latches.

Does it mean VPD is an Issue ?

VPD is a very useful feature as it provides an easy and manageable means of implementing row level security, but there are certain implementation portions, that needs to be considered based on the requirement, and it is this piece, if improperly implemented, can cause a major performance issue. In a nutshell, it is not VPD but the way it was being implemented that caused a major issue at the customer site. We will cover this piece of problematic implementation in this blog, with a test case, that will help you to test this on your test system and validate the results.

A brief explanation of VPD

VPD, also called as Row Level Security or Fine Grained Access Control, is a feature, introduced in Oracle 8i, that allows you to define security policies on tables by way of a function that implies a restriction on the rows that a user can SELECT or UPDATE or INSERT or DELETE, based on the privileges defined for that user. For example, assuming a following hierarchy :
Employee S is from South India Region, N from North India Region, W from West India Region, E from East India Region, I from All India Region and finally, APAC for Asia Pacific. With this herierchy, a requirement like :

  1. User S can View and Manipulate all the rows from South India Region, and,
  2. User N can View and Manipulate all the rows from North India Region, and,
  3. User W can View and Manipulate all the rows from West India Region, and,
  4. User E can View and Manipulate all the rows from East India Region, and,
  5. User I can View and Manipulate all the rows from 1 to 4 i.e.South, North, West and East India Region, while,
  6. User A can View and Manipulate all the rows from 1 to 5, including other countries in Asia Pacific Region.

These restrictions can be implemented by using VPD without much of an effort. Application does not require any changes for these implementations, as these restrictions are applied in the background at the runtime. There are three basic components of VPD. These are :
Policy : A declarative command that determines when and how to apply the policy i.e. during Selects, Insertions, Deletion, Updation or combination of any or all of these.
Policy Function : A Pl/SQL function that is called whenever the condition specified in the policy are met.
Predicate : A string that is generated by the policy function, and the applied to the users’ sql statements. This is an additional Filter predicate generated by the policy function and appended to the sql statement executed by the user, at the runtime.

Once these three are defined as a policy, then this policy can be attached to the tables that requires Security to be implemented. While Policy, Policy Function and Predicate are three main building blocks of VPD, there is another component, that is mostly ignored, called as Policy Type. In most of the cases, this is not defined, which means, it is left as default. If not defined, it defaults to a value depending on the database version and therefore, should not be left undefined. Starting Oracle 10g, the policy type defaults to DYNAMIC. While there are other policy types : SHARED Static, CONTEXT-SENSITIVE and SHARED Context Sensitive. The implementation of each of these depend upon the requirement.

Back to the Issue

In this case, it was the implementation of VPD that caused huge Version Count and both of these had a major impact on the performance due to heavy latch contention. Again, it was not the VPD that caused multiple version counts, but the implementation. The Application team, in this case, while implementing VPD, left POLICY_TYPE to default and this means DYNAMIC. DYNAMIC policy_type implies, dynamic builtup of the predicates at runtime, based on the policy function. The issue with Dynamic policy_type is that with each row processed by the query, the policy function is applied, predicate is evaluated and validated whether the row is to be fetched to the user. This means, the query goes through parse-execute-fetch phase for each and every row processed by the query and this significantly increases a Parse Overhead.

Based on the customer requirement, a better option would have been either Context-Sensitive or Shared Context-Sensitive, that internally implements Dynamic predicate generation, at runtime, but goes through the Parse phase only when the Context of the User Session changes and not for each and every row processed. Further, in this customer case, just changing the Policy_type from Dynamic to Shared Context-Sensitive did not help, as there was a major design flaw in the policy function. Lets walk thru a test case and at the end, I will present some facts from the customer system. The test case simulates the implementation at the customer site and the change suggested, thereafter to improve the performance. The change, as depicted in the test case, is in the policy function and the change in policy_type is already changed to Shared Context-Sensitive.

TEST CASE

For this test case to work, I have created some set of statistics table that will store the latch activity. Schema VIVEK contains an application table TEST_VPD. I further create USER_X schema and already have SCOTT and OE, as an application user schemas, ITDBA schema, that contains Security Table. This table defines the security policies for each application users. All of the VPD implementations pre-requisites are created in ITDBA schema.

Once the pre-requisites are created, we will execute certain queries, against TEST_VPD, from VIVEK : access to all the rows, OE Schema : access to rows pertaining to rcc_num=10, SCOTT Schema : access to rows pertaining to rcc_num=20 and USER_X Schema : access to rows pertaining to rcc_num=30. All these access restrictions are defined in EMP_SEC table under ITDBA Schema and the predicates are applied, at run time, by way of Policy Function GF_GLOBAL_POLICY_OPT_RSTRCT_VW, again under ITDBA Schema. Once these are done, we will run certain queries against our Statistics table and some data dictionary views to check for the performance. The policy function, GF_GLOBAL_POLICY_OPT_RSTRCT_VW, will then be changed to improve the performance and the queries against TEST_VPD and measure the performance.

connect vivek/vivek

drop table latch_stats_start;
drop table latch_stats_end;
drop table test_vpd;
drop user user_x cascade;
drop user itdba cascade;
drop public synonym as_test_vpd;
drop context rcc_ctx;

exec dbms_random.seed(0);
create global temporary table latch_stats_start as
select name, gets from v$latch;

create global temporary table latch_stats_end as
select name, gets from v$latch;

create table test_vpd as
select	dbms_random.string('U',5)||lpad(rownum,5,'0') pan,
	dbms_random.string('U',20) name,
	case when mod(rownum,10)=0 then 'F' else 'M' end sex,
	round(dbms_random.value(10,100),0) rcc_num
from	all_objects
where	rownum between 1 and 10000;
create unique index pk_pan on test_vpd(pan);
exec dbms_stats.gather_table_stats(user,'test_vpd',cascade=>true,method_opt=>'for all columns size 1');
create public synonym as_test_vpd for test_vpd;
grant select on as_test_vpd to public;

create user user_x identified by vivek;
grant connect,resource to user_x;

connect sys/vivek as sysdba
create user itdba identified by vivek;
grant dba to itdba;
grant execute on dbms_rls to itdba;

connect itdba/vivek

create context rcc_ctx using ITDBA.set_rcc_ctx;

create procedure set_rcc_ctx(p_attr in varchar2, p_val1 in varchar2, p_val2 in varchar2, p_val3 in varchar2)
is
begin
   dbms_session.set_context('RCC_CTX',p_attr,p_val1, p_val2, p_val3);
end;
/

create table itdba.emp_sec (user_id varchar2(20), rcc_num number);

insert into emp_sec values('SCOTT',20);
insert into emp_sec values('OE',10);
insert into emp_sec values('USER_X',30);

CREATE OR REPLACE function ITDBA.GF_GLOBAL_POLICY_OPT_RSTRCT_VW(OBJECT_SCHEMA VARCHAR2, OBJECT_NAME VARCHAR2) return varchar2 is
    L_SEL_FLG VARCHAR2(3);
    L_RCC_NUM NUMBER;
    L_USER_ID  VARCHAR2(20);
    l_predicate varchar2(30);
    v_Restricted_Policy varchar2(1);
begin
    IF  L_USER_ID IS NULL AND USER IN ('ITDBA','SYS','SYSTEM','VIVEK') THEN
       RETURN NULL;
    ELSE
       IF L_USER_ID IS NULL THEN
 	  L_USER_ID:=USER;
       END IF;
     if l_user_id in ('USER_X','SCOTT','OE') then v_Restricted_Policy:='Y'; end if;
     IF v_Restricted_Policy = 'Y' THEN
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num, null, null);
	l_predicate:='RCC_NUM in '||sys_context('RCC_CTX','RCC_NUM');
	 return l_predicate;
     ELSE
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num,l_rcc_num+100,99);
	l_predicate:='RCC_NUM in '||sys_context('RCC_CTX','RCC_NUM');
	 return l_predicate;
     END IF;
    END IF;
Exception
  When others then
       Raise_Application_Error (-20000,SQLERRM);
END;
/
begin
dbms_rls.add_policy(object_schema=>'VIVEK',
   object_name=>'test_vpd',
   policy_name=>'RCC_NUM_POLICY',
   function_schema=>'ITDBA',
   policy_function=>'GF_GLOBAL_POLICY_OPT_RSTRCT_VW',
   statement_types=>'INSERT,SELECT,UPDATE,DELETE',
   update_check=>true,
   policy_type=>dbms_rls.SHARED_CONTEXT_SENSITIVE);
end;
/

Now, since the pre-requisite is done, we will execute a query against TEST_VPD from VIVEK, OE, SCOTT and USER_X Schemas to check the rows returned by each of these users. This will help us validate and confirm whether the policy function is applied as per our requirement.

connect vivek/vivek
column pan for a20
column name for a30

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;

select /*+ VIVEK */ * from as_test_vpd  where rcc_num=20 and rownum<=3;

select /*+ VIVEK */ * from as_test_vpd  where rcc_num=30 and rownum<=3;

## User OE only allowed to see 10
----------------------------------

connect oe/vivek

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd where rcc_num=20 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd where rcc_num=30 and rownum<=3;

## User SCOTT only allowed to see 20
-------------------------------------
connect scott/tiger

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd  where rcc_num=20 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd where rcc_num=30 and rownum<=3;

## User USER_X only allowed to see 30
---------------------------------------

connect user_x/vivek

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd  where rcc_num=20 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd  where rcc_num=30 and rownum<=3;

Now for the performance. This is the Original Implementation.

connect vivek/vivek
alter system flush shared_pool;
connect scott/tiger
insert into vivek.latch_stats_start select name, gets from v$latch;
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
insert into vivek.latch_stats_end select name, gets from v$latch;
connect oe/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
connect user_x/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3; 
SQL>set lines 132
SQL> set pages 1000
SQL> column sql_text for a40 wrap
SQL>
SQL> select sql_id, sql_text, executions, child_latch, version_count from v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH VERSION_COUNT
------------- ---------------------------------------- ---------- ----------- -------------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          3           2             3
              here rownum<=3 
SQL>select sql_id, sql_text, executions, child_latch, child_address from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH CHILD_AD
------------- ---------------------------------------- ---------- ----------- --------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          1           2 69EAE56C
              here rownum<=3

1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          1           2 69065194
              here rownum<=3

1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          1           2 68FEDAE8
              here rownum<=3 
SQL>column predicate for a30
SQL>select sql_id, ADDRESS, PREDICATE from v$vpd_policy where sql_id='1n62sv9g6nw77';

SQL_ID        ADDRESS  PREDICATE
------------- -------- ------------------------------
1n62sv9g6nw77 69EAE56C RCC_NUM in 20
1n62sv9g6nw77 69065194 RCC_NUM in 10
1n62sv9g6nw77 68FEDAE8 RCC_NUM in 30

When the Query was executed from different application users, three in our case, the version_count of the query increased as well. The dynamic performance view V$VPD_POLICY shows the runtime plan, which is Literal in this case and causes a query to be hard parsed. Below mentioned query can be executed to check for the latch statistics.

column name for a30
set lines 200
set pages 1000
select * from (select a.name, b.gets end_gets, a.gets Start_gets, b.gets-a.gets
from	vivek.latch_stats_start a,
	vivek.latch_stats_end b
where	b.name = a.name
order by 4 desc)
where rownum<=10;

MODIFICATION

The code below shows the modification to the function that is applied to the policy. Look for the l_predicate value at the end that is finally returned as an Output to the function.

connect vivek/vivek
CREATE OR REPLACE function ITDBA.GF_GLOBAL_POLICY_OPT_RSTRCT_VW(OBJECT_SCHEMA VARCHAR2, OBJECT_NAME VARCHAR2) return varchar2 is
    L_SEL_FLG VARCHAR2(3);
    L_RCC_NUM NUMBER;
    L_USER_ID  VARCHAR2(20);
    l_predicate varchar2(60);
    v_Restricted_Policy varchar2(1);
begin
    IF  L_USER_ID IS NULL AND USER IN ('ITDBA','SYS','SYSTEM','VIVEK') THEN
       RETURN NULL;
    ELSE
       IF L_USER_ID IS NULL THEN
 	  L_USER_ID:=USER;
       END IF;
     if l_user_id in ('USER_X','SCOTT','OE') then v_Restricted_Policy:='Y'; end if;
     IF v_Restricted_Policy = 'Y' THEN
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num, null, null);
	l_predicate := 'RCC_NUM = sys_context (''RCC_CTX'', ''RCC_NUM'')';
	 return l_predicate;
     ELSE
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num,l_rcc_num+100,99);
       	l_predicate := 'RCC_NUM = sys_context (''RCC_CTX'', ''RCC_NUM'')';
	 return l_predicate;
     END IF;
    END IF;
Exception
  When others then
       Raise_Application_Error (-20000,SQLERRM);
END;
/

Once the change is done, we will execute the query on TEST_VPD from the three application users (USER_X, OE and SCOTT), and compare the latch statistics and version_count.

connect vivek/vivek
alter system flush shared_pool;
connect scott/tiger
insert into vivek.latch_stats_start select name, gets from v$latch;
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
insert into vivek.latch_stats_end select name, gets from v$latch;
connect oe/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
connect user_x/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
connect vivek/vivek
Connected.
SQL> set lines 132
SQL> set pages 1000
SQL> column sql_text for a40 wrap
SQL>
SQL> select sql_id, sql_text, executions, child_latch, version_count from v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH VERSION_COUNT
------------- ---------------------------------------- ---------- ----------- -------------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          3           2             1
              here rownum<=3 
SQL>select sql_id, sql_text, executions, child_latch, child_address from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH CHILD_AD
------------- ---------------------------------------- ---------- ----------- --------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          3           2 68BEE27C
              here rownum <=3 
SQL>column predicate for a30
SQL>select sql_id, ADDRESS, PREDICATE from v$vpd_policy where sql_id='1n62sv9g6nw77';

SQL_ID        ADDRESS  PREDICATE
------------- -------- ------------------------------
1n62sv9g6nw77 68BEE27C RCC_NUM = sys_context ('RCC_CT
                       X', 'RCC_NUM')

Improved performance can be clearly seen from the version_count, output of V$VPD_POLICY view and from the latch statistics. Once this was implemented, the customer reported a significant performance improvement, for which the real facts are shown below.



Optimizing Performance Part 2 Matrix


Optimizing Performance Part 2 Top 5 waits

CUSTOMER MATRIX – BEFORE

 

TOP 5 WAITS

 



Optimizing Performance Part 2 Achievement

CUSTOMER MATRIX – AFTER

 

To summarize, Oracle provided features enable us to implement these with great ease. Implementing these require careful study of the requirement and proper implementation, that does not compromise with the performance.

%d bloggers like this: