12c Enhancement in Authorization Check Failed issue…

I am very disappointed to miss the OTN Yathra User Group in Bangalore due to bad health. Not still recovered. Hope to present in Bangalore in near future.

One of my slide on “Real Life Issues” demonstrated the library cache mutex issue due to multiple child cursors. I wrote about an issue in one of my previous BLOG. The demonstration in the blog was in Oracle 11g. While preparing for the User Group Event, I re-worked on this demonstration in 12c and was surprised to see a critical enhancement. In 12c, with 300 Schemas, there were only 300 Child Cursors (1 for each schema) as against 8000+ in 11g or below. This means, if the customer upgrades to 12c, the issue due to multiple child cursors would surely get resolved. I assume, this improvement is due to the multitenancy feature as multiple pdb’s can have tables with same name and therefore, similar queries across pdb’s.

Authorization Check Failed ! Multiple Child Cursors…..

Recently, got an oppurtunity to work on a Mutex related issue causing database to halt. While the issue started from April 2013, the frequency of this increased since September 2013. This is a Banking Customer and every month, they add few Branches into the system. With the Branches, the concurrency also goes up. This increase in concurrency could be one of the reason of frequent Library Cache Mutex Issue.

In this case, the issue was found to be due to high number of Child Cursors. The top 4 queries, based on number of child cursors, had around 30k child cursors. From the v$sql_shared_cursor, the reason for these many childs was AUTH_CHECK_MISMATCH, which stands for “Authorization Check Mismatch”. Further investigation revealed this to be an Implementation issue.

The Bank has around 580 Branches. For Scalability purpose, the Application Team has implemented 290 Branches on One Database and other 280 Branches on another Database. These are named as BDB1 and BDB2. For each of the Branches, they have individual schema. Therefore, in BDB1 and BDB2 they have 290 and 280 schemas respectively. Each Schema has it’s own table, procedure, functions etc…The name of the tables, their columns etc are same as well across each of these schema’s. For example, each of the schema has a table USER_SESSION_LOGIN_DTL with same number and names of the columns. They only differ in data stored within each of these tables. Further, each schema has around 800+ tables. Therefore, a query executed against Branch 1 is executed against Branch 2, due to the textual similarity, they match in HASH Values and therefore share a common parent. The logic at the application level is that if a user connects to the application, the Branch Name is returned based on which, he or she connects to the schema (based on the branch name) and executes the application level business trasaction. Another user from another branch will open the same module but will connect to different schema. Therefore, while the queries are textually same for each of the Branches, since the underlying objects are different, we see multiple child cursors being populated.

In my view, this is an inefficient Database Design Issue. This will also have a serious manageability issue, as any change in the business logic, will have to be replicated across all the schemas. Any Optimization, i.e. creation of a New Index, will have to be replicated across each of the Schemas. This list can go on…

This is very easy to demonstrate.

## This pl/sql will create 300 Users with names as F1, F2, F3 and so on....

declare
l_statement	varchar2(100);
begin
 for i in 1..300
 loop
   l_statement:='drop User F'||i||' cascade';
   execute immediate l_statement;
   l_statement:='Create User F'||i||' identified by F'||i;
   execute immediate l_statement;
   l_statement:='grant dba to F'||i;
   execute immediate l_statement;
 end loop;
end;
/
## Next we create a Table EMP in each of these schemas. In this case, I am inserting same data into these tables.

declare
l_statement	varchar2(100);
begin
 for i in 1..300
 loop
   execute immediate 'alter session set current_schema=F'||i;
   l_statement:='create table emp as select * from scott.emp';
   execute immediate l_statement;
 end loop;
end;
/
## Next let us execute a query on each of the emp schemas (randomly). These will be 30000 queries with different Bind Variables.

declare
l_emp		number;
l_num		number;
l_sch		number;
l_statement	varchar2(1000);
l_ename		varchar2(32);
begin
 for i in 1..30000
 loop
   l_num:=round(dbms_random.value(1,14),0);
   l_sch:=round(dbms_random.value(1,300),0);
   execute immediate 'alter session set current_schema=F'||l_sch;
   select empno into l_emp from (
     select empno, row_number() over(order by empno) rno from scott.emp)
   where rno=l_num;
   l_statement := 'select /*+ vivek1 */ ename from emp where empno=:b1';
   execute immediate l_statement into l_ename using l_emp;
 end loop;
end;
/

Once the queries are executed, we can query v$sql_shared_cursor to check for the number of child cursors.

SQL> SQL> @sql_text
Enter value for 1: select /*+ vivek1
old   2: from v$sqlarea where sql_text like '&1%'
new   2: from v$sqlarea where sql_text like 'select /*+ vivek1%'

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT PLAN_HASH_VALUE
------------- -------------------------------------------------- ----------- ---------- ------------- ---------------
1c9w6s580jtpd select /*+ vivek1 */ ename from emp where empno=:b        5403       1801            30      3956160932
              1

SQL> select count(*) from v$sql_shared_cursor where sql_id='1c9w6s580jtpd';

  COUNT(*)
----------
      8084

SQL> select sql_id, child_number, AUTH_CHECK_MISMATCH, reason from v$sql_shared_cursor where sql_id='1c9w6s580jtpd' and rownum<=5;

SQL_ID        CHILD_NUMBER A REASON
------------- ------------ - --------------------------------------------------------------------------------
1c9w6s580jtpd           94 Y 9437Authorization Check f
1c9w6s580jtpd           95 Y 9537Authorization Check f
1c9w6s580jtpd           96 Y 9637Authorization Check f
1c9w6s580jtpd           97 Y 9737Authorization Check f
1c9w6s580jtpd           98 Y 9837Authorization Check f

For 30000 Executions, we have 8000 Child Cursors. This block was executed from a Single Session and no one else working. Imagine the impact of such an implementation in a highly concurrent environment.

“Library Cache : Mutex X” ! An Interesting Optimization

One of my Customer is migrating from an Non-Oracle Database to Oracle Database and this being a critical Application, the migration is planned in a phased manner. Migration happens every fortnight (a city or set of cities at a time) and is enough to increase the load as a Vertical line (if plotted in a graph). In a simple term, the increase in the load is not linear. With such a dramatic increase, some or the other issues can be expected and this is what triggered post one such migration. The setup is a 2 Node RAC with Database version is 11.2.0.2.8. Connection Pooling is used at the Apps Layer.

During one of the migration the customer increased CPU’s from 80 to 148 on each of the Node. This was based on the recommendation from the Application Team. Immediately, post the migration the top most wait event was “library cache : mutex x”. While this was not impacting the performance but was a cause of concern as the migration was only 40% and next migration would add the load further by 20%.

We had two dramatic changes. These are : (1) The CPU’s Increased from 80 to 148, and (2) The load increased many fold because of the migration of new branches. It was critical to investigate the issue and therefore, we started monitoring the system during peak load. An interesting observation that came out was a function, which was on top, most of the time. It was BEGIN :1 := PA_MC_ENTITY_CONTEXT_FN(:2); END;. The number of executions of this function were also around 600+ per second. For more understanding on this function, we scheduled a discussion with the application team and the outcome of this discussion was very interesting.

During the discussion, it came out that the application has a functionality to use Virtual Private Database (VPD) and this function is used to set the application context, which is then used by each of the application queries to filter out the rows that do not match the context. Usually, VPD is used as a row level security, which is implemented by way of a function to generate WHERE predicate depending upon the privileges or profile of the application user. These predicates can be different for different set of users. All this is generated at run time. Interestingly, in this case, the application context is not generated dynamically based on privileges, but is set to a fix value, ‘4000’, for all the database session. Technically, there was nothing wrong in the implementation and therefore, it was not important to drill down further to relate the link between this function and mutex waits. The VPD Implementation and function code all looked fine

The interesting fact that we came to know was the way this function was called. Further investigation and discussion with the Application team revealed that for each database call, this function is called to set the application context. This means, if a session executes 5 queries, as a part of a transaction, the function to set the application context will be executed 5 times. As per them, the reason behind this is that as the Apps Layer use Connection Pool, there is no way to know whether the connection picked up by the application user is one of the existing connection from the pool or a newly spawned (once all the initial connections are exhausted). Clearly, executing a function for each database calls and setting a context on each call seemed to be a problem, as for each execution of this function, the mutex was requested in an exclusive mode to write to a memory location.

As an immediate optimization, in order to reduce the number of execution and exclusive gets, it was suggested to check for the existence of application context before setting the context. This was achieved by way of a query to first check whether the context is set to desired value. If it is, this means, it is one of existing connection and no need to set the application context. If not, this means, this is a new connection and therefore, the application context is set to ‘4000’. With this implementation, the number of executions of the function reduced drastically, thereby reducing the waits on Library Cache Mutex.

In this case, everyone suspected this to be an Oracle Database BUG. In order to eliminate or reduce the contention, it is imperative to optimize the amount of work done. Unwanted work is always an overhead, as was the case in this optimization.

SQL*Plus Connection Slow ! Library Cache Lock…..

Oracle Database 11g introduced a new security feature, which is meant to protect a users’ password. Refer to section “What are the Oracle Password built-in Password Protections ?” of Chapter 3 of Oracle Database Security Guide. The document says, if a user tries to log in to the database more than 3 times with an Invalid Password, Oracle Database delays each login after third try. Most of them are unware of this new Security feature, and therefore, this sometimes creates a panic situation and an unexpected escalations to Oracle Support.

Recently, one of my customer reported a slow database connection to one of their Application Schema. The connections to other schemas, except the problematic one, were going through perfectly fine. The application connects to the Database through Weblogic Server and the problem summary mentioned that while the Weblogic Server is able to connect successfully (with zero delay) to the same schema on UAT, it takes around 30 seconds to connect to the same user on production. Therefore, concern over an issue with Database Server was raised. A severity 1 SR was raised and operating system truss output for the problematic and non-problematic user was uploaded as well. The code path of both the output was exactly same, except for the time taken for the problematic user.

One simple logic to understand was, that if it is an Oracle Server Issue, then it should cause the delay in connection for all the Schema’s and not just to one. Therefore, there had to be some problem with this Schema. It was thoroughly checked whether there are any AUDITING or ON-LOGON Triggers on this Schema. Database Audit Trail was enabled for all the users and therefore, this was not an exception for this schema. There were no ON-LOGON triggers on this User. Therefore, what was causing this delay was a big question.

Customer noticed this due to the fact that they have couple of scheduled cron jobs that runs every 30 seconds. These jobs usually takes, 5-10 seconds to complete, but since last few days, they have observed these overlapping. While Investigating this issue, I noticed that when we issue sqlplus username/password, the session waits on Library Cache Lock for around 15-20 seconds and then authenticates the user. This was the case with the scheduled jobs as well. Each of these jobs were waiting on the latch. Why should an SQL*Plus command wait on Library Cache Lock ? For other users, it could immediately authenticate.

In this case, Database Audit Trail, which inserts a record in aud$ came to our rescue. In the aud$ table, we noticed that for last 3-4 days, the number of failed attempts had increased dramatically. Further, it was also revealed that these connections from our the Application Servers configured at a DR site and were pointing to the production database server using wrong password. The error message reported in the aud$ was 1017, which is “Invalid Username / Password”.

The library cache lock wait is seen due to the fact that the account status gets updated due to incorrect login and is maintained in the library cache. This is to prevent password guessing attack as there’s a sleep in the code when incorrect login attempts exceed count of 3. These sleeps cause the latch, as the process holding it is yet to release the lock. While this behaviour is a Security enhancements, it can be disabled by setting event 28401.

parse count (failures) – How do I find queries causing failures ?

In my last blog on Hard Parses and failures posted on September 3, 2011, I missed to write about the ways to find the Queries causing “parse count (failures)”. I received few mails from some of readers wanting to know on how to get these. Since these queries fail, these are not stored in the Shared Pool and therefore, it is not possible to get these from v$ views. Secondly, since these are not stored in the Shared Pool, everytime these queries are executed, these are hard parsed, thus causing unwanted contention on Library Cache latches.

A possible way to get these is 10046 traces. I used these traces to get the Queries that were causing these failures to occur. Yesterday, I was working at another customer site, where I could see some failures happening. These are again around 25% to 30% of the total Hard Parses. Therefore, if we eliminate these, total hard parses / second would come down. Following query helps me check, whether the statistics related to hard parses are alarming.

set linesize 132
alter session set nls_date_format='DD-MM-YYYY HH24:MI:SS';
select sysdate, name, value from v$sysstat where name like 'parse%';
exec dbms_lock.sleep(10);
select sysdate, name, value from v$sysstat where name like 'parse%';

I execute a query against v$sysstat, wait for 10 seconds and re-execute the same. Therefore, the incremental figure is for 10 seconds duration. From this, it is very easy to compute the parses/second ratio. Following is the output of the query taken yesterday :

SQL> @hard_parse

Session altered.

Elapsed: 00:00:00.00

SYSDATE             NAME                                                                  VALUE
------------------- ---------------------------------------------------------------- ----------
20-09-2011 10:28:14 parse time cpu                                                       117213
20-09-2011 10:28:14 parse time elapsed                                                   390923
20-09-2011 10:28:14 parse count (total)                                                54294681
20-09-2011 10:28:14 parse count (hard)                                                   766713
20-09-2011 10:28:14 parse count (failures)                                               177555
20-09-2011 10:28:14 parse count (describe)                                                  204

6 rows selected.

Elapsed: 00:00:00.03

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.00

SYSDATE             NAME                                                                  VALUE
------------------- ---------------------------------------------------------------- ----------
20-09-2011 10:28:24 parse time cpu                                                       117216
20-09-2011 10:28:24 parse time elapsed                                                   390969
20-09-2011 10:28:24 parse count (total)                                                54303954
20-09-2011 10:28:24 parse count (hard)                                                   767011
20-09-2011 10:28:24 parse count (failures)                                               177637
20-09-2011 10:28:24 parse count (describe)                                                  204

6 rows selected.

Elapsed: 00:00:00.02
SQL> select 767011-766713, 177637-177555 from dual;

767011-766713 177637-177555
------------- -------------
          298            82

In this case, the Hard Parses are 29.8 per second and failures are 8.2, which is around 28% of the total hard parses. To check for the queries causing these, I generated enabled 10046 trace at system level, during less concurrency time (non-peak hours), and issued grep -i “PARSE ERROR” *.trc. This gave me the name of the trace files and from these traces, it was very easy to figure out the queries that were failing. In the trace file, search for PARSE ERROR and you will get the query, alongwith the reason for its failure, which is ora-942 (table or view does not exists), as can be seen in err column.

grep -i "PARSE ERROR" *.trc
RAC1_ora_1966688.trc:PARSE ERROR #1:len=157 dep=0 uid=38 oct=3 lid=38 tim=2611876001420 err=942
RAC1_ora_1966688.trc:PARSE ERROR #1:len=157 dep=0 uid=38 oct=3 lid=38 tim=2611876029670 err=942
RAC1_ora_1966688.trc:PARSE ERROR #5:len=95 dep=1 uid=0 oct=3 lid=0 tim=2611876430154 err=942

Do we need to take care of Hard Parses only when Library Cache related Latches are on top ?

I just concluded a 3 months Production Optimization Project. The problem summary was an overall slowness of the production system, which is a three tier architecture. The Production System includes a 2 Node 10.2.0.5 Database with 4 Application Servers configured with Connection Pool. As with all my Performance Project, this time too I started off with a one round of discussion with all the stake holders. These stake holders manage the production system and were called upon to express their opinion on the overall slowness. While everyone voiced together that the slowness is consistent the entire day, they also mentioned that they are confident that the issue is with the Oracle Application Server and the way it manages the Connection Pool. The rationale behind this assumption was the Connection Pool Statistics, which says that as the load increases, the number of Connection Pools configured increase drastically and are not freed until the load comes down. The peak starts at 11 am and, since the production system is slow, the load comes down by 9 pm. The business hours of the customer is 9 am to 5 pm.

Initial investigation revealed that with Automatic Shared Memory Management enabled, the Shared Pool had grown upto 40GB. This is pretty high. I have not seen any configuration that requires such a huge Shared Pool. This was surely a cause of concern and therefore, this was further drilled down to one of a serious bug in the application. This was Hard Parse.

An Application or Database flooded with Unshared Application queries is considered as an Unscalable Application. Slowly or Gradually, as the User Load increase, the performance issues are bound to come. In this case, the Hard Parse / Second ratio was around 180 per second. While there were many queries that were using Literals, the most surprising part was the Hard Parse failures, which were almost 68% of the total Hard Parses. Therefore, once a patch to fix the queries that were using Literals was applied, there was no significant improvement.

The most challenging part of this assignment was the reluctance from the Application Team to accept the Hard Parses or the Parse failure to be an Issue. The debate from the Application Team was that reducing the Hard Parses would not yield any benefit as they do not see any Library Cache related latches on top. They have been taught to see the AWR reports and look to the top events. Reducing the number of waits of the top most component seemed to be on top of the list for Application team. While this is true, to some extent, but in case of Hard Parses, this would not work.

This is the reason, the title of my blog is “Do we really need to take care of Hard Parses only when Library Cache related Latches are on top ?”. In this case, the Hard Parses can be easily related to the contention that we observed on the Connection Pool. Usually, the response time of a Connection Pool or a Shared Server Architecture should be less than a Second. If this goes high, we might see a slowness or contention on Connection Pool, as other users may have to wait for the pool to be released. This again explains, why the number of connections pools increased as the load increases.

Hard Parses are a performance killer, also called as a Silent Killer. Many a times, Hard Parses does not cause significant waits on Library Cache latches. It depends on the resources available at the Database Server. In this case, it was a 2 Node RAC with 56 CPU’s on each machine. The Hardware configuration was enough and therefore, the contention was not that significant to cause Latch Free waits. As we all know, Hard Parse takes more time than a Soft Parse or a No Parse and this resulted in a signifcant waits or contention on Connection Pool. Also, as mentioned above, Hard Parses are Silent Killer. With that said, if these are not taken care of at a initial stage, this can bring down the Production System (as it did in the case of my Customer).

To summarize, Hard Parses are to be addressed even if these are not visible as a Latch Contention. In case of my customer, as soon as the Hard Parses and Parse failures were fixed, the performance of the production system improved drastically and was consistent through out the day. The users were able to enter their transactions in the business hours. Further, the contention on the connection pool came down as well.

Row Cache Object Latch Contention : Improper Implementation of Oracle Feature – Day 2 of AIOUG

On Day 2 of AIOUG, I discussed on Library Cache related latches and some causes of the contention on these latches. One such contention is for Row Cache Object latches that protects the Data Dictionary loaded in the Shared Pool. Dictionary Cache is one of the component of a Shared Pool and all the definitions, required to parse a Cursor, is stored in this cache, in the form of a ROW, thus is protected by Row Cache Object Latches.

The Issue

Almost a year back, I was involved in a major performance issue caused due to “Latch: row cache objects” contention. This was a 2 Node RAC setup running 10.2.0.3 database. The issue was normally reported during the peak hour with CPU Utilization above 95%. During this slowness window, there was a major drip in Transaction per Second and this resulted in an increase in pending volume to be entered into the system.

As mentioned earlier, as “Row Cache Object” latches protect Dictionary Cache information, contention on these latches needs investigating the sub-component of the Dictionary Cache and this can be done thru AWR reports. “Dictionary Cache Stats” section in AWR gives the Get Requests, PCT Miss and other required information for each of the sub-component of Dictionary Cache. This section revealed that the GET REQUESTS on DC_USERS was very high. Another alarming statistics was huge version_counts for the application queries. Since, the number of get requests on DC_Users seemed to be abnormally high, the focus was to know for the potential causes of these high numbers and the one related to this application or database. Further investigation drilled down the issue to Virtual Private Database (VPD). The customer here had implemented VPD and this implementation caused the contention on these latches.

Does it mean VPD is an Issue ?

VPD is a very useful feature as it provides an easy and manageable means of implementing row level security, but there are certain implementation portions, that needs to be considered based on the requirement, and it is this piece, if improperly implemented, can cause a major performance issue. In a nutshell, it is not VPD but the way it was being implemented that caused a major issue at the customer site. We will cover this piece of problematic implementation in this blog, with a test case, that will help you to test this on your test system and validate the results.

A brief explanation of VPD

VPD, also called as Row Level Security or Fine Grained Access Control, is a feature, introduced in Oracle 8i, that allows you to define security policies on tables by way of a function that implies a restriction on the rows that a user can SELECT or UPDATE or INSERT or DELETE, based on the privileges defined for that user. For example, assuming a following hierarchy :
Employee S is from South India Region, N from North India Region, W from West India Region, E from East India Region, I from All India Region and finally, APAC for Asia Pacific. With this herierchy, a requirement like :

  1. User S can View and Manipulate all the rows from South India Region, and,
  2. User N can View and Manipulate all the rows from North India Region, and,
  3. User W can View and Manipulate all the rows from West India Region, and,
  4. User E can View and Manipulate all the rows from East India Region, and,
  5. User I can View and Manipulate all the rows from 1 to 4 i.e.South, North, West and East India Region, while,
  6. User A can View and Manipulate all the rows from 1 to 5, including other countries in Asia Pacific Region.

These restrictions can be implemented by using VPD without much of an effort. Application does not require any changes for these implementations, as these restrictions are applied in the background at the runtime. There are three basic components of VPD. These are :
Policy : A declarative command that determines when and how to apply the policy i.e. during Selects, Insertions, Deletion, Updation or combination of any or all of these.
Policy Function : A Pl/SQL function that is called whenever the condition specified in the policy are met.
Predicate : A string that is generated by the policy function, and the applied to the users’ sql statements. This is an additional Filter predicate generated by the policy function and appended to the sql statement executed by the user, at the runtime.

Once these three are defined as a policy, then this policy can be attached to the tables that requires Security to be implemented. While Policy, Policy Function and Predicate are three main building blocks of VPD, there is another component, that is mostly ignored, called as Policy Type. In most of the cases, this is not defined, which means, it is left as default. If not defined, it defaults to a value depending on the database version and therefore, should not be left undefined. Starting Oracle 10g, the policy type defaults to DYNAMIC. While there are other policy types : SHARED Static, CONTEXT-SENSITIVE and SHARED Context Sensitive. The implementation of each of these depend upon the requirement.

Back to the Issue

In this case, it was the implementation of VPD that caused huge Version Count and both of these had a major impact on the performance due to heavy latch contention. Again, it was not the VPD that caused multiple version counts, but the implementation. The Application team, in this case, while implementing VPD, left POLICY_TYPE to default and this means DYNAMIC. DYNAMIC policy_type implies, dynamic builtup of the predicates at runtime, based on the policy function. The issue with Dynamic policy_type is that with each row processed by the query, the policy function is applied, predicate is evaluated and validated whether the row is to be fetched to the user. This means, the query goes through parse-execute-fetch phase for each and every row processed by the query and this significantly increases a Parse Overhead.

Based on the customer requirement, a better option would have been either Context-Sensitive or Shared Context-Sensitive, that internally implements Dynamic predicate generation, at runtime, but goes through the Parse phase only when the Context of the User Session changes and not for each and every row processed. Further, in this customer case, just changing the Policy_type from Dynamic to Shared Context-Sensitive did not help, as there was a major design flaw in the policy function. Lets walk thru a test case and at the end, I will present some facts from the customer system. The test case simulates the implementation at the customer site and the change suggested, thereafter to improve the performance. The change, as depicted in the test case, is in the policy function and the change in policy_type is already changed to Shared Context-Sensitive.

TEST CASE

For this test case to work, I have created some set of statistics table that will store the latch activity. Schema VIVEK contains an application table TEST_VPD. I further create USER_X schema and already have SCOTT and OE, as an application user schemas, ITDBA schema, that contains Security Table. This table defines the security policies for each application users. All of the VPD implementations pre-requisites are created in ITDBA schema.

Once the pre-requisites are created, we will execute certain queries, against TEST_VPD, from VIVEK : access to all the rows, OE Schema : access to rows pertaining to rcc_num=10, SCOTT Schema : access to rows pertaining to rcc_num=20 and USER_X Schema : access to rows pertaining to rcc_num=30. All these access restrictions are defined in EMP_SEC table under ITDBA Schema and the predicates are applied, at run time, by way of Policy Function GF_GLOBAL_POLICY_OPT_RSTRCT_VW, again under ITDBA Schema. Once these are done, we will run certain queries against our Statistics table and some data dictionary views to check for the performance. The policy function, GF_GLOBAL_POLICY_OPT_RSTRCT_VW, will then be changed to improve the performance and the queries against TEST_VPD and measure the performance.

connect vivek/vivek

drop table latch_stats_start;
drop table latch_stats_end;
drop table test_vpd;
drop user user_x cascade;
drop user itdba cascade;
drop public synonym as_test_vpd;
drop context rcc_ctx;

exec dbms_random.seed(0);
create global temporary table latch_stats_start as
select name, gets from v$latch;

create global temporary table latch_stats_end as
select name, gets from v$latch;

create table test_vpd as
select	dbms_random.string('U',5)||lpad(rownum,5,'0') pan,
	dbms_random.string('U',20) name,
	case when mod(rownum,10)=0 then 'F' else 'M' end sex,
	round(dbms_random.value(10,100),0) rcc_num
from	all_objects
where	rownum between 1 and 10000;
create unique index pk_pan on test_vpd(pan);
exec dbms_stats.gather_table_stats(user,'test_vpd',cascade=>true,method_opt=>'for all columns size 1');
create public synonym as_test_vpd for test_vpd;
grant select on as_test_vpd to public;

create user user_x identified by vivek;
grant connect,resource to user_x;

connect sys/vivek as sysdba
create user itdba identified by vivek;
grant dba to itdba;
grant execute on dbms_rls to itdba;

connect itdba/vivek

create context rcc_ctx using ITDBA.set_rcc_ctx;

create procedure set_rcc_ctx(p_attr in varchar2, p_val1 in varchar2, p_val2 in varchar2, p_val3 in varchar2)
is
begin
   dbms_session.set_context('RCC_CTX',p_attr,p_val1, p_val2, p_val3);
end;
/

create table itdba.emp_sec (user_id varchar2(20), rcc_num number);

insert into emp_sec values('SCOTT',20);
insert into emp_sec values('OE',10);
insert into emp_sec values('USER_X',30);

CREATE OR REPLACE function ITDBA.GF_GLOBAL_POLICY_OPT_RSTRCT_VW(OBJECT_SCHEMA VARCHAR2, OBJECT_NAME VARCHAR2) return varchar2 is
    L_SEL_FLG VARCHAR2(3);
    L_RCC_NUM NUMBER;
    L_USER_ID  VARCHAR2(20);
    l_predicate varchar2(30);
    v_Restricted_Policy varchar2(1);
begin
    IF  L_USER_ID IS NULL AND USER IN ('ITDBA','SYS','SYSTEM','VIVEK') THEN
       RETURN NULL;
    ELSE
       IF L_USER_ID IS NULL THEN
 	  L_USER_ID:=USER;
       END IF;
     if l_user_id in ('USER_X','SCOTT','OE') then v_Restricted_Policy:='Y'; end if;
     IF v_Restricted_Policy = 'Y' THEN
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num, null, null);
	l_predicate:='RCC_NUM in '||sys_context('RCC_CTX','RCC_NUM');
	 return l_predicate;
     ELSE
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num,l_rcc_num+100,99);
	l_predicate:='RCC_NUM in '||sys_context('RCC_CTX','RCC_NUM');
	 return l_predicate;
     END IF;
    END IF;
Exception
  When others then
       Raise_Application_Error (-20000,SQLERRM);
END;
/
begin
dbms_rls.add_policy(object_schema=>'VIVEK',
   object_name=>'test_vpd',
   policy_name=>'RCC_NUM_POLICY',
   function_schema=>'ITDBA',
   policy_function=>'GF_GLOBAL_POLICY_OPT_RSTRCT_VW',
   statement_types=>'INSERT,SELECT,UPDATE,DELETE',
   update_check=>true,
   policy_type=>dbms_rls.SHARED_CONTEXT_SENSITIVE);
end;
/

Now, since the pre-requisite is done, we will execute a query against TEST_VPD from VIVEK, OE, SCOTT and USER_X Schemas to check the rows returned by each of these users. This will help us validate and confirm whether the policy function is applied as per our requirement.

connect vivek/vivek
column pan for a20
column name for a30

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;

select /*+ VIVEK */ * from as_test_vpd  where rcc_num=20 and rownum<=3;

select /*+ VIVEK */ * from as_test_vpd  where rcc_num=30 and rownum<=3;

## User OE only allowed to see 10
----------------------------------

connect oe/vivek

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd where rcc_num=20 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd where rcc_num=30 and rownum<=3;

## User SCOTT only allowed to see 20
-------------------------------------
connect scott/tiger

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd  where rcc_num=20 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd where rcc_num=30 and rownum<=3;

## User USER_X only allowed to see 30
---------------------------------------

connect user_x/vivek

select /*+ VIVEK */ * from as_test_vpd where rcc_num=10 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd  where rcc_num=20 and rownum<=3;
select /*+ VIVEK */ * from as_test_vpd  where rcc_num=30 and rownum<=3;

Now for the performance. This is the Original Implementation.

connect vivek/vivek
alter system flush shared_pool;
connect scott/tiger
insert into vivek.latch_stats_start select name, gets from v$latch;
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
insert into vivek.latch_stats_end select name, gets from v$latch;
connect oe/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
connect user_x/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3; 
SQL>set lines 132
SQL> set pages 1000
SQL> column sql_text for a40 wrap
SQL>
SQL> select sql_id, sql_text, executions, child_latch, version_count from v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH VERSION_COUNT
------------- ---------------------------------------- ---------- ----------- -------------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          3           2             3
              here rownum<=3 
SQL>select sql_id, sql_text, executions, child_latch, child_address from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH CHILD_AD
------------- ---------------------------------------- ---------- ----------- --------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          1           2 69EAE56C
              here rownum<=3

1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          1           2 69065194
              here rownum<=3

1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          1           2 68FEDAE8
              here rownum<=3 
SQL>column predicate for a30
SQL>select sql_id, ADDRESS, PREDICATE from v$vpd_policy where sql_id='1n62sv9g6nw77';

SQL_ID        ADDRESS  PREDICATE
------------- -------- ------------------------------
1n62sv9g6nw77 69EAE56C RCC_NUM in 20
1n62sv9g6nw77 69065194 RCC_NUM in 10
1n62sv9g6nw77 68FEDAE8 RCC_NUM in 30

When the Query was executed from different application users, three in our case, the version_count of the query increased as well. The dynamic performance view V$VPD_POLICY shows the runtime plan, which is Literal in this case and causes a query to be hard parsed. Below mentioned query can be executed to check for the latch statistics.

column name for a30
set lines 200
set pages 1000
select * from (select a.name, b.gets end_gets, a.gets Start_gets, b.gets-a.gets
from	vivek.latch_stats_start a,
	vivek.latch_stats_end b
where	b.name = a.name
order by 4 desc)
where rownum<=10;

MODIFICATION

The code below shows the modification to the function that is applied to the policy. Look for the l_predicate value at the end that is finally returned as an Output to the function.

connect vivek/vivek
CREATE OR REPLACE function ITDBA.GF_GLOBAL_POLICY_OPT_RSTRCT_VW(OBJECT_SCHEMA VARCHAR2, OBJECT_NAME VARCHAR2) return varchar2 is
    L_SEL_FLG VARCHAR2(3);
    L_RCC_NUM NUMBER;
    L_USER_ID  VARCHAR2(20);
    l_predicate varchar2(60);
    v_Restricted_Policy varchar2(1);
begin
    IF  L_USER_ID IS NULL AND USER IN ('ITDBA','SYS','SYSTEM','VIVEK') THEN
       RETURN NULL;
    ELSE
       IF L_USER_ID IS NULL THEN
 	  L_USER_ID:=USER;
       END IF;
     if l_user_id in ('USER_X','SCOTT','OE') then v_Restricted_Policy:='Y'; end if;
     IF v_Restricted_Policy = 'Y' THEN
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num, null, null);
	l_predicate := 'RCC_NUM = sys_context (''RCC_CTX'', ''RCC_NUM'')';
	 return l_predicate;
     ELSE
       select rcc_num into l_rcc_num from itdba.emp_sec
       where user_id = l_user_id;
       set_rcc_ctx('RCC_NUM',l_rcc_num,l_rcc_num+100,99);
       	l_predicate := 'RCC_NUM = sys_context (''RCC_CTX'', ''RCC_NUM'')';
	 return l_predicate;
     END IF;
    END IF;
Exception
  When others then
       Raise_Application_Error (-20000,SQLERRM);
END;
/

Once the change is done, we will execute the query on TEST_VPD from the three application users (USER_X, OE and SCOTT), and compare the latch statistics and version_count.

connect vivek/vivek
alter system flush shared_pool;
connect scott/tiger
insert into vivek.latch_stats_start select name, gets from v$latch;
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
insert into vivek.latch_stats_end select name, gets from v$latch;
connect oe/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
connect user_x/vivek
select /*+ VIVEK */ * from as_test_vpd where rownum<=3;
connect vivek/vivek
Connected.
SQL> set lines 132
SQL> set pages 1000
SQL> column sql_text for a40 wrap
SQL>
SQL> select sql_id, sql_text, executions, child_latch, version_count from v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH VERSION_COUNT
------------- ---------------------------------------- ---------- ----------- -------------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          3           2             1
              here rownum<=3 
SQL>select sql_id, sql_text, executions, child_latch, child_address from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH CHILD_AD
------------- ---------------------------------------- ---------- ----------- --------
1n62sv9g6nw77 select /*+ VIVEK */ * from as_test_vpd w          3           2 68BEE27C
              here rownum <=3 
SQL>column predicate for a30
SQL>select sql_id, ADDRESS, PREDICATE from v$vpd_policy where sql_id='1n62sv9g6nw77';

SQL_ID        ADDRESS  PREDICATE
------------- -------- ------------------------------
1n62sv9g6nw77 68BEE27C RCC_NUM = sys_context ('RCC_CT
                       X', 'RCC_NUM')

Improved performance can be clearly seen from the version_count, output of V$VPD_POLICY view and from the latch statistics. Once this was implemented, the customer reported a significant performance improvement, for which the real facts are shown below.



Optimizing Performance Part 2 Matrix


Optimizing Performance Part 2 Top 5 waits

CUSTOMER MATRIX – BEFORE

 

TOP 5 WAITS

 



Optimizing Performance Part 2 Achievement

CUSTOMER MATRIX – AFTER

 

To summarize, Oracle provided features enable us to implement these with great ease. Implementing these require careful study of the requirement and proper implementation, that does not compromise with the performance.

Library Cache Latch Contention due to Multiple Version Count – Day 2 of AIOUG

Day 2 of “All India Oracle User Group” was continuation of where I left on Day 1 – Library Cache and Multiple Version Count. On Day 2, I had a real life example on the issue that I presented. On Day 1, the conceptual knowledge of Library Cache, Parent & Child Cursors and some of the fundamental, but not to be ignored, causes of Multiple Child Cursors was shared.

One of the major cause of Multiple Version for an SQL is CURSOR_SHARING. When cursor_sharing is set to SIMILAR, optimizer checks for either of these :

Is the Operator NEITHER of ‘=’ or ‘!=’

OR

Are there Column Histograms present on the Columns

When either of these are true, then, irrespective of whether the execution plan of the new child cursor is already stored, a new Child cursor is created. The only exception is, if the runtime value of the bind variable matches that of an already stored child cursor. We will see this with an example.

Before we begin, as a prerequisite, we will create our test case that will be used for both the cases mentioned above. The data in this table is populated in a skewed manner and we create an Index on the skewed column.

PREREQUISITE

exec dbms_random.seed(0);

drop table test_hist;

create table test_hist as 
with n as
(select /*+ materialize */ round(dbms_random.normal,0) n1 from all_objects where rownum<=10000)
select decode(sign(n1),-1,n1+9,n1) empid,
       round(dbms_random.value(1000,2000),0) sal,
       case when mod(round(dbms_random.value(101,110),0),2)=0 then 'M' else 'F' end gender
from	n
where rownum<=10000;

create index test_hist_idx on test_hist(empid);

select empid, cnt, round(cnt/final_total*100,2) percentage from (
select empid, cnt, sum(cnt) over() final_total
from (
select empid, count(*) over(partition by empid) cnt, 
row_number() over (partition by empid order by empid) rno 
from test_hist)
where rno=1);

     EMPID        CNT PERCENTAGE
---------- ---------- ----------
         0       3937      39.37
         1       2289      22.89
         2        630        6.3
         3         55        .55
         4          4        .04
         5          4        .04
         6         64        .64
         7        646       6.46
         8       2371      23.71

9 rows selected.

Once the test case is ready, we will walk through the two issues, mentioned above, introduced by Cursor_Sharing set to SIMILAR. For this purpose, we will generate statistics in two different ways: With & Without Histograms.

TEST CASE 1 : With Histogram and Equality Predicates

Gather Stats on this table with method_opt as ‘FOR COLUMNS EMPID SIZE 74’. With this, we generate Frequency Histogram on Empid column. Since the column has only 9 distinct values, creating histogram will generate different execution plans for the values with less data.

Combination of Cursor_Sharing to SIMILAR and unwanted Histograms can cause a major performance bottleneck, with contention on Library Cache related waits. Therefore, it is always recommended to generate Histograms only when required. Oracle 10g changed the default value of method_opt from ‘FOR ALL COLUMNS SIZE 1’ to ‘FOR ALL COLUMNS SIZE AUTO’ and with this change, any database with CURSOR_SHARING set to SIMILAR, would face a severe latch contention issue after upgrading to 10g, as unwanted histograms are generated on columns with large distinct values. As you will see in this test case, these unwanted histograms introduce more child cursors, further causing a hash chain to grow longer. We all know that latches are to be held for brief duration, but, if number of child cursors grow, inspecting a hash chain for a matching cursor can take significant amount of time preventing others to get the latch.

alter system set cursor_sharing=similar scope=memory;
select /*+ VIVEK */ sal from test_hist where empid=0;
select /*+ VIVEK */ sal from test_hist where empid=2;
select /*+ VIVEK */ sal from test_hist where empid=4;
select /*+ VIVEK */ sal from test_hist where empid=7;
select /*+ VIVEK */ sal from test_hist where empid=8;

Set CURSOR_SHARING to SIMILAR and execute 5 queries with different values passed to empid. Once these are run, check against v$sqlarea, v$sql and finally, v$sql_shared_cursor to get the cause of multiple versions. The output would be similar to that of mine, except for the plan_hash_value. In my case, for all the cases above, the optimizer opted for Full Table Scan. In your case, the plans would differ. Even though the execution plans may or may not match, it is irrelevant for this discussion. After execution of the 5 queries, the output would look like:

SQL> select sql_id, sql_text, executions, child_latch, version_count, plan_hash_value, hash_value fr
om v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH VERSION_COUNT PLAN_HASH_VALUE HASH_VALUE
------------- ------------------------------ ---------- ----------- ------------- --------------- --
baz378z2m5q42 select /*+ VIVEK */ sal from t          5           2             5      3529380458 3308443778
              est_hist where empid=:"SYS_B_0
              "


Elapsed: 00:00:00.10
SQL>select sql_id, sql_text, executions, child_latch, plan_hash_value, child_number, child_address from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH PLAN_HASH_VALUE CHILD_NUMBER CHILD_AD
------------- ------------------------------ ---------- ----------- --------------- ------------ ---
baz378z2m5q42 select /*+ VIVEK */ sal from t          1           2      3529380458            0 699F048C
              est_hist where empid=:"SYS_B_0
              "

baz378z2m5q42 select /*+ VIVEK */ sal from t          1           2      3529380458            1 68A65B38
              est_hist where empid=:"SYS_B_0
              "

baz378z2m5q42 select /*+ VIVEK */ sal from t          1           2      3529380458            2 68A47B00
              est_hist where empid=:"SYS_B_0
              "

baz378z2m5q42 select /*+ VIVEK */ sal from t          1           2      3529380458            3 69519264
              est_hist where empid=:"SYS_B_0
              "

baz378z2m5q42 select /*+ VIVEK */ sal from t          1           2      3529380458            4 6A3B1AE0
              est_hist where empid=:"SYS_B_0
              "

Elapsed: 00:00:00.25

SQL> select * from v$sql_shared_cursor where sql_id='baz378z2m5q42';

SQL_ID        ADDRESS  CHILD_AD CHILD_NUMBER U S O O S L S E B P I S T A B D L T R I I R L I O S M U 
------------- -------- -------- ------------ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
M B M R O P M F L
- - - - - - - - -
baz378z2m5q42 684B2D44 699F048C            0 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N 
N N N N N N N N N

baz378z2m5q42 684B2D44 68A65B38            1 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N 
N N N N N N N N N

baz378z2m5q42 684B2D44 68A47B00            2 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N 
N N N N N N N N N

baz378z2m5q42 684B2D44 69519264            3 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N 
N N N N N N N N N

baz378z2m5q42 684B2D44 6A3B1AE0            4 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N 
N N N N N N N N N

Surprisingly, v$sql_shared_cursor does not provide a justification for the cause of multiple child cursors. If you again run a query with EMPID set to any of the values passed earlier (eg empid=0), the version_count will not be incremented. This means, a child cursor, in this case, will only be shared if a value passed to the FILTER Predicate matches any of the bind value previously executed.

TEST CASE 2 : Without Histogram and Range Predicates

For this test case, regather the statistics on the TEST_HIST table with method_opt changed to ‘FOR ALL COLUMNS SIZE 1’, which means, no histograms. Without Histograms, multiple child cursors will be created for any queries that does not contain EQUALITY (=) or NOT EQUAL (!=) predicates. For demonstration, make sure the histograms are removed and then execute following queries:

select /*+ VIVEK */ empid, sal from test_hist where sal between 1000 and 1100;

select /*+ VIVEK */ empid, sal from test_hist where sal between 1200 and 1500;

select /*+ VIVEK */ empid, sal from test_hist where sal between 1700 and 1800;

select /*+ VIVEK */ empid, sal from test_hist where sal between 1500 and 2000;

The output should look like:

SQL> select sql_id, sql_text, executions, child_latch, version_count, plan_hash_value, hash_value fr
om v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH VERSION_COUNT PLAN_HASH_VALUE HASH_VALUE
------------- ------------------------------ ---------- ----------- ------------- --------------- --
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal          4           2             4      1054764382 1029733577
               from test_hist where sal betw
              een :"SYS_B_0" and :"SYS_B_1"


Elapsed: 00:00:00.04
SQL> select sql_id, sql_text, executions, child_latch, plan_hash_value, child_number, child_address 
from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH PLAN_HASH_VALUE CHILD_NUMBER CHILD_AD
------------- ------------------------------ ---------- ----------- --------------- ------------ ---
2xk8mywyq0z69 select /*+ VIVEK */ empid, sal          1           2      1054764382            0 69BCD9A0
               from test_hist where sal betw
              een :"SYS_B_0" and :"SYS_B_1"

2xk8mywyq0z69 select /*+ VIVEK */ empid, sal          1           2      1054764382            1 69FF769C
               from test_hist where sal betw
              een :"SYS_B_0" and :"SYS_B_1"

2xk8mywyq0z69 select /*+ VIVEK */ empid, sal          1           2      1054764382            2 693BED94
               from test_hist where sal betw
              een :"SYS_B_0" and :"SYS_B_1"

2xk8mywyq0z69 select /*+ VIVEK */ empid, sal          1           2      1054764382            3 68F50694
               from test_hist where sal betw
              een :"SYS_B_0" and :"SYS_B_1"

Again, in this case, if you re-execute a query with sal between 1200 and 1500, the version_count will not be incremented as a shared version of a query with same bind values, is already stored in the library cache. In both the cases, v$sql_shared_cursor does not provide the justification for the cause of multiple child cursors and therefore, it is very difficult to diagnose this.

These two test cases should help you understand the side effect of setting Cursor_Sharing parameter to SIMILAR. With this setting, even if you take utmost care about not generating unwanted histograms, any range predicate queries can cause Library Cache contention issue. I am sure that while Histograms are avoidable, Range Based Queries are unaviodable. 🙂

A Child Library Cache Latch protects multiple objects (cursors). This means, two or more entirely different queries can be hashed to a same child latch. Time Consuming match for a query with several child cursors can also cause other sessions to wait on Library Cache Latches for other queries, protected by this same latch, with 1 child cursor. Therefore, Library Cache Latch contention are also called as False Contention and utmost care has to be taken while diagnosing contention on these latches.

Real Life Example

During the AIOUG event and after the completion of these test cases, I presented on an Real Life Example. One of my customer complained for severe performance issues and one of the cause of this slow performance was identified as Contention on Library Cache Latches. These are some of the statistics, before and after eliminating Library Cache Latch Contention.

BEFORE

Optimizing Performance Part 21

ISSUE

The combination of Database Version 10.2.0.4 and Cursor_Sharing=SIMILAR was enough to predict the issue which was due to multiple version count. The table below shows the list of top 20 Queries, with Executions and their Version Counts.

select * from (
select sql_id, version_count, executions, child_latch from v$sqlarea
order by 2 desc)
where rownum<=20;

SQL_ID        VERSION_COUNT EXECUTIONS CHILD_LATCH
------------- ------------- ---------- -----------
5w93sms4xr70b         25775      59303          42
4upsrgs9r0z8h         19982      41372          10
cpk0xaggwc9q7         15316      41929          56
fx0662kfns5mj         12060      23058          17
c8brg2z6ytxjk          8991      19637          21
6t87d75m0hby0          8863      46765          37
cnsttwq7y79tc          8549      42953           7
76hw2b3ujar2z          8290        988          61
dh8yhs5pjh0dt          7670      10890          28
8xjwbmfsn0s2y          7057        996          53
dfp62t3qy92fm          6176     192357          26
2d1p0dmyaapv3          4718      16237          64
0a8bvn7zn8u26          4026       9192          39
a2qn47hh1ypqk          3982       7017           8
c1dq1hg17y3xg          3520       2255          62
08q94ja2kmqs3          3304       6143           2
4gm2wn99nk1mk          3190       6566          15
6s2cp8hk4bych          2907      48162          20
8z1b906qc3tnx          2752       6060          47
bg5gftj0r3s4m          2630      10283          52


20 rows selected.

The text of 1st Query from the above list is as under :

SQL_ID : 5w93sms4xr70b
Module : GI6_03
Sharable_mem : 337,369,920  

SELECT :"SYS_B_0" FROM GIT_LOP_COVER 
WHERE 	ROWNUM < :"SYS_B_1" 
AND 	LOP_SYS_ID = :"SYS_B_2" 
AND 	END_N_IDX = :"SYS_B_3"
AND 	END_S_NO  = :"SYS_B_4" 
AND 	ATTACH_L_NO = :"SYS_B_5" 
 
OWNER             PAR   NUM_ROWS     BLOCKS LAST_ANALYZED        GLO T
----------------- --- ---------- ---------- -------------------- --- -
LOP_PROD          NO    86222715    4334060 02-Sep-2008 10:05:26 YES N
 
 
COLUMN_NAME       NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
----------------- ------------ ---------- ---------- ---------------
LOP_SYS_ID            12529993          0 1.8931E-07 HEIGHT BALANCED 
END_N_IDX                   23          0 5.8944E-09 FREQUENCY
END_S_NO                     1          0 5.8944E-09 FREQUENCY
ATTACH_L_NO                  3   11880179 6.8543E-09 FREQUENCY

The Query uses Equality Predicates and we have already seen the issue due to Histograms and Cursor_Sharing SIMILAR. To validate this further, check for the column statistics, as shown above, for the queried table and it uses Histograms on the columns used in this query.

SOLUTION

To resolve this latch contention issues, the default method_opt was changed to ‘FOR ALL COLUMNS SIZE 1’ and the statistics were recollected. This gave some relief, but the version count for some of the queries were still high due to Range Predicates. Therefore, CURSOR_SHARING was set to FORCE. After this change, the version_count dropped significantly and the benefit can be seen from the statistics pasted below.

Optimizing Performance Part 24

With Multiple Version Counts, Hard Parses were more than 100 per second. After changing from SIMILAR to FORCE, it came down to 60. Further implementation of Bind Variables, in place of Literals, brought this down to less than 10.

Production Statistics, post optimization


Optimizing Performance Part 25

In the next blog, I would write on another test case alongwith a real life example on the issue due to Improper Implementation of an Oracle feature that caused a severe Row Cache Latch Contention. This discussion was also a part of my Day 2 presentation at AIOUG.

Library Cache : Causes of Multiple Version Count for an SQL (Day 1 of AIOUG Event)

I am just back from the 3 day technical event at Bangalore, India. This event gave me an oppurtunity to meet and speak to a legend, and one of my Guru, Thomas Kyte. His presentations were remarkable. This event also gave me an oppurtunity to meet many of my readers. My purpose of initiating blogs (I started in Dec 2006) was to educate Oracle Community and It is very satisfying to know that, whatever you write, reaches the Community. Many readers were disappointed since my previous blog url is still inaccessible, and I had to assure them, that all those posts, would be re-posted on this blog. This means, in India, I too have a good fan following. 🙂

I had three sessions span across each of the three days, and Day 1 of the event, I discussed mostly on Library Cache and some of the issues that have an impact on performance, causing Library Cache related Latch Contention. As usual, to prove the point, I demonstrated each of these with some scripts executed on my 10.2.0.1 database. Day 2, also began with a brief discussion on this issue and I demonstrated some more facts, including an issue due to “Improper Implementation of Virtual Private Database”. This post is a detailed discussion on what I presented and demonstrated at the Event on Day 1, as it gives a brief understanding of Parent & Child Cursors. Day 2, I started with the same topic and my next few blogs will cover the test cases and explanations that I present on the second day.

In short, this blog should give you a conceptual knowledge on Library Cache, Parent Cursors, Child Cursors and will also explain about Bind Graduation. Once the concepts of these are cleared, the next blog will feature some more issues that cause creation of multiple child cursors and relate to a Real Life Experience.

We all know that the purpose of Library Cache is to maximize sharing of resources required to execute SQL statements from multiple sessions. In Oracle, these SQL statements are called as CURSORS and therefore, cursors should not be misinterpreted as implicit and explicit cursors defined in pl/sql blocks.This Shareability reduces the memory requirements and the parse time because parsed representations of an SQL may already be stored in the Library Cache. Two SQL statements will not be shared unless their SQL text is completely identical. Any difference in spacing or casing will make two SQL statements be considered different and none of their data structures can be shared. Assume following queries :

  1. select * from emp where ename=:b1;
  2. SELECT * FROM EMP WHERE ENAME=:b1;
  3. select * from emp e where e.ename=:b1;
  4. select * from emp where ename=:b1;

Although the four queries are using Bind Variables, only two of them are sharable, while the other two (2 & 3) will not be shared. The shared representation of a SQL cursor is stored in two parts: the Parent Cursor and the Child Cursor.

PARENT CURSOR

It stores the name i.e. the SQL text of the Cursor. When two statements are identical textually, they will share a same Parent Cursor. As an example, query no.1 and 4 above, will share a same Parent Cursor.

CHILD CURSOR

The Child Cursor holds other required information, like: the identity of the objects referenced by the SQL Cursor; the names, type and length of the bind variables used; the session value of the session parameters that have an impact on the optimizer decisions and the execution plan of the Cursor.

What this means is, two textually similar statements, sharing same parent cursor, will also share a same child cursor if they share each of the above mentioned information. This also infers that a parent cursor, will have atleast (minimum) one or more than one child cursor. The ratio is 1:1 (one parent : one child) or 1:M (one parent : multiple child).

The demonstration in this blog are all the same that I presented during the event and are run on 10.2.0.1 database. The output of some of the queries may differ in 10.2.0.2 and above, but the issue remains same.

Number of Library Cache CHILD Latches
SQL> show parameter cpu


SQL> show parameter cpu

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
cpu_count                            integer                          24

select count(*) from v$latch_children where name='library cache';

  COUNT(*)
----------
        29

I executed this query on a production system with 24 CPU, and the number of Child Latches, automatically pre-decided and allocated at run time by Oracle kernel, was set to next smallest prime number to the number of CPU’s and can go upto 67 max. Therefore, even if I have a 80 CPU machine, the number of Child Library Cache latches would be limited to 67. Library Cache is a parent Latch with multiple child latches. In this case, we have One Parent Library Cache Latch and 29 Child Latches.

DEMONSTRATION OF 1:1 ratio

I will execute 4 queries out of them, 3 will be on EMP table and 1 on DEPT. While a query on DEPT will fetch different resultset, the other three will fetch the same resultset, but, will share different parent cursors. Therefore, even though, the execution plans of these three are same (FTS on EMP), these will be mapped to different parent cursor as these do not match textually. Also note that these 4 hashed (internal algorithm) to different child latches.

  1. select /*+ VIVEK */ * from emp;
  2. select /*+ VIVEK */ * from dept;
  3. SELECT /*+ VIVEK */ * FROM EMP;
  4. select /*+ VIVEK */ * from emp e;

Dynamic view V$SQLAREA maintains single row per parent cursor, whereas, V$SQL maintains each row for a Child Cursor. One the four queries, mentioned above, were executed, the output form V$SQLAREA and V$SQL was as under :

SQL> column sql_text for a40 wrap
SQL> set lines 132
SQL> select sql_id, sql_text, executions, child_latch from v$sqlarea where upper(sql_text) like 'SELECT /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH
------------- ---------------------------------------- ---------- -----------
bsmb6z005wgyr select /*+ VIVEK */ * from emp e                  1           1
c9qspg124hjk4 SELECT /*+ VIVEK */ * FROM EMP                    1           3
00fdja7qcpswr select /*+ VIVEK */ * from dept                   1           3
2p65wk92f3vpq select /*+ VIVEK */ * from emp                    1           2

SQL> select sql_id, sql_text, executions, child_latch from v$sql where upper(sql_text) like 'SELECT /*+ VIVEK%';

SQL_ID        SQL_TEXT                                 EXECUTIONS CHILD_LATCH
------------- ---------------------------------------- ---------- -----------
bsmb6z005wgyr select /*+ VIVEK */ * from emp e                  1           1
c9qspg124hjk4 SELECT /*+ VIVEK */ * FROM EMP                    1           3
00fdja7qcpswr select /*+ VIVEK */ * from dept                   1           3
2p65wk92f3vpq select /*+ VIVEK */ * from emp                    1           2

This demonstration also shows that after we executed these queries, a child cursor automatically, these can be mapped from SQL_ID. For each SQL_ID in V$SQLAREA, we have a corresponding row in V$SQL (1:1).

DEMONSTRATION OF 1:M ratio

While there are multiple reasons for the scenario where 1 Parent Cursor can have Multiple Child Cursors, I demostrated few of them and these were :

  1. Optimizer & Bind Mismatch – DAY 1
  2. Bind Graduation (Different Bind Length) – DAY 1
  3. Cursor_Sharing=SIMILAR (further divided into two scenarios) – DAY 2
  4. Improper Implemention of Oracle features VPD – DAY 2

While this blog covers DAY 1, the third scenario will feature in my next blog, and the fourth on VPD will follows after that. Day 2 Blogs relates to some of my Real Life Challenges and I will relate these to those experiences.

OPTIMIZER & BIND MISMATCH

We all know and understand the importance of bind variables. Bind Variables are crucial in a Transaction processing (OLTP) databases, and one of the primary responsilibility of an Application Developer is to make sure that application queries make use of Bind Variables.

This demonstration uses Bind Variables, therefore, during the presentation, one of the Attendee asked me a question (during Q&A session) that since we have too many child cursors with bind Implementation, does it mean that we need to change these to Literals ? Please note that, Literals are very bad for Transaction Processing System and this test case demonstrates an issue, not due to Bind Variables, but the way these are either implemented or some other settings that make a sharable parent cursor partially sharable.

In this test case, I will execute textually similar query, with different Bind Length and Optimizer Settings. Since the queries are textually same, they will be mapped to a single SQL_ID and to a same child latch, but will have multiple child cursors (version_count).

FLUSHING OF SHARED_POOL IS NOT RECOMMENDED ON PRODUCTION. THIS IS A TEST INSTANCE, I AM FLUSHING TO DEMONSTRATE THE EXECUTIONS AND VERSION_COUNT THAT MATCH THE TEST CASE.

SQL> alter system flush shared_pool;

System altered.

SQL> connect x/x
Connected.
SQL> variable b1 varchar2(10);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> REM Same Query with Different Optimizer Settings
SQL> alter session set optimizer_index_cost_adj=10;

Session altered.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> connect x/x
Connected.
SQL> REM Optimizer Setting Default
SQL> variable b1 varchar2(40);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> connect y/y
Connected.
SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
EMP                            TABLE

SQL> variable b1 varchar2(10);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> connect vivek/vivek
Connected.
SQL> column sql_text for a30 wrap
SQL> set lines 132
SQL> select sql_id, sql_text, executions, child_latch, version_count from v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH VERSION_COUNT
------------- ------------------------------ ---------- ----------- -------------
f2q4bgycc2b1y select /*+ VIVEK */ * from emp          4           1             4
               where ename=:b1

SQL> select sql_id, sql_text, executions, plan_hash_value, child_latch from v$sql where sql_text lik
e 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS PLAN_HASH_VALUE CHILD_LATCH
------------- ------------------------------ ---------- --------------- -----------
f2q4bgycc2b1y select /*+ VIVEK */ * from emp          1      3956160932           1
               where ename=:b1

f2q4bgycc2b1y select /*+ VIVEK */ * from emp          1      3956160932           1
               where ename=:b1

f2q4bgycc2b1y select /*+ VIVEK */ * from emp          1      3956160932           1
               where ename=:b1

f2q4bgycc2b1y select /*+ VIVEK */ * from emp          1      3956160932           1
               where ename=:b1

SQL> select sql_id, address, child_address, optimizer_mismatch, bind_mismatch, translation_mismatch
  2  from v$sql_shared_cursor where sql_id='f2q4bgycc2b1y';

SQL_ID        ADDRESS  CHILD_AD O B T
------------- -------- -------- - - -
f2q4bgycc2b1y 6A2CF72C 69334384 N N N
f2q4bgycc2b1y 6A2CF72C 6A25D5DC Y N N
f2q4bgycc2b1y 6A2CF72C 684F5684 Y Y N
f2q4bgycc2b1y 6A2CF72C 69144D8C Y N Y

One Parent Cursor, check the SQL_ID and ADDRESS from V$SQLAREA & V$SQL, multiple Child Cursors with Child_Number in V$SQL. The CHILD_ADDRESS from V$SQL can be mapped to CHILD_ADDRESS of V$SQL_SHARED_CURSOR to get the reason for the mismatch between already stored child cursor.

In this test case, we executed 4 queries. While each of these were textually similar, they shared same library cache parent cursor. When the first query was executed, it created a parent cursor and a child cursor. When second query was executed, as it was textually similar, it shared a same parent cursor but, since optimizer_index_cost_adj was set to a non-default value, optimizer assumed that this setting might require another plan and creates a new child cursor. Another query was again textually same, but was executed on a different object, schema Y and the fourth one, was with a modified bind length. In this case, while optimizer assumed that the plan would be different, it created a new child cursor, but if you look at the PLAN_HASH_VALUE form V$SQL, the plan for each of these is same i.e. FTS.

Now, for the another test on BIND GRADUATION. In order to limit the creation of number of child cursors, Oracle limits and maps the size of a Bind Length (for VARCHAR2 only) to a maximum limit. Lets execute the queries with different bind length and check the number of child cursors for each of these.

SQL> alter system flush shared_pool;

System altered.

SQL> variable b1 varchar2(10);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> variable b1 varchar2(30);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> variable b1 varchar2(40);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> variable b1 varchar2(80);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> variable b1 varchar2(140);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> variable b1 varchar2(2040);
SQL> exec :b1:='KING';

PL/SQL procedure successfully completed.

SQL> select /*+ VIVEK */ * from emp where ename=:b1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7839 KING       PRESIDENT            17-NOV-81       5000                    10

SQL> column sql_text for a30 wrap
SQL> set lines 132
SQL> select sql_id, sql_text, executions, child_latch, version_count from v$sqlarea where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH VERSION_COUNT
------------- ------------------------------ ---------- ----------- -------------
f2q4bgycc2b1y select /*+ VIVEK */ * from emp          6           1             4
               where ename=:b1

SQL> select sql_id, sql_text, executions, child_latch from v$sql where sql_text like 'select /*+ VIVEK%';

SQL_ID        SQL_TEXT                       EXECUTIONS CHILD_LATCH
------------- ------------------------------ ---------- -----------
f2q4bgycc2b1y select /*+ VIVEK */ * from emp          2           1
               where ename=:b1

f2q4bgycc2b1y select /*+ VIVEK */ * from emp          2           1
               where ename=:b1

f2q4bgycc2b1y select /*+ VIVEK */ * from emp          1           1
               where ename=:b1

f2q4bgycc2b1y select /*+ VIVEK */ * from emp          1           1
               where ename=:b1


SQL> select * from v$sql_shared_cursor where sql_id='f2q4bgycc2b1y';

SQL_ID        ADDRESS  CHILD_AD B
------------- -------- -------- -
f2q4bgycc2b1y 6A15BC84 694FF7E8 N
f2q4bgycc2b1y 6A15BC84 68768DC8 Y
f2q4bgycc2b1y 6A15BC84 6A258CF8 Y
f2q4bgycc2b1y 6A15BC84 69BA5E00 Y

SQL> select address, position, datatype, max_length, bind_name
  2  from v$sql_bind_metadata
  3  where address in
  4  (select child_address from v$sql where sql_id='f2q4bgycc2b1y');

ADDRESS    POSITION   DATATYPE MAX_LENGTH BIND_NAME
-------- ---------- ---------- ---------- ------------------------------
694FF7E8          1          1         32 B1
68768DC8          1          1        128 B1
6A258CF8          1          1       2000 B1
69BA5E00          1          1       4000 B1

In this case, I executed the queries with different bind length. For six executions, there were 4 child cursors and the bind graduation limited these to the maximum value as exposed by V$SQL_BIND_METADATA.

This was the end of DAY 1 of my presentation and I tried to give a conceptual knowledge on CURSORS. More on this in my next 2 blogs.

%d bloggers like this: