Cache Buffers Chains Latch Contention…..

A few days back, I got a call from one of my customers about a severe latch contention issue that had been forcing them to restart a database. This had been occurring for the last month, and each occurrence required a database restart. This being a critical 24×7 system, a restart was, for obvious reasons, not a solution, and when the in-house DBA team gave up on the problem, I was called upon to investigate it.

The database version was 9.2.0.5, and from the Statspack reports taken during the bad periods it was evident that the contention was on the Cache Buffers Chains (CBC) latch. In most cases, CBC latch contention is caused by inefficient application queries that consume huge numbers of logical reads per execution. In this case too, as evident from Statspack, such queries were the primary suspect: there were many queries with millions of logical reads each, and optimizing these would surely give some relief, since the amount of latching done while acquiring blocks in the cache would come down. While some of this information was available in the Statspack report, I also queried X$BH to check the distribution of buffers across the CBC child latches. For readers unfamiliar with it, X$BH is an Oracle internal view that exposes the buffer headers in the cache, along with the hash chains (and hence child latches) the buffers are linked to. Normally, the distribution of blocks or buffers in the cache should be approximately uniform across the hash chains. The query executed to get the distribution, along with its output, is as under:

This output was taken when the system was running perfectly fine, without any latch contention.

select * from (
select hladdr, count(*) from x$bh group by hladdr order by 2 desc)
where rownum<=20;

HLADDR             COUNT(*)
---------------- ----------
07000003E2771BA8        197
07000003E661DE70        161
07000003E6305250        157
07000003E661CE90        157
07000003E3F2B0E8        154
07000003E199FA00        153
07000003E4E45FF0        153
07000003E6607150        153
07000003E76A42D0        152
07000003E19A19C0        151
07000003E3DB9F68        151
07000003E5A0F920        151
07000003E78C9D90        151
07000003E76A2310        151
07000003E64E45F0        151
07000003E1A07CE0        150
07000003E4CD7E10        150
07000003E4E45010        150
07000003E4DBA190        150
07000003E3F30048        150

20 rows selected.

As seen from the good-time output, the number of buffers protected by each hash chain is nearly uniform. Fortunately, while monitoring the system over 2-3 days, I was able to capture the buffer distribution across the hash chains during the latch contention, and the figure for the topmost hash chain looked alarming. The output captured during the issue is as under:

HLADDR             COUNT(*)
---------------- ----------
07000002570F7300       8734
070000025A786078        364
070000025A5020D8        167
0700000262E9A9E0        167
07000002562DD4D0        164
07000002583D9C00        164
07000002563162C0        161
07000002563BE270        161
07000002571D9AF0        161
0700000262E9C420        161
0700000256341080        161
07000002572C8650        160
07000002583D74A0        160
070000025A838098        160
0700000259557A20        160
07000002562DA010        159
07000002571E2B50        159
07000002582778A0        159
070000025955D600        154
07000002584C1140        151

20 rows selected.

This clearly indicates hot block contention, i.e. a block that several processes are trying to access simultaneously. A long hash chain means that a query or set of queries is creating consistent read images of a particular block, and since a block, even if it has multiple images, is hashed to the same child latch, a hash chain can grow long and cause latch contention. The next step was to capture data relevant to the current issue, and therefore the data collected for further investigation was:

  1. Top Blocks / Objects with high Touch Count
  2. Top Running Queries with high Buffer Gets
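
The hot-block mechanism described above can be sketched with a toy model. This is a simplified Python illustration, not Oracle's actual hashing code: distinct blocks spread roughly evenly across the child latches, but every consistent-read clone of one hot block hashes to the same chain, so that single chain grows far beyond the rest.

```python
from collections import Counter

# Toy model of cache buffers chains (NOT Oracle's real algorithm):
# a buffer is hashed on (file#, block#) to a chain protected by a
# child latch.  CR clones of one block share (file#, block#), so
# they all land on the same chain.

N_LATCHES = 1024

def child_latch(file_no, block_no):
    # hypothetical hash function, for illustration only
    return hash((file_no, block_no)) % N_LATCHES

# 100,000 distinct blocks spread roughly evenly across the latches
chains = Counter(child_latch(1, b) for b in range(100_000))
print("longest chain, distinct blocks only:", max(chains.values()))

# add 5,000 CR clones of ONE hot block: every clone hashes to the
# same chain, so that single chain becomes dramatically longer
hot = child_latch(1, 42)
chains[hot] += 5_000
lengths = sorted(chains.values(), reverse=True)
print("longest chain after hot-block clones:", lengths[0])
print("second longest chain:", lengths[1])
```

The skewed output mirrors the production figures above: one chain of 8734 buffers against a background of ~160 per chain.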

The first step was to identify the block with the highest touch count, and the object it belongs to. This counter is maintained in the TCH column of X$BH. The query used to get this detail is as under:

select a.hladdr, a.file#, a.dbablk, a.tch, a.obj, b.object_name
from   x$bh a, dba_objects b
where  (a.obj = b.object_id or a.obj = b.data_object_id)
and    a.hladdr = '07000002570F7300'
order by 4;

From the output (not reproduced in this blog), I could see two objects, S_BRANCH and BA_TMP_HOFF, at the top of the list. While BA_TMP_HOFF (a normal heap table) was contributing to the latch contention with the highest touch count, the sessions were waiting on the S_BRANCH table due to high concurrency on it.

The second step was to capture the top running queries by buffer gets per execution; some of these are listed below (I have mentioned only a few of them and have truncated the text):

HASH_VALUE SQL_TEXT                                           EXECUTIONS GETS_PER_EXEC
---------- -------------------------------------------------- ---------- -------------
 603848796 SELECT    FN_NEWBRANCH(A.BRANCH) BRANCH,    A.PROD          1       5294194
 784509366 SELECT    FN_NEWBRANCH(C.BRANCH) BRANCH,    C.PROD          8       2306090
           R_REF_NO,    B.PARTY_ID DRAWER_ID,       B.PARTY_N
 322980110 SELECT    FN_NEWBRANCH(CSTBS_CONTRACT.BRANCH),              3        811376


Further, once the contending object (S_BRANCH) and the contributor (BA_TMP_HOFF) were found, a discussion with the owner of this production system revealed that the load (number of records processed) of a batch process run in 10 parallel streams had gone up threefold, and all of these batch streams heavily access (select and update) the BA_TMP_HOFF table.

During the latch contention, almost 95% of the sessions waiting on CBC latches were querying S_BRANCH. The application owner further mentioned that while this table is static, it is heavily queried: it is a branch master table and is joined to several other tables to get the name of a branch. This input was interesting and is the heart of this blog post. Interestingly, the queries captured for the sessions waiting on latch free were all of exactly the same type, i.e. get the branch name from S_BRANCH, and were doing 4 logical I/Os per execution via a primary key index access. Since these queries were doing only 4 logical I/Os per execution, they did not appear in the top list captured above, which was ranked by gets per execution.


It was time to implement changes that would give the customer relief from the ongoing issue. While optimizing the batch processes involving the BA_TMP_HOFF table would have required understanding the business logic, making changes, testing and implementing, and would therefore have taken longer, the emphasis was on a change that did not require extensive testing and so could be done quickly. Therefore, the change that would give immediate relief was optimizing the queries on the S_BRANCH table. The concern raised by the application owner here was: "Will optimization of a query with 4 logical I/Os per execution give relief from latch contention?"

To resolve the latch contention, my focus was not on reducing the I/Os of the queries on S_BRANCH below 4, but on reducing the concurrency on this object by way of optimization. If you look at the top queries captured above (step 2 of the data collected during the latch contention), these queries call a function FN_NEWBRANCH. The input to this function is a branch code from the child tables, and the output is the branch name fetched from S_BRANCH. A function called from a SELECT statement, together with the queries written inside it, is executed once for each row fetched by the main body of the query. This means that if the SELECT query fetches 1000 rows, the function, and hence the queries inside it, will be executed 1000 times. Latch contention is a concurrency issue, and reducing concurrency by way of optimization makes the application more scalable. Therefore, my goal here was to reduce the concurrency on the S_BRANCH table, and this was achieved with a simple change to the query. Since the change was very simple, it did not require any application testing and was implemented immediately. It has now been almost 20 days, and the performance of the production system is stable with no latch contention.


Let us walk through an example that simulates the production concurrency issue. In this test case, I will create two tables, EMP and DEPT. DEPT is a parent table and EMP is a child table that references DEPT.DEPTNO. Further, we will create two functions, both doing the same task of getting the department name for a given deptno, and these will be called from a SELECT statement. Since DEPTNO is the primary key, a query against DEPT to get DNAME costs only 2 consistent reads per execution (as the v$sqlarea figures below confirm). The text of both functions is identical except for a hint in the query, added only to easily identify and differentiate them in v$sqlarea.

drop table emp;

drop table dept;

exec dbms_random.seed(0);

create table dept as
select 	rownum deptno, dbms_random.string('A',20) dname
from	all_objects
where 	rownum<=100;

alter table dept add constraint pk_dept primary key(deptno);

create table emp as
select	rownum empno, dbms_random.string('A',30) ename,
	round(dbms_random.value(1,100),0) deptno,
	round(dbms_random.value(1000,10000),0) sal
from	all_objects
where 	rownum<=10000;

alter table emp add constraint pk_emp primary key(empno);

alter table emp add constraint deptno_fk foreign key(deptno) references dept;

exec dbms_stats.gather_table_stats(user,'EMP');

exec dbms_stats.gather_table_stats(user,'DEPT');

create or replace function get_dname(i_deptno in number)
return varchar2 as
  i_name varchar2(20);
begin
  select /*+ ORIGINAL */ dname into i_name from dept where deptno = i_deptno;
  return i_name;
end;
/

create or replace function get_dname_o(i_deptno in number)
return varchar2 as
  i_name varchar2(20);
begin
  select /*+ OPTIMIZED */ dname into i_name from dept where deptno = i_deptno;
  return i_name;
end;
/

set autot trace
select get_dname(deptno) deptno,
        ename, sal
from       emp
where deptno in (10,20);

200 rows selected.

        505  consistent gets <--- I/O's of the Main Body
        200  rows processed <-- Rows Processed 

Optimized code (calling the function through a scalar subquery against DUAL):

select (select get_dname_o(deptno) from dual) deptno,
	ename, sal
from	emp
where 	deptno in (10,20);

200 rows selected.

         21  recursive calls
          0  db block gets
        108  consistent gets <-- I/O's of the Main Body
        200  rows processed <-- Rows Processed (same as Original)

column sql_text for a50 wrap
select sql_id, sql_text, buffer_gets, executions, version_count
from v$sqlarea where sql_text like 'select get_dname(dept%';

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
90wzvddty11u6 select get_dname(deptno) deptno,  ename, sal from          505          1             1
              emp where deptno in (10,20)

select sql_id, sql_text, buffer_gets, executions, version_count
from v$sqlarea where sql_text like 'SELECT /*+ ORIGINAL%';

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
dpbdkb6u525g3 SELECT /*+ ORIGINAL */ DNAME FROM DEPT WHERE DEPTN         400        200             1

select sql_id, sql_text, buffer_gets, executions, version_count
from v$sqlarea where sql_text like 'select (select get_dname%';

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
gufauq4hb8tgq select (select get_dname_o(deptno) from dual) dept         108          1             1
              no,         ename, sal from    emp where deptno in

select sql_id, sql_text, buffer_gets, executions, version_count
from v$sqlarea where sql_text like 'SELECT /*+ OPTIMIZE%';

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
14h9s8atq8u26 SELECT /*+ OPTIMIZED */ DNAME FROM DEPT WHERE DEPT           4          2             1

The modification in this case was to call the function through a scalar subquery against DUAL. This change brought down the number of executions of the recursive query "SELECT DNAME FROM DEPT WHERE DEPTNO=:B1" from 200 to 2, i.e. one per distinct deptno. The query was executed for deptno in (10,20), and therefore the function was called only twice, not once for each row fetched by the main body of the query. The original implementation called the function 200 times, as the main body of the query fetched 200 rows. As for logical I/Os, with both implementations the recursive query on DEPT cost 2 per execution; the benefit came entirely from the number of executions. The consistent gets of the main query on EMP dropped drastically (from 505 to 108) because the number of times the query on DEPT was executed dropped drastically, and this benefit was passed on to the main query.
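
To make the mechanism concrete, here is a small Python stand-in (an analogy, not Oracle internals) contrasting a function fired once per fetched row with a scalar-subquery-style cache keyed on the input value, which is essentially what Oracle does for a scalar subquery against DUAL:

```python
calls = 0

def get_dname(deptno):
    # stand-in for the recursive query SELECT DNAME FROM DEPT WHERE DEPTNO=:B1
    global calls
    calls += 1
    return f"DEPT_{deptno}"

rows = [10] * 100 + [20] * 100   # 200 fetched rows, 2 distinct deptnos

# original implementation: the function fires once per fetched row
calls = 0
original = [get_dname(d) for d in rows]
print("direct-call executions:", calls)   # 200

# scalar-subquery style: the result is cached per distinct input value
calls = 0
cache = {}
optimized = []
for d in rows:
    if d not in cache:
        cache[d] = get_dname(d)
    optimized.append(cache[d])
print("cached executions:", calls)        # 2

assert original == optimized              # same result set, far fewer executions
```

Note that Oracle's scalar subquery cache is a small fixed-size in-memory hash table, not an unbounded dictionary as sketched here, so the benefit depends on the number of distinct input values; with only two distinct branch codes per query, the recursive query collapses from one execution per row to one per distinct value.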

With this Optimization, the benefits observed were :

  1. Reduction in Logical I/O’s for top queries captured above – This means reduction in Latch Gets.
  2. Reduction in Concurrency against S_BRANCH – This means reduction in Latch Gets.
  3. Improved Scalability

About Vivek Sharma
I am an Oracle professional from Mumbai, India. I blog on issues that I feel are interesting to my readers. Some of these are my real-life examples, which I hope you will find interesting. Comments are always welcome. The technical observations and views here are my own and not necessarily those of Oracle or its affiliates. They are purely based on my own understanding, learning and resolution of various customer issues.

9 Responses to Cache Buffers Chains Latch Contention…..

  1. Excellent content sir


  2. Pingback: Importance of DETERMINISTIC Function…in continuation to my blog on CBC Latches « Real Life Database / SQL Experiences : An Oracle Blog from Vivek Sharma

  3. Tony says:

I’m unable to understand why the LIOs reduced and how calling a function from dual is different. Could you shed some light please?


    • Vivek says:


The logical I/Os reduced because the number of executions of the query inside the function dropped from 200 to just 2. If you look at the original implementation, out of the 505 logical I/Os of the query, 400 were done by the query inside the function alone. By optimizing the way the function was called, those I/Os came down from 400 to 4.

The benefit of the scalar subquery "select column from dual" is that the result set is cached in the private memory of the session, and thus, if the scalar subquery is re-executed for a value (in our case, a deptno) that was already fetched by the query, the cached result is used.



  4. Pingback: CBC Latch Contention – My Experience « Anand's Blog

  5. Onkar says:

    nice post


  6. Mohammed says:

    Hi Vivek,

Please do not stop writing posts on real-life experiences. I have been watching for a couple of months and no new article has been posted….

    I request you to please share your experience, especially on complex things where we can learn something for our day-to-day operations.

    Please keep going.. .

    Thanks for your help.




    • Vivek says:

      Hi Mohammed, thanks for your comments and kind words. As the heading of my blog states, I write on my real life experiences. Last 2 months, I feel the technical work that I have done was either not that exciting or repetitive of what I had done and written about in the past. Anyways, will write soon.


