“Library Cache : Mutex X” ! An Interesting Optimization

One of my Customer is migrating from an Non-Oracle Database to Oracle Database and this being a critical Application, the migration is planned in a phased manner. Migration happens every fortnight (a city or set of cities at a time) and is enough to increase the load as a Vertical line (if plotted in a graph). In a simple term, the increase in the load is not linear. With such a dramatic increase, some or the other issues can be expected and this is what triggered post one such migration. The setup is a 2 Node RAC with Database version is 11.2.0.2.8. Connection Pooling is used at the Apps Layer.

During one of the migration the customer increased CPU’s from 80 to 148 on each of the Node. This was based on the recommendation from the Application Team. Immediately, post the migration the top most wait event was “library cache : mutex x”. While this was not impacting the performance but was a cause of concern as the migration was only 40% and next migration would add the load further by 20%.

We had two dramatic changes. These are : (1) The CPU’s Increased from 80 to 148, and (2) The load increased many fold because of the migration of new branches. It was critical to investigate the issue and therefore, we started monitoring the system during peak load. An interesting observation that came out was a function, which was on top, most of the time. It was BEGIN :1 := PA_MC_ENTITY_CONTEXT_FN(:2); END;. The number of executions of this function were also around 600+ per second. For more understanding on this function, we scheduled a discussion with the application team and the outcome of this discussion was very interesting.

During the discussion, it came out that the application has a functionality to use Virtual Private Database (VPD) and this function is used to set the application context, which is then used by each of the application queries to filter out the rows that do not match the context. Usually, VPD is used as a row level security, which is implemented by way of a function to generate WHERE predicate depending upon the privileges or profile of the application user. These predicates can be different for different set of users. All this is generated at run time. Interestingly, in this case, the application context is not generated dynamically based on privileges, but is set to a fix value, ’4000′, for all the database session. Technically, there was nothing wrong in the implementation and therefore, it was not important to drill down further to relate the link between this function and mutex waits. The VPD Implementation and function code all looked fine

The interesting fact that we came to know was the way this function was called. Further investigation and discussion with the Application team revealed that for each database call, this function is called to set the application context. This means, if a session executes 5 queries, as a part of a transaction, the function to set the application context will be executed 5 times. As per them, the reason behind this is that as the Apps Layer use Connection Pool, there is no way to know whether the connection picked up by the application user is one of the existing connection from the pool or a newly spawned (once all the initial connections are exhausted). Clearly, executing a function for each database calls and setting a context on each call seemed to be a problem, as for each execution of this function, the mutex was requested in an exclusive mode to write to a memory location.

As an immediate optimization, in order to reduce the number of execution and exclusive gets, it was suggested to check for the existence of application context before setting the context. This was achieved by way of a query to first check whether the context is set to desired value. If it is, this means, it is one of existing connection and no need to set the application context. If not, this means, this is a new connection and therefore, the application context is set to ’4000′. With this implementation, the number of executions of the function reduced drastically, thereby reducing the waits on Library Cache Mutex.

In this case, everyone suspected this to be an Oracle Database BUG. In order to eliminate or reduce the contention, it is imperative to optimize the amount of work done. Unwanted work is always an overhead, as was the case in this optimization.

About these ads

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on the issues that I feel is Interesting for my readers. These are all my real life examples, which I hope, you would find interesting. Comments are always a welcome.

7 Responses to “Library Cache : Mutex X” ! An Interesting Optimization

  1. vijay sehgal says:

    Vivek,

    As always interesting post.

  2. Prasad says:

    Interesting…Nice post Vivek.

    Regards,
    Prasad

  3. Ashish says:

    Hi Vivek, proxy connection coupled with session multiplexing could be another way of implementing the connection pool. The application code, however may have to be modified.

  4. Onkar says:

    a good one. We also had similar issue. Undoubtedly using something not really needed always creates problem.

  5. Alok says:

    Most app developers tend to do the following:
    db_query=select (the minimal select) that can be executed
    fetch_array(db_query)
    now work through the array of the result
    versus writing a complicated sql which sorts/fetches the data in the sql itself.

    The reason is that once the data is locally available in an array, the app server can play around with that array (malloc is done for the array fetch) and not interfere with regular db operations.

    Why would people tend to use “complicated sql queries” in an era where it is easier to sort/play with data on a separate machine where the data is already memory mounted via an array handle?

    • Alok says:

      Of course having an index will make the above less efficient but there is never a guarantee that every column will have an index from a developers view point. My general question is parallel or no parallel execution (and I believe exadatav2 does away with the parallel clause and defaults to parallel), what does one loose buy letting the app side play with the data instead of complex queries? Real time data fetching apart, the MIS/reporting side which extensively relies on point in time data does not need to be written with such complex SQL queries

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 125 other followers