parse count (failures) – How do I find queries causing failures ?

In my last blog on Hard Parses and failures posted on September 3, 2011, I missed to write about the ways to find the Queries causing “parse count (failures)”. I received few mails from some of readers wanting to know on how to get these. Since these queries fail, these are not stored in the Shared Pool and therefore, it is not possible to get these from v$ views. Secondly, since these are not stored in the Shared Pool, everytime these queries are executed, these are hard parsed, thus causing unwanted contention on Library Cache latches.

A possible way to get these is 10046 traces. I used these traces to get the Queries that were causing these failures to occur. Yesterday, I was working at another customer site, where I could see some failures happening. These are again around 25% to 30% of the total Hard Parses. Therefore, if we eliminate these, total hard parses / second would come down. Following query helps me check, whether the statistics related to hard parses are alarming.

set linesize 132
alter session set nls_date_format='DD-MM-YYYY HH24:MI:SS';
select sysdate, name, value from v$sysstat where name like 'parse%';
exec dbms_lock.sleep(10);
select sysdate, name, value from v$sysstat where name like 'parse%';

I execute a query against v$sysstat, wait for 10 seconds and re-execute the same. Therefore, the incremental figure is for 10 seconds duration. From this, it is very easy to compute the parses/second ratio. Following is the output of the query taken yesterday :

SQL> @hard_parse

Session altered.

Elapsed: 00:00:00.00

SYSDATE             NAME                                                                  VALUE
------------------- ---------------------------------------------------------------- ----------
20-09-2011 10:28:14 parse time cpu                                                       117213
20-09-2011 10:28:14 parse time elapsed                                                   390923
20-09-2011 10:28:14 parse count (total)                                                54294681
20-09-2011 10:28:14 parse count (hard)                                                   766713
20-09-2011 10:28:14 parse count (failures)                                               177555
20-09-2011 10:28:14 parse count (describe)                                                  204

6 rows selected.

Elapsed: 00:00:00.03

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.00

SYSDATE             NAME                                                                  VALUE
------------------- ---------------------------------------------------------------- ----------
20-09-2011 10:28:24 parse time cpu                                                       117216
20-09-2011 10:28:24 parse time elapsed                                                   390969
20-09-2011 10:28:24 parse count (total)                                                54303954
20-09-2011 10:28:24 parse count (hard)                                                   767011
20-09-2011 10:28:24 parse count (failures)                                               177637
20-09-2011 10:28:24 parse count (describe)                                                  204

6 rows selected.

Elapsed: 00:00:00.02
SQL> select 767011-766713, 177637-177555 from dual;

767011-766713 177637-177555
------------- -------------
          298            82

In this case, the Hard Parses are 29.8 per second and failures are 8.2, which is around 28% of the total hard parses. To check for the queries causing these, I generated enabled 10046 trace at system level, during less concurrency time (non-peak hours), and issued grep -i “PARSE ERROR” *.trc. This gave me the name of the trace files and from these traces, it was very easy to figure out the queries that were failing. In the trace file, search for PARSE ERROR and you will get the query, alongwith the reason for its failure, which is ora-942 (table or view does not exists), as can be seen in err column.

grep -i "PARSE ERROR" *.trc
RAC1_ora_1966688.trc:PARSE ERROR #1:len=157 dep=0 uid=38 oct=3 lid=38 tim=2611876001420 err=942
RAC1_ora_1966688.trc:PARSE ERROR #1:len=157 dep=0 uid=38 oct=3 lid=38 tim=2611876029670 err=942
RAC1_ora_1966688.trc:PARSE ERROR #5:len=95 dep=1 uid=0 oct=3 lid=0 tim=2611876430154 err=942

Do we need to take care of Hard Parses only when Library Cache related Latches are on top ?

I just concluded a 3 months Production Optimization Project. The problem summary was an overall slowness of the production system, which is a three tier architecture. The Production System includes a 2 Node Database with 4 Application Servers configured with Connection Pool. As with all my Performance Project, this time too I started off with a one round of discussion with all the stake holders. These stake holders manage the production system and were called upon to express their opinion on the overall slowness. While everyone voiced together that the slowness is consistent the entire day, they also mentioned that they are confident that the issue is with the Oracle Application Server and the way it manages the Connection Pool. The rationale behind this assumption was the Connection Pool Statistics, which says that as the load increases, the number of Connection Pools configured increase drastically and are not freed until the load comes down. The peak starts at 11 am and, since the production system is slow, the load comes down by 9 pm. The business hours of the customer is 9 am to 5 pm.

Initial investigation revealed that with Automatic Shared Memory Management enabled, the Shared Pool had grown upto 40GB. This is pretty high. I have not seen any configuration that requires such a huge Shared Pool. This was surely a cause of concern and therefore, this was further drilled down to one of a serious bug in the application. This was Hard Parse.

An Application or Database flooded with Unshared Application queries is considered as an Unscalable Application. Slowly or Gradually, as the User Load increase, the performance issues are bound to come. In this case, the Hard Parse / Second ratio was around 180 per second. While there were many queries that were using Literals, the most surprising part was the Hard Parse failures, which were almost 68% of the total Hard Parses. Therefore, once a patch to fix the queries that were using Literals was applied, there was no significant improvement.

The most challenging part of this assignment was the reluctance from the Application Team to accept the Hard Parses or the Parse failure to be an Issue. The debate from the Application Team was that reducing the Hard Parses would not yield any benefit as they do not see any Library Cache related latches on top. They have been taught to see the AWR reports and look to the top events. Reducing the number of waits of the top most component seemed to be on top of the list for Application team. While this is true, to some extent, but in case of Hard Parses, this would not work.

This is the reason, the title of my blog is “Do we really need to take care of Hard Parses only when Library Cache related Latches are on top ?”. In this case, the Hard Parses can be easily related to the contention that we observed on the Connection Pool. Usually, the response time of a Connection Pool or a Shared Server Architecture should be less than a Second. If this goes high, we might see a slowness or contention on Connection Pool, as other users may have to wait for the pool to be released. This again explains, why the number of connections pools increased as the load increases.

Hard Parses are a performance killer, also called as a Silent Killer. Many a times, Hard Parses does not cause significant waits on Library Cache latches. It depends on the resources available at the Database Server. In this case, it was a 2 Node RAC with 56 CPU’s on each machine. The Hardware configuration was enough and therefore, the contention was not that significant to cause Latch Free waits. As we all know, Hard Parse takes more time than a Soft Parse or a No Parse and this resulted in a signifcant waits or contention on Connection Pool. Also, as mentioned above, Hard Parses are Silent Killer. With that said, if these are not taken care of at a initial stage, this can bring down the Production System (as it did in the case of my Customer).

To summarize, Hard Parses are to be addressed even if these are not visible as a Latch Contention. In case of my customer, as soon as the Hard Parses and Parse failures were fixed, the performance of the production system improved drastically and was consistent through out the day. The users were able to enter their transactions in the business hours. Further, the contention on the connection pool came down as well.

%d bloggers like this: