What has triggered this Sudden Performance Degradation ? A Generic Question…

It is well known that, most of the time, a Performance Tuning Consultant is engaged for Reactive Support following a severe performance issue. During such an engagement, one of the generic questions a Performance Tuning Expert or Consultant has to answer is “What has triggered this Sudden Performance Degradation ?”, or a comment like “The Oracle Optimizer is an Issue, as it triggered this Degradation and everything came back to normalcy after reverting the Optimizer Stats”. The challenge for the Consultant is to prove that the Optimizer is not the issue, and that it is the Application behaviour or Indexing Strategy that caused it. Such a strategy acts as a slow poison until a point is reached where it finally gives up. During one of my presentations at the All India Oracle User Group, I spoke on Developing a Scalable Application, and this was one of my slides, which I felt would be of use for my readers as well.

In this blog, I will answer the Question and the Comment mentioned above. This is one of the cases that can trigger a sudden degradation, and the test case revolves around Optimizer Statistics. I have been asked this question by Customers, and concerns were raised over the way the Optimizer works with, or uses, these Statistics. Some of the concerns raised are :

  1. Why do I generate fresh Stats when I am Satisfied by the Application Performance on current Stats ?
  2. I generated Statistics over the weekend and suddenly the performance degraded. Finally, I reverted back the statistics and everything came back to normalcy.
  3. What is the cause of this Sudden Degradation ?

While a lot has been written on Concern 1 (the way the Optimizer works and the change in Optimizer behaviour in 10g), I will answer Concern 2, and Concern 3 will be answered automatically. As usual, all of this will be well supported and demonstrated by way of a Test Case.

SCENARIO

A Customer gathered Statistics on all the production tables. As per policy, these statistics had been gathered every weekend for the last couple of years. On Monday, the Users suddenly started complaining about slowness. The Onsite DBAs had a look at the system and reported an I/O issue. Further investigation revealed that the execution plans for some of the critical application queries had changed from an Index Scan to a Full Table Scan. What had changed between Friday and Monday? The DBAs realised that the only change was the Optimizer Statistics. In such cases, if it is possible to identify the suspected table and its stats, then it is very easy to revert the stats of just that table; otherwise, the only option left is to revert the entire statistics and bring back Friday’s stats. Both fallback plans have their own implications. After reverting the Statistics, the Application Performance came back to normalcy. Based on the cause of the issue and the change that restored normalcy, it is quite evident that the finger of blame would point at the Oracle Optimizer. The Customer then decided not to gather fresh statistics unless the Optimizer issue was resolved.

While there are various reasons that can cause slowness after fresh statistics are gathered, one of the topmost causes is Inefficient Indexing. If the Indexing Strategy of an Application is not optimal, the performance of that application can show a sudden degradation. Performance drop is usually not linear but exponential: once utilization reaches a threshold where the system can barely handle any extra growth, even a minor increment is enough to cause a sudden degradation.
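Why does linear data growth feel like a sudden collapse? A simple queueing approximation illustrates the shape of the curve (this is an illustrative M/M/1-style model, an assumption for the sketch, not a measurement from this test case): response time behaves like service_time / (1 - utilization), so load that climbs steadily barely moves response time until utilization nears saturation, and then it shoots up.

```python
# Illustrative M/M/1-style response-time curve (an assumed model for the
# sketch, not a measurement from the test case below). Utilization that
# climbs linearly leaves response time almost flat until it approaches
# 1.0, where response time explodes -- which is why the degradation,
# though driven by steady growth, feels "sudden".

def response_time(service_time, utilization):
    """Expected response time for an M/M/1 queue at the given utilization."""
    if utilization >= 1:
        raise ValueError("system saturated")
    return service_time / (1.0 - utilization)

# A 10 ms operation stays near 10-20 ms for most of the curve, then
# jumps an order of magnitude in the last few percent of utilization.
for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(u, round(response_time(0.01, u), 3))
```

The exact numbers are not the point; the hockey-stick shape is: the last few percent of utilization cost more than everything before them combined.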

TEST CASE & EXPLANATION

To demonstrate the impact of Inefficient Indexing, we need to make certain assumptions. In this case, let us assume that the Production System has just gone live and the production tables were empty, except for the Master Tables. For our demonstration, I will create a table, populate some data (initially 10 days’ worth) and run an application query against that table. The setup creates 10 Customer IDs, and each of these customers has 5 transactions a day. We will run an application query to get the product_name and sale_value for one Customer for one day. This query should return 5 rows out of the initial 500 (10 Days x 5 Transactions a day x 10 Customers).

Next, once the system becomes 60 days old, the same application query is run. In our case, even as the volume increases, the number of rows returned by the query remains unchanged. We shall see that this increase in volume has a significant impact on performance, and that the impact appears all of a sudden.

Table Creation Script - inserts 10 days of data, which means production is 10 days old

exec dbms_random.seed(0);
drop table vivek_test;
create table vivek_test as
with main as
(select mod(rownum,10) custid,
        round(dbms_random.value(1000,7000),0) sale_value,
        dbms_random.string('A',20) product_name
from    all_objects a
where rownum<=50)
select custid, s_date sale_date, sale_value, product_name
from    main, (select sysdate - level s_date from dual connect by level<=10) ;

Next, we create an Index on custid and gather Statistics

create index vivek_test_idx on vivek_test(custid);
exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true,method_opt=>'for all columns size 1');

Once this is done, we execute our Application Query to get the data for 9th June 2011

variable b1 varchar2(20);
exec :b1:='06-09-2011';

select /*+ vivek */ custid, product_name, sale_value
from  vivek_test
where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60);

    CUSTID PRODUCT_NAME         SALE_VALUE
---------- -------------------- ----------
         1 rMLTDXxxqXOZnqYRJwIn       1380
         1 AOcitwrWNXCxPHzIIIxw       3850
         1 hpalFVIprLxIHDrRgqHb       2287
         1 vkfKdaFoCaHgBCQAFgCY       4364
         1 DAlVYefHAurNoryMikSJ       1125

Runtime Plan

select sql_id, sql_text, buffer_gets, executions, version_count
from v$sqlarea where sql_text like 'select /*+ vivek%';

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
2t8mwg94j89h8 select /*+ vivek */ custid, product_name, sale_val          10          2             1
              ue from  vivek_test where custid=1 and   sale_date
               between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and
              to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
SQL_ID  2t8mwg94j89h8, child number 0
-------------------------------------
select /*+ vivek */ custid, product_name, sale_value from  vivek_test where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and
to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60)

Plan hash value: 2623397998

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |       |       |     4 (100)|          |
|*  1 |  FILTER                      |                |       |       |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |     1 |    35 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | VIVEK_TEST_IDX |    50 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE(:B1,'MM-DD-YYYY HH24:MI:SS')=TO_DATE(:B1,'MM-DD-YYYY HH24:MI:SS') AND
              "SALE_DATE"<=TO_DATE(:B1,'MM-DD-YYYY HH24:MI:SS')+1-.0000115740740740740740740740740740
              7407407407))
   3 - access("CUSTID"=1)

The Runtime Plan in this case shows that the Application Query used the Index on custid, which is usually an appropriate approach in an OLTP system. The interesting fact here is the Throwaway: as per the plan, 50 rows are fetched from the Index and only 1 row is expected to survive the Table Access, i.e. an estimated 49 rows thrown away. The actual throwaway is 45 rows (the query returns 5 of the 50 fetched), but 45 out of 50 still means 90% of the fetched rows are discarded, and this is very bad.
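The throwaway arithmetic above is simple enough to sketch explicitly (this just restates the figures already quoted: 50 rows fetched via the custid index, only 5 actually needed by the query):

```python
# Throwaway calculation for the 10-day case: the index on custid alone
# identifies all 50 rows for customer 1 (5 transactions/day x 10 days),
# but the sale_date filter keeps only one day's 5 rows.

def throwaway(rows_fetched, rows_returned):
    """Return (rows discarded, discard percentage) for an index access."""
    discarded = rows_fetched - rows_returned
    return discarded, 100.0 * discarded / rows_fetched

discarded, pct = throwaway(rows_fetched=50, rows_returned=5)
print(discarded, pct)   # -> 45 90.0
```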

THE REAL PERFORMANCE ISSUE

Users continue to use the system, and it is now 60 days old. As per the policy, fresh Statistics are gathered after the 60th day. Further, the DBAs also take a backup of the old stats before generating the new ones. As data is entered, users continue to execute the application query.

truncate table vivek_test;
exec dbms_random.seed(0);

insert into vivek_test
with main as
(select mod(rownum,10) custid,
        round(dbms_random.value(1000,7000),0) sale_value,
        dbms_random.string('A',20) product_name
from    all_objects a
where rownum<=50)
select custid, s_date sale_date, sale_value, product_name
from    main, (select sysdate - level s_date from dual connect by level<=60);

exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true,method_opt=>'for all columns size 1');

exec dbms_stats.create_stat_table(user,'STATS');
exec dbms_stats.export_table_stats(user,'VIVEK_TEST',stattab=>'stats');

variable b1 varchar2(20);
exec :b1:='06-09-2011';

select /*+ vivek */ custid, product_name, sale_value
from  vivek_test
where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60);

Run time Plan
Plan hash value: 2623397998
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |       |       |    18 (100)|          |
|*  1 |  FILTER                      |                |       |       |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |    15 |   525 |    18   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | VIVEK_TEST_IDX |   300 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Check the Throwaway percentage now: it has gone up considerably (300 rows fetched from the index for the same 5 rows returned). Let us pump in 5 more days’ data (for 5 working days) and, as per the policy, gather fresh statistics. We already have a backup of the previously generated Statistics in the STATS table.
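Under the assumptions of this test case (5 transactions per customer per day, and a query that always needs just one day’s 5 rows), the throwaway of the single-column index can be projected for any system age. This is plain arithmetic, not output from the database:

```python
# Projected throwaway of the single-column custid index as the system
# ages (assumes the workload described above: 5 rows per customer per
# day, and a query that always returns just 5 rows for one customer
# and one day). The work done grows linearly with age while the useful
# output stays constant, so the discard percentage creeps toward 100%.

ROWS_PER_CUST_PER_DAY = 5
ROWS_NEEDED = 5

for days in (10, 60, 65, 365):
    fetched = days * ROWS_PER_CUST_PER_DAY           # rows visited via index
    pct = 100.0 * (fetched - ROWS_NEEDED) / fetched  # rows later discarded
    print(days, fetched, round(pct, 1))
```

At 10 days the index discards 90% of what it fetches; at 60 days, over 98%; at a year, nearly everything. The composite index shown later keeps this at 0% regardless of age.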

truncate table vivek_test;
exec dbms_random.seed(0);

insert into vivek_test
with main as
(select mod(rownum,10) custid,
        round(dbms_random.value(1000,7000),0) sale_value,
        dbms_random.string('A',20) product_name
from    all_objects a
where rownum<=50)
select custid, s_date sale_date, sale_value, product_name
from    main, (select sysdate - level s_date from dual connect by level<=65);

exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true,method_opt=>'for all columns size 1');

variable b1 varchar2(20);
exec :b1:='06-09-2011';

select /*+ vivek */ custid, product_name, sale_value
from  vivek_test
where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60);

Run time Plan
Plan hash value: 1579208797

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |       |       |    18 (100)|          |
|*  1 |  FILTER            |            |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| VIVEK_TEST |    15 |   525 |    18   (0)| 00:00:01 |
---------------------------------------------------------------------------------

In just 4-5 days, the plan of the query changed from an Index Scan to a Full Table Scan. Further, the I/Os of the query also increased from 10 (when it started) to 22, and now to 63 per execution, while the Cost of the query remained the same. Under high concurrency, these Full Table Scans would surely have an impact on the overall performance. Let us now revert the statistics from the backup table and check the query performance.

Current Stats
new   2: where table_name='VIVEK_TEST'

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
VIVEK                          NO        3250         60 13-JUN-11 YES

Elapsed: 00:00:00.45
SQL> exec dbms_stats.import_table_stats(user,'VIVEK_TEST',stattab=>'STATS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.04

Reverted Stats
OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
VIVEK                          NO        3000         60 13-JUN-11 YES

variable b1 varchar2(20);
exec :b1:='06-09-2011';

select /*+ vivek */ custid, product_name, sale_value
from  vivek_test
where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60);

Runtime Plan
Plan hash value: 2623397998

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |       |       |    18 (100)|          |
|*  1 |  FILTER                      |                |       |       |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |    15 |   525 |    18   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | VIVEK_TEST_IDX |   300 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

After the Statistics were reverted, the application query went back to the better plan. This was enough to raise doubt over the behaviour of the Optimizer. On my test database, the actual issue occurred on the 61st day; in your case, the day may differ. For this example, I will also demonstrate a way to find out on which day the Optimizer will change the plan.

The test case and demonstration clearly show that the issue here is not the Optimizer Statistics but Inefficient Indexing, which is also evident from the Throwaway percentage mentioned above. To prove this point, I will create a composite index on (custid, sale_date) and gather fresh statistics, with the table holding 65 days of data. Once the Application query is run, I will pump in 365 days’ worth of data, regenerate fresh statistics and check the Execution Plan (and throwaway percentage).

drop index vivek_test_idx;
create index vivek_test_idx on vivek_test(custid, sale_date);
exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true,method_opt=>'for all columns size 1');
variable b1 varchar2(20);
exec :b1:='06-09-2011';

select /*+ vivek */ custid, product_name, sale_value
from  vivek_test
where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60);

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
2t8mwg94j89h8 select /*+ vivek */ custid, product_name, sale_val           6          1             1
              ue from  vivek_test where custid=1 and   sale_date
               between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and
              to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60
              )

Plan hash value: 2623397998
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |       |       |     5 (100)|          |
|*  1 |  FILTER                      |                |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |    15 |   525 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | VIVEK_TEST_IDX |    16 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Production is now a year old, with the new Index
truncate table vivek_test;
exec dbms_random.seed(0);

insert into vivek_test
with main as
(select mod(rownum,10) custid,
        round(dbms_random.value(1000,7000),0) sale_value,
        dbms_random.string('A',20) product_name
from    all_objects a
where rownum<=50)
select custid, s_date sale_date, sale_value, product_name
from    main, (select sysdate - level s_date from dual connect by level<=365);

exec dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true,method_opt=>'for all columns size 1');

select /*+ vivek */ custid, product_name, sale_value
from  vivek_test
where custid=1
and   sale_date between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60);

SQL_ID        SQL_TEXT                                           BUFFER_GETS EXECUTIONS VERSION_COUNT
------------- -------------------------------------------------- ----------- ---------- -------------
2t8mwg94j89h8 select /*+ vivek */ custid, product_name, sale_val           6          1             1
              ue from  vivek_test where custid=1 and   sale_date
               between to_date(:b1,'MM-DD-YYYY HH24:MI:SS') and
              to_date(:b1,'MM-DD-YYYY HH24:MI:SS')+1-1/(24*60*60
              )

Plan hash value: 2623397998
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |       |       |     5 (100)|          |
|*  1 |  FILTER                      |                |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| VIVEK_TEST     |    15 |   525 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | VIVEK_TEST_IDX |    15 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

With the new Index, the performance is consistent, and the performance metrics (I/Os and throwaway percentage) are stable too. This answers the concern raised by the Customers: “What is the cause of this Sudden Degradation ?”. In most cases, the change in behaviour is due to the Application. You may notice that a change from 3000 to 3250 rows (a difference of just 250 rows) could cause the performance degradation; in this case, the number of blocks remained unchanged. As mentioned earlier, in my test case it was the 61st day that triggered the plan change, which means the addition of just 50 rows. On an actual OLTP production system, the number of rows inserted into a table in a day is huge.

You may run the following PL/SQL block on your test database to check the day on which the execution plan of the above query changes.

drop index vivek_test_idx;
create index vivek_test_idx on vivek_test(custid);
This runs for 120 days, and the Statement Id in plan_table corresponds to the day.

declare
s_date	varchar2(30):=to_char(trunc(sysdate)-1,'MM-DD-YYYY');
e_date  varchar2(30):=to_char(trunc(sysdate)+1-1/(24*60*60),'MM-DD-YYYY');
l_stat	varchar2(1000);
begin
  execute immediate 'delete from plan_table';
  for i in 1..120
  loop
   execute immediate 'truncate table vivek_test';
   insert into vivek_test 
   with main as
   (select mod(rownum,10) custid,
           round(dbms_random.value(1000,7000),0) sale_value,
           dbms_random.string('A',20) product_name
   from    all_objects a
   where rownum<=50)
   select custid, s_date sale_date, sale_value, product_name
   from    main, (select sysdate - level s_date from dual connect by level<=i);
   dbms_stats.gather_table_stats(user,'VIVEK_TEST',cascade=>true,method_opt=>'for all columns size 1');
   l_stat:= 'explain plan set statement_id '||''''||i||''''||' for select custid, sale_value, sale_date from vivek_test where custid=1
            and   sale_date between to_date(:b2,'||''''||'MM-DD-YYYY HH24:MI:SS'||''''||') 
            and to_date(:b3,'||''''||'MM-DD-YYYY HH24:MI:SS'||''''||')';
    execute immediate l_stat using s_date, e_date;
--    dbms_output.put_line(l_stat);
   commit;
  end loop;
end;
/
column operation for a20
column options for a20
select statement_id, operation, options from plan_table
where operation='TABLE ACCESS'
order by to_number(statement_id);

STATEMENT_ID                   OPERATION            OPTIONS
------------------------------ -------------------- --------------------
1                              TABLE ACCESS         BY INDEX ROWID
2                              TABLE ACCESS         BY INDEX ROWID
3                              TABLE ACCESS         BY INDEX ROWID

56                             TABLE ACCESS         BY INDEX ROWID
57                             TABLE ACCESS         BY INDEX ROWID
58                             TABLE ACCESS         BY INDEX ROWID
59                             TABLE ACCESS         BY INDEX ROWID
60                             TABLE ACCESS         BY INDEX ROWID
61                             TABLE ACCESS         FULL <---- From 61st Day, the Plan Changed
62                             TABLE ACCESS         FULL
63                             TABLE ACCESS         FULL
64                             TABLE ACCESS         FULL

120                            TABLE ACCESS         FULL

120 rows selected.

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on issues that I feel are interesting for my readers. Some of these are my real-life examples, which I hope you will find interesting. Comments are always welcome. The Technical Observations & Views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understanding, learning and resolution of various customer issues.

8 Responses to What has triggered this Sudden Performance Degradation ? A Generic Question…

  1. Nice post Vivek.

    Cheers !


  2. allthingsora says:

    Hello Sir
    The real issue with the test case above was improper indexing strategy. Most developers at the time of writing queries would first check the logical correctness of the query and then create a certain index (guesswork) see that query is performing fine, probably check index usage in sql plan and OK the query for production.
    Instead if the developers run the query through dbms_sqltune or sql tuning advisor and then create index as per the recommendation given by the advisor, would it still lead to improper indexing strategy?


  3. Tarun Sharma says:

    A good platform for developers to understand the importance of GOOD indexing strategy.

    Tarun


  4. Mubeen says:

    good article with nice explanation.


  5. Jatin says:

    Hi vivek

    Same thing happened in our environment. Suddenly . performance degradation happened after gathering the stats after long time. We have more than 100 of queries that needs to be tuned. Then we decided to upgrade the database to 11gR2. After upgradation 80% problem got resolved. Is this due to 11g optimizer change?


    • Vivek says:

      Hi Jatin,

Yes, this could be because of the 11g Optimizer, which has improved further. There are many enhancements in the 11g Optimizer that give better performance, and I plan to write about them on my blog. A few of my customers have also reported this to me. I recollect suggesting to one of my customers to upgrade to 11g for improved performance, and I demonstrated some of the new features that would be beneficial. Post upgrade, the customer reported an improvement without any change.

      Regards
      Vivek


  6. Ram says:

    same issue occurred in one of my production database and how we can we justify the customer about this.


  7. Joseph Dias says:

    Nice Article Vivek.

