Yet another Query Plan Change ! Cost Based Optimization…

Optimizer relies on Statistics to compute an Optimal Plan, which includes : Choice between Index Scan or Table Scan, in case of Index Scan, which index to choose, and more importantly, the Join Order and Join Type. Therefore, it is quite evident that accurate statistics means a good and accurate estimation of Cardinality, which in turn, results in optimal execution plan. But there are cases, when accurate statistics is not good enough for an Optimizer to come out with an Optimal plan. In this blog, I will post a real life issue which is an interesting case, in which, the Optimizer generated plan, impacted the performance of a query whereas the underlying object statistics were accurate. Accurate statistics means the the statistics that reflect the accurate picture of the data in the underlying table.

In many of the events that I presented, one of the common statement that I emphasize is “Differentiate Performance Symptom and Performance Problem”. In this case too, the inhouse team was investigating and focussing more on the Symptom while the problem was somewhere else. We will walk through this as well.

Recently, at a customer site, the performance of one of a critical performance, that runs every few minutes, got impacted. The inhouse adminitrator time could investigate that execution time of one particular query, that usually takes less than a second, changed to 30-40 minutes. Fortunately, the customer maintains Query Baselines and this helped them further delve into the issue and they observed that the plan was changed. While they were investigating the issue and the reason for this plan change, the issue was temporarily fixed using SQL Plan Management. The customer wanted to know the reason for this behaviour, so as to avoid, this in future.

This was a custom process. The original query and the good / bad plan is pasted below.

SELECT WIP_ENTITY_NAME FROM WIP_DJ_CLOSE_TEMP 
WHERE	WIP_ENTITY_ID IN
	(SELECT wdct.WIP_ENTITY_ID            
	FROM	WIP_TRANSACTIONS wt,
		WIP_DJ_CLOSE_TEMP wdct             
	WHERE	wdct.GROUP_ID = :b1             
	AND	wdct.ORGANIZATION_ID = :b2
	AND	wdct.WIP_ENTITY_ID = wt.WIP_ENTITY_ID             
	AND	wt.ORGANIZATION_ID = :b3             
	AND	wt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE   
	UNION    
	SELECT wdct.WIP_ENTITY_ID            
	FROM	MTL_MATERIAL_TRANSACTIONS mmt,
		WIP_DJ_CLOSE_TEMP wdct             
	WHERE	wdct.GROUP_ID = :b1             
	AND	wdct.ORGANIZATION_ID = :b2
	AND	wdct.WIP_ENTITY_ID = mmt.TRANSACTION_SOURCE_ID            
	AND	mmt.TRANSACTION_SOURCE_TYPE_ID = 5             
	AND	mmt.ORGANIZATION_ID= :b3
	AND mmt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE)

Good Plan

Plan hash value: 2500423228

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                              |       |       |  6017 (100)|          |       |       |
|   1 |  NESTED LOOPS                           |                              |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                          |                              |    70 |  1960 |  6017   (1)| 00:00:36 |       |       |
|   3 |    VIEW                                 | VW_NSO_1                     |    70 |   910 |  5947   (1)| 00:00:35 |       |       |
|   4 |     SORT UNIQUE                         |                              |    70 |  3040 |  5947  (92)| 00:00:35 |       |       |
|   5 |      UNION-ALL                          |                              |       |       |            |          |       |       |
|   6 |       NESTED LOOPS                      |                              |       |       |            |          |       |       |
|   7 |        NESTED LOOPS                     |                              |    45 |  1890 |   490   (1)| 00:00:03 |       |       |
|*  8 |         TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |  1920 |     9   (0)| 00:00:01 |       |       |
|*  9 |          INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   797 |       |     2   (0)| 00:00:01 |       |       |
|* 10 |         INDEX RANGE SCAN                | WIP_TRANSACTIONS_N1          |    24 |       |     0   (0)|          |       |       |
|* 11 |        TABLE ACCESS BY INDEX ROWID      | WIP_TRANSACTIONS             |     1 |    18 |     6   (0)| 00:00:01 |       |       |
|  12 |       NESTED LOOPS                      |                              |       |       |            |          |       |       |
|  13 |        NESTED LOOPS                     |                              |    25 |  1150 |  5455   (1)| 00:00:32 |       |       |
|* 14 |         TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |  1920 |     9   (0)| 00:00:01 |       |       |
|* 15 |          INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   797 |       |     2   (0)| 00:00:01 |       |       |
|  16 |         PARTITION RANGE ITERATOR        |                              |     2 |       |    67   (0)| 00:00:01 |   KEY |    36 |
|* 17 |          INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |       |    67   (0)| 00:00:01 |   KEY |    36 |
|* 18 |        TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |    22 |    68   (0)| 00:00:01 |     1 |     1 |
|* 19 |    INDEX UNIQUE SCAN                    | WIP_DJ_CLOSE_TEMP_U1         |     1 |       |     0   (0)|          |       |       |
|  20 |   TABLE ACCESS BY INDEX ROWID           | WIP_DJ_CLOSE_TEMP            |     1 |    15 |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------------

Bad Plan

Plan hash value: 821054275

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                              |       |       |  7295 (100)|          |       |       |
|   1 |  NESTED LOOPS                           |                              |    19 |   532 |  7295   (1)| 00:00:43 |       |       |
|   2 |   VIEW                                  | VW_NSO_1                     |    19 |   247 |  7276   (1)| 00:00:43 |       |       |
|   3 |    SORT UNIQUE                          |                              |    19 |   826 |  7276  (88)| 00:00:43 |       |       |
|   4 |     UNION-ALL                           |                              |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY INDEX ROWID        | WIP_TRANSACTIONS             |     1 |    18 |     6   (0)| 00:00:01 |       |       |
|   6 |       NESTED LOOPS                      |                              |    12 |   504 |   930   (1)| 00:00:06 |       |       |
|*  7 |        TABLE ACCESS BY INDEX ROWID      | WIP_DJ_CLOSE_TEMP            |   152 |  3648 |    16   (0)| 00:00:01 |       |       |
|*  8 |         INDEX RANGE SCAN                | WIP_DJ_CLOSE_TEMP_N1         |  1522 |       |     4   (0)| 00:00:01 |       |       |
|*  9 |        INDEX RANGE SCAN                 | WIP_TRANSACTIONS_N1          |    24 |       |     0   (0)|          |       |       |
|* 10 |      HASH JOIN                          |                              |     7 |   322 |  6344   (1)| 00:00:37 |       |       |
|* 11 |       TABLE ACCESS BY INDEX ROWID       | WIP_DJ_CLOSE_TEMP            |   152 |  3648 |    16   (0)| 00:00:01 |       |       |
|* 12 |        INDEX RANGE SCAN                 | WIP_DJ_CLOSE_TEMP_N1         |  1522 |       |     4   (0)| 00:00:01 |       |       |
|  13 |       PARTITION RANGE ALL               |                              | 25149 |   540K|  6327   (1)| 00:00:37 |     1 |    36 |
|  14 |        TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    | 25149 |   540K|  6327   (1)| 00:00:37 |     1 |    36 |
|* 15 |         INDEX RANGE SCAN                | MTL_MATERIAL_TRANSACTIONS_N8 | 25149 |       |   207   (1)| 00:00:02 |     1 |    36 |
|  16 |   TABLE ACCESS BY INDEX ROWID           | WIP_DJ_CLOSE_TEMP            |     1 |    15 |     1   (0)| 00:00:01 |       |       |
|* 17 |    INDEX UNIQUE SCAN                    | WIP_DJ_CLOSE_TEMP_U1         |     1 |       |     0   (0)|          |       |       |
----------------------------------------------------------------------------------------------------------------------------------------

Before diving into the analysis, let us compare both the plans and this will be from the customer’s perspective. They compared the plans and noticed that the Tables and Indexes selected by the optimizer, in both the cases, are same except for MTL_MATERIAL_TRANSACTION Table. The good plan uses N2 Index while the bad plan uses N8. Therefore, the customer hinted the query with N2 index and fixed the plan using SPM. Once fixed, they started their investigation on the reason for this change from N2 and N8. As mentioned earlier, this was just a Symptom while the underlying problem was something else. This something triggered the change in Index Selection and we will investigate this next.

Analysis

Comparing both the plans, it was now evident that the problem is with the second part of the Query, therefore, I took this part out for investigation. This is now a simple and small query, as show below :

SELECT wdct.WIP_ENTITY_ID            
FROM	apps.MTL_MATERIAL_TRANSACTIONS mmt,
	wip.WIP_DJ_CLOSE_TEMP wdct             
WHERE	wdct.GROUP_ID = :b1             
AND	wdct.ORGANIZATION_ID = :b2
AND	wdct.WIP_ENTITY_ID = mmt.TRANSACTION_SOURCE_ID            
AND	mmt.TRANSACTION_SOURCE_TYPE_ID = 5             
AND	mmt.ORGANIZATION_ID= :b3             
AND	mmt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE

The another change that I noticed here was the change of Nested Loop Join to Hash Join. The customer had a clone environment of this database which was 2 months old and I executed this query on the clone enviroment. The Query executed in less than a second on this and therefore, the investigation wa further made simpler for me. The Query plan on the clone and production is pasted below :

Production Plan : BAD
Plan hash value: 3088015387

------------------------------------------------------------------------------------
| Id  | Operation                           | Name                         | Rows  |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                              |       |
|*  1 |  HASH JOIN                          |                              |    49 |
|*  2 |   TABLE ACCESS BY INDEX ROWID       | WIP_DJ_CLOSE_TEMP            |   155 |
|*  3 |    INDEX RANGE SCAN                 | WIP_DJ_CLOSE_TEMP_N1         |  1552 |
|   4 |   PARTITION RANGE ALL               |                              | 24662 |
|   5 |    TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    | 24662 |
|*  6 |     INDEX RANGE SCAN                | MTL_MATERIAL_TRANSACTIONS_N8 | 24662 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID")
       filter("MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   2 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   3 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5 AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID)

Clone DB Plan : GOOD
Plan hash value: 569619275

-----------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |       |
|   1 |  NESTED LOOPS                      |                              |       |
|   2 |   NESTED LOOPS                     |                              |    26 |
|*  3 |    TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |
|*  4 |     INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   798 |
|   5 |    PARTITION RANGE ITERATOR        |                              |     2 |
|*  6 |     INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |
|*  7 |   TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   4 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID" AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID AND
              "MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   7 - filter("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5)

Notice the estimated rows calculated by the optimizer for the driving Index and Table. In clone db, the optimizer says 798 rows from an Index TEMP_N1 and then 80 Rows from the Table Access component. On production, the optimizer computes 1552 rows from the driving Index TEMP_N1 and then 155 rows from the Table Access. The optimizer further computes that 155 Iteration of MTL_MATERIAL_TRANSACTION (Nested Loop) will be a costly affair and therefore opts for a Hash Join. Since the clone db was 2 months old, the statistics were also old. Once I was convinced that estimated cardinality for the driving table seems to be an issue, I imported the production stats to clone and simulated the issue. I imported the stats of only the WIP_DJ_CLOSE_TEMP table on clone and could simulate the performance impact. The Table and relevant Column Level statistics of WIP_DJ_CLOSE_TEMP is pasted below. I have removed unwanted column stats from this post. This is from the clone and production.

Clone Stats - Performance Optimal

SQL> @table_stats
Enter value for 1: WIP_DJ_CLOSE_TEMP
old   2: where table_name='&1'
new   2: where table_name='WIP_DJ_CLOSE_TEMP'

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO T
------------------------------ --- ---------- ---------- --------- --- -
WIP                            NO      118836        829 26-SEP-10 YES N

Elapsed: 00:00:00.23
SQL> @column_stats
Enter value for 1: WIP
old   3: where owner='&1'
new   3: where owner='WIP'
Enter value for 2: WIP_DJ_CLOSE_TEMP
old   4: and   table_name='&2'
new   4: and   table_name='WIP_DJ_CLOSE_TEMP'

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                149          0 .006711409 NONE
ORGANIZATION_ID                          10          0         .1 NONE

Production Stats - Performance sub-optimal

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
WIP                            NO      116383        829 28-NOV-10 YES

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                 75          0 .013333333 NONE
ORGANIZATION_ID                          10          0         .1 NONE

Index Details of MTL_MATERIAL_TRANSACTIONS

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------
MTL_MATERIAL_TRANSACTIONS_N2   TRANSACTION_SOURCE_ID                        1
                               ORGANIZATION_ID                              2
                               TRANSACTION_DATE                             3

MTL_MATERIAL_TRANSACTIONS_N8   TRANSACTION_SOURCE_TYPE_ID                   1
                               ORGANIZATION_ID                              2
                               TRANSACTION_DATE                             3
                               TRANSACTION_SOURCE_NAME                      4

Issue

As mentioned at the start of this post, while the statistics are relevant for selection between an Index Scan and Table Scan and in case of Index Scan, which Index to select; the most important factor is the Computed Cardinality of the driving table as this has an impact on the Join Order and Join Type. Further, the Join Type dictates the selection of an Index for the other tables. The difference between the Nested Loop Join and Hash Join, in terms of Index Selection is : Nested Loop Join will select rows from the driving table and then use an Index on the Join Column, if it exists (else it will be an hash join), of the second table. Therefore, in case of a good plan, with the Nested Loop Join in place, the optimizer opted for N2 Index of MTL_MATERIAL_TRANSACTION table which is an Index on Join Column. In case of Hash Join, the Optimizer will try to filter out as many rows as possible for both the tables using an Index on filter predicates and once the rows are filtered, will join these using hash function on the Join Columns. Therefore, in case of the Hash Join, the index N8 of MTL_MATERIAL_TRANSACTION is on filter column and then the rows filtered were joined using hash function on the Join Column. See the PREDICATE INFORMATION for both the Good and Bad Plans above and the Index Column details for MTL_MATERIAL_TRANSACTION pasted above.

The Calculation : Clone Old Stats

Let us get into the computed cardinality calculation for the driving table, as this had an impact on the Join Type. On clone the number of Rows are : 118836 and the WHERE predicate for this table is as under :

WHERE	wdct.GROUP_ID = :b1             
AND	wdct.ORGANIZATION_ID = :b2

The Table has an Index on GROUP_ID column and the calculation says 118836*1/149 = 797.55, rounded off to 798. Further, it has to filter based on non-indexed column, which is Organization_ID and the calculation says 798*1/10=79.8 rounded off to 80. The clone db (good) plan shows the same.

The Calculation : Production Stats

On Production, the computed cardinality calculation for the index scan step is 116383*1/75 = 1551.77, rounded off to 1552. Further, the filter based on non-indexed column is 1552*1/10 = 155.2, rounded off to 155. The production plan shows the same.

In the case of production statistics, the statistics were collected on Sunday and this triggered the change in plan. Further, these stats were accurate, even then, the optimizer came out with a plan that was sub-optimal. As mentioned, we could simulate the issue on Clone DB after importing the statistics of the driving table – WIP_DJ_CLOSE_TEMP. As a resolution and to check the impact, we modified the column level statistics of GROUP_ID column on clone db to 200 and the plan changed to good.

As the name of the driving table implies – WIP_DJ_CLOSE_TEMP, this is an interface table and the rows are deleted and inserted into this table. The table usually contain More or Less 0.1 Million Rows. Therefore, as a final action, once the column level statistics for GROUP_ID column was fixed on production, we locked the statistics of this table. This was to ensure that the plan of this query will not be impacted after periodic statistics generation and remains stable.

After manually setting the column level statistics for GROUP_ID column to 200, the plan is as under :

Plan hash value: 569619275

-----------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |       |
|   1 |  NESTED LOOPS                      |                              |       |
|   2 |   NESTED LOOPS                     |                              |    18 |
|*  3 |    TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    58 |
|*  4 |     INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   582 |
|   5 |    PARTITION RANGE ITERATOR        |                              |     2 |
|*  6 |     INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |
|*  7 |   TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   4 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID" AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID AND
              "MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   7 - filter("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5)

Production Stats - Performance sub-optimal

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
WIP                            NO      116383        829 28-NOV-10 YES

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                200          0 .013333333 NONE
ORGANIZATION_ID                          10          0         .1 NONE

The calculation now stands as Index Step : 116383*1/200 = 581.91, rounded off to 582. Table Access : 582*1/10 = 58.2, rounded off to 58. This resolved the issue.

In this scenario, Investigating the statistics on MTL_MATERIAL_TRANSACTIONS table would have resulted futile,as this change was just a symptom due to the problem with stats on the driving table.

Advertisements

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on the issues that I feel is Interesting for my readers. Some of these are my real life examples, which I hope, you would find interesting. Comments are always a welcome. The Technical Observations & Views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understandings, learnings and resolutions of various customer issues.

6 Responses to Yet another Query Plan Change ! Cost Based Optimization…

  1. Anand says:

    Hi Sir,

    The step by step diagnosis that you perform is simply awesome.Thanks for sharing. 🙂

    Regards,
    Anand

    Like

  2. Prasad says:

    Hi Vivek,

    Very nice post indeed!!

    As you mentioned that “Stats were accurate but optimizer came out with the bad plan”
    can you elaborate on it.

    If the stats are accurate then why the performance got degraded with it…..Is it because the rows got added to the table after Sunday and plan was bad with more rows?

    Is it like, actual distinct value for group_id column is more than 75,but stats showing only 75.?

    Is there a difference in the actual distinct values for group_id column in PROD and CLONE database.?

    Regards,
    Prasad

    Like

    • Vivek says:

      Prasad,

      The stats represented the accurate picture of the data, as it was, during stats collection. This table is an interface table and the data in deleted and inserted. The stats collected, though it reflected the actual data, caused the optimizer to assume Hash Join to be a faster access path. I have not posted the calculation for Nested Loop Join, which was higher than the hash join.

      Regards
      Vivek

      Like

  3. Nakul says:

    Nice one !

    Like

  4. shivakarthik says:

    Very detail explaination !! Thanks.
    I got a question, How were you able to pinpoint that the issue was with the driving table at looking at the explain plan or the JOIN Condition.

    When there are other things which were of concern on the first look, Like the acces oracle chose between PARTITION RANGE ITERATOR and PARTITION RANGE ALL.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s