Importance of Constraints ! Know your Reporting Tools

Recently, I was working on a customer issue: a performance comparison between a competing database and Oracle Database, where the competition was reported to be performing better than Oracle. This turned out to be a classic case of bad schema design and a badly written query. Working on it reminded me of a great lesson I learned from “Expert Oracle Database Architecture” by Thomas Kyte. He wrote about a classic issue where a block of code that worked in SQL Server generated wrong results when ported to Oracle Database because of the way the two databases compare NULL values. Every database is different. It also reminded me of one of my responses to a customer question: “A count(*) from a table is doing a Full Table Scan, even though it has a Unique Index on it”.

As Tom mentioned in his book, every database is different and implements features differently. Third-party tools that connect to these different data sources generate queries that are syntactically valid on all of them but may not run optimally on each. Therefore, it is important to understand our data and design the schema with all the required constraints and indexes in place.

In this case, the customer was running a few analytical reports from a third-party reporting tool (Tableau). The reports selected a few columns with a couple of predicates on a date range and a product name, and the queries were around 40-50 lines long. Surprisingly, Tableau added a few additional predicates for each of the original ones, and because of these additions the run-time plan was around 2700+ lines long. As an example, I have a table T1 and have to generate a report selecting object IDs and names for either of the two TEMPORARY values (Y or N). My query would look like:

select object_id, object_name from t1 where temporary=:b1;

If I have an index on the TEMPORARY column, the Optimizer will come up with an optimal plan based on its cost calculation. However, when run from Tableau, the statement came out as the following query:

select object_id, object_name from t1
where ((temporary=:b1) or (temporary is NULL and :b1 is NULL));

What is the implication of this added OR condition? Let's see.

SQL> create table t1 as
  2  select * from all_objects;

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt=>'for all columns size auto for columns temporary size 100');

PL/SQL procedure successfully completed.

SQL> create index t1_idx on t1(temporary);

Index created.

SQL> select num_rows, blocks from dba_tables where table_name='T1';

  NUM_ROWS     BLOCKS
---------- ----------
     68417       1375

SQL> variable l_temporary varchar2(1);
SQL> exec :l_temporary:='Y';

PL/SQL procedure successfully completed.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL)); 

161 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7xsahyu4q594y, child number 0
-------------------------------------
select object_id, object_name from t1 where ((temporary=:l_temporary)
or (temporary is NULL and :l_temporary is NULL))

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   375 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   161 |  6923 |   375   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEMPORARY"=:L_TEMPORARY OR ("TEMPORARY" IS NULL AND
              :L_TEMPORARY IS NULL)))

161 rows out of 68000 is less than 1%, so an index access would have been a better option. At the customer's end, the table was huge, with billions of rows, and with multiple such OR predicates the very first observation was the amount of time the Optimizer took to parse the query. The query took around 184 seconds to run, and around 175 of those seconds were spent on parsing. This was identified as a bug in 18c (BUG#28725660), with the primary cause being the change in OR Expansion behaviour in 18c. The bug is fixed in 19c. Backporting the fix to 18c would have taken some time, so the other fix we applied was to add NOT NULL constraints to some of the columns. From the data, we could see that none of the columns had NULL values, and the developers confirmed that NULLs are never stored in these columns, so it was safe to add the constraints. Continuing with our example above, let's add a NOT NULL constraint to our table T1.

SQL> alter table t1 modify temporary not null;

Table altered.

select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));

select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |       |     5 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |   161 |  6923 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |   161 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:L_TEMPORARY)

The run-time plan of the same query is now an index scan, which is much better than the previous plan. This additional input to the Optimizer was enough to transform the query. You can generate a 10053 trace to see the transformation.
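For reference, the 10053 trace can be enabled at session level before re-running the query (a minimal sketch; the trace is written only when the statement is hard parsed):

SQL> alter session set events '10053 trace name context forever, level 1';
SQL> select object_id, object_name from t1
  2  where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));
SQL> alter session set events '10053 trace name context off';

The trace file is written to the session's trace directory; V$DIAG_INFO ("Default Trace File") points to it.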

A NOT NULL constraint combined with a Unique Index is also what is required to answer a COUNT(*) query from an index, which is the question one of my customers asked a long time back. This is another case where a NOT NULL constraint helps the optimizer come up with an optimal plan.
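As a quick illustration on the same T1 table (a hypothetical sketch, assuming OBJECT_ID has no NULL values): a B-tree index has no entry for rows where all indexed columns are NULL, so with OBJECT_ID nullable the optimizer cannot count rows from the unique index alone and falls back to a full table scan; once the column is declared NOT NULL, an index fast full scan becomes a legal and much cheaper way to answer COUNT(*).

SQL> create unique index t1_uk on t1(object_id);
SQL> select count(*) from t1;                    -- OBJECT_ID nullable: TABLE ACCESS FULL
SQL> alter table t1 modify object_id not null;
SQL> select count(*) from t1;                    -- now: INDEX FAST FULL SCAN on T1_UK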

It is important to optimize the schema and provide all the critical inputs to the Optimizer. Generic reporting tools generate queries that work on all databases; a generated query may run well on one database and poorly on another. Hence the title of this blog, “Know your Reporting Tools” :).

On the real customer issue, after adding the constraints to a few of the columns, the query response time dropped drastically from 184 seconds to less than 10 seconds, far better than the competition.

Query Performance Issue ! SQL*Plus v/s Oracle Reports…

Edited on 1st March to include the Execution Plan of the Modified Query. See the end of this post.

Recently, I got a mail from one of my customers about a query performance issue. The customer mentioned that the query, when executed from SQL*Plus, takes around 10 seconds, whereas when pasted into Oracle Reports and run through the Concurrent Manager it runs for around 30-40 minutes and is finally killed. The customer also confirmed that the query is exactly the same, i.e. purely a copy-paste of the query run from SQL*Plus, and that the bind types and lengths are the same too. The run-time plans for the two executions are totally different, with the SQL*Plus plan better than the plan generated from Oracle Reports. The question was, “Why is the plan different between SQL*Plus and Oracle Reports?”

As usual, I asked for a 10053 trace of both executions, which is one of the best ways of investigating Optimizer calculations. The 10053 trace revealed some important facts about the query performance and therefore helped resolve the issue. Before we dive into the investigation, let me paste the query and both execution plans.

This is the Query.

SELECT /*+ INDEX(WDD WSH_DELIVERY_DETAILS_TI4) */
             WDD.ORGANIZATION_ID, 
             COUNT(*) LPNS,
             TRP.TRIP_ID,
             RC.CUSTOMER_NUMBER,
             RC.CUSTOMER_NAME, 
             RAA.ADDRESS1 TO_ADDRESS1, 
             RAA.CITY||' '||RAA.STATE TO_CITY_STATE, 
             HLA.DESCRIPTION FROM_DESC, 
             HLA.ADDRESS_LINE_1||' '||HLA.TOWN_OR_CITY  FROM_ADDRESS1_CITY, 
             WDA.DELIVERY_ID, 
             WDA.PARENT_DELIVERY_DETAIL_ID,
             WDD.DELIVERY_DETAIL_ID, 
             WDD1.CONTAINER_NAME    CONTAINER_NAME, 
             SUM(WDD.REQUESTED_QUANTITY) REQUESTED_QUANTITY, 
             WDD.REQUESTED_QUANTITY_UOM, 
             SUM(ROUND(WDD.GROSS_WEIGHT,0)) GROSS_WEIGHT, 
             WDD.WEIGHT_UOM_CODE,
             WT.VEHICLE_NUM_PREFIX ||' '||WT.VEHICLE_NUMBER VEHICLE_NUMBER,
             MSI.SEGMENT1 ITEM_CODE, 
	     MSI.ATTRIBUTE10 PRT_DESC 
FROM 	apps.WSH_DELIVERY_DETAILS WDD,
     	apps.WSH_DELIVERY_DETAILS WDD1,
        apps.WSH_DELIVERY_ASSIGNMENTS WDA ,
        apps.WSH_DELIVERY_ASSIGNMENTS WDA1,    
        apps.WSH_TRIPS WT, 
        apps.MTL_SYSTEM_ITEMS MSI, 
	TI_RA_ADDRESSES_V RAA,
	TI_RA_CUSTOMERS_V RC ,
        apps.HR_LOCATIONS_ALL HLA,
    	(SELECT DISTINCT  T.TRIP_ID, WDA.DELIVERY_ID 
    	FROM	apps.WSH_DELIVERY_ASSIGNMENTS WDA, 
          	apps.WSH_DELIVERY_LEGS DL,
          	apps.WSH_TRIP_STOPS PICKUP_STOP,  
             	apps.WSH_TRIPS T
	WHERE	WDA.DELIVERY_ID     = DL.DELIVERY_ID 
        AND     DL.PICK_UP_STOP_ID  = PICKUP_STOP.STOP_ID 
        AND     PICKUP_STOP.TRIP_ID = T.TRIP_ID
        AND     T.TRIP_ID  >=  :P_TRIPID_FR
        AND     T.TRIP_ID  <=  :P_TRIPID_TO )  TRP
WHERE 	...		-- join predicates to the outer tables not recoverable from the published text
AND 	TRP.TRIP_ID >= :P_TRIPID_FR
AND 	TRP.TRIP_ID <= :P_TRIPID_TO 
GROUP BY WDD.ORGANIZATION_ID , TRP.TRIP_ID , RC.CUSTOMER_NUMBER , RC.CUSTOMER_NAME , RAA.ADDRESS1 , 
RAA.CITY , RAA.STATE , HLA.DESCRIPTION , HLA.ADDRESS_LINE_1 , HLA.TOWN_OR_CITY , WDA.DELIVERY_ID , 
WT.VEHICLE_NUM_PREFIX , WT.VEHICLE_NUMBER , WDA.PARENT_DELIVERY_DETAIL_ID , WDD.DELIVERY_DETAIL_ID , 
WDD1.CONTAINER_NAME , WDD.REQUESTED_QUANTITY_UOM , WDD.WEIGHT_UOM_CODE , MSI.SEGMENT1 , MSI.ATTRIBUTE10

The Runtime Plan of the Query when Executed from SQL*Plus.

-----------------------------------------------------------------------------------------------
| Id  | Operation                                       | Name                        | Rows  |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                |                             |       |
|   1 |  SORT GROUP BY                                  |                             |     1 |
|   2 |   VIEW                                          | VM_NWVW_1                   |     1 |
|   3 |    SORT UNIQUE                                  |                             |     1 |
|*  4 |     FILTER                                      |                             |       |
|   5 |      NESTED LOOPS                               |                             |       |
|   6 |       NESTED LOOPS                              |                             |     1 |
|   7 |        NESTED LOOPS                             |                             |     1 |
|   8 |         NESTED LOOPS                            |                             |     1 |
|   9 |          NESTED LOOPS                           |                             |     5 |
|  10 |           NESTED LOOPS                          |                             |     5 |
|  11 |            NESTED LOOPS                         |                             |     5 |
|  12 |             NESTED LOOPS                        |                             |     5 |
|  13 |              NESTED LOOPS OUTER                 |                             |    27 |
|  14 |               NESTED LOOPS OUTER                |                             |    27 |
|  15 |                NESTED LOOPS                     |                             |    27 |
|  16 |                 NESTED LOOPS                    |                             |  1908 |
|  17 |                  NESTED LOOPS                   |                             |   427 |
|  18 |                   NESTED LOOPS                  |                             |   127 |
|  19 |                    MERGE JOIN                   |                             |    72 |
|  20 |                     TABLE ACCESS BY INDEX ROWID | WSH_TRIP_STOPS              |   187 |
|* 21 |                      INDEX RANGE SCAN           | WSH_TRIP_STOPS_N1           |   187 |
|* 22 |                     SORT JOIN                   |                             |    72 |
|  23 |                      TABLE ACCESS BY INDEX ROWID| WSH_TRIPS                   |    72 |
|* 24 |                       INDEX RANGE SCAN          | WSH_TRIPS_U1                |    72 |
|  25 |                    TABLE ACCESS BY INDEX ROWID  | WSH_DELIVERY_LEGS           |     2 |
|* 26 |                     INDEX RANGE SCAN            | WSH_DELIVERY_LEGS_N2        |     2 |
|* 27 |                   INDEX RANGE SCAN              | WSH_DELIVERY_ASSIGNMENTS_N1 |     3 |
|  28 |                  TABLE ACCESS BY INDEX ROWID    | WSH_DELIVERY_ASSIGNMENTS    |     4 |
|* 29 |                   INDEX RANGE SCAN              | WSH_DELIVERY_ASSIGNMENTS_N1 |     4 |
|* 30 |                 TABLE ACCESS BY INDEX ROWID     | WSH_DELIVERY_DETAILS        |     1 |
|* 31 |                  INDEX RANGE SCAN               | WSH_DELIVERY_DETAILS_TI4    |     1 |
|* 32 |                INDEX RANGE SCAN                 | WSH_DELIVERY_ASSIGNMENTS_N3 |     1 |
|  33 |               TABLE ACCESS BY INDEX ROWID       | WSH_DELIVERY_DETAILS        |     1 |
|* 34 |                INDEX UNIQUE SCAN                | WSH_DELIVERY_DETAILS_U1     |     1 |
|  35 |              TABLE ACCESS BY INDEX ROWID        | MTL_SYSTEM_ITEMS_B          |     1 |
|* 36 |               INDEX UNIQUE SCAN                 | MTL_SYSTEM_ITEMS_B_U1       |     1 |
|  37 |             TABLE ACCESS BY INDEX ROWID         | HZ_CUST_ACCOUNTS            |     1 |
|* 38 |              INDEX UNIQUE SCAN                  | HZ_CUST_ACCOUNTS_U1         |     1 |
|  39 |            TABLE ACCESS BY INDEX ROWID          | HZ_LOCATIONS                |     1 |
|* 40 |             INDEX UNIQUE SCAN                   | HZ_LOCATIONS_U1             |     1 |
|  41 |           TABLE ACCESS BY INDEX ROWID           | HZ_PARTY_SITES              |     1 |
|* 42 |            INDEX RANGE SCAN                     | HZ_PARTY_SITES_N2           |     1 |
|* 43 |          TABLE ACCESS BY INDEX ROWID            | HZ_CUST_ACCT_SITES_ALL      |     1 |
|* 44 |           INDEX RANGE SCAN                      | HZ_CUST_ACCT_SITES_N1       |     1 |
|  45 |         TABLE ACCESS BY INDEX ROWID             | HZ_PARTIES                  |     1 |
|* 46 |          INDEX UNIQUE SCAN                      | HZ_PARTIES_U1               |     1 |
|* 47 |        INDEX UNIQUE SCAN                        | HR_LOCATIONS_PK             |     1 |
|  48 |       TABLE ACCESS BY INDEX ROWID               | HR_LOCATIONS_ALL            |     1 |
-----------------------------------------------------------------------------------------------

Runtime Plan of the Query when Executed from Oracle Reports.

  
  -----------------------------------------------------------------------------------------------
  | Id  | Operation                                       | Name                        | Rows  |
  -----------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT                                |                             |       |
  |   1 |  SORT ORDER BY                                  |                             |     1 |
  |   2 |   SORT GROUP BY                                 |                             |     1 |
  |   3 |    VIEW                                         | VM_NWVW_1                   |     1 |
  |   4 |     SORT UNIQUE                                 |                             |     1 |
  |*  5 |      FILTER                                     |                             |       |
  |   6 |       NESTED LOOPS                              |                             |       |
  |   7 |        NESTED LOOPS                             |                             |     1 |
  |   8 |         NESTED LOOPS                            |                             |     1 |
  |   9 |          NESTED LOOPS                           |                             |     4 |
  |  10 |           NESTED LOOPS                          |                             |     4 |
  |  11 |            NESTED LOOPS                         |                             |     1 |
  |  12 |             NESTED LOOPS OUTER                  |                             |     1 |
  |  13 |              NESTED LOOPS OUTER                 |                             |     1 |
  |  14 |               NESTED LOOPS                      |                             |     1 |
  |* 15 |                HASH JOIN                        |                             |     1 |
  |  16 |                 TABLE ACCESS FULL               | HZ_CUST_ACCT_SITES_ALL      | 36983 |
  |* 17 |                 HASH JOIN                       |                             | 23952 |
  |  18 |                  TABLE ACCESS FULL              | HZ_PARTY_SITES              | 76710 |
  |* 19 |                  HASH JOIN                      |                             | 22674 |
  |  20 |                   TABLE ACCESS FULL             | HZ_LOCATIONS                | 72598 |
  |* 21 |                   HASH JOIN                     |                             | 22737 |
  |  22 |                    TABLE ACCESS FULL            | HZ_CUST_ACCOUNTS            | 14291 |
  |* 23 |                    HASH JOIN                    |                             | 22801 |
  |  24 |                     TABLE ACCESS BY INDEX ROWID | MTL_SYSTEM_ITEMS_B          | 11073 |
  |* 25 |                      INDEX RANGE SCAN           | MTL_SYSTEM_ITEMS_B_N16      | 11073 |
  |* 26 |                     HASH JOIN                   |                             |   130K|
  |  27 |                      TABLE ACCESS FULL          | HR_LOCATIONS_ALL            |   919 |
  |* 28 |                      TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_DETAILS        |   130K|
  |* 29 |                       INDEX RANGE SCAN          | WSH_DELIVERY_DETAILS_TI4    |   143K|
  |* 30 |                TABLE ACCESS BY INDEX ROWID      | WSH_DELIVERY_ASSIGNMENTS    |     1 |
  |* 31 |                 INDEX RANGE SCAN                | WSH_DELIVERY_ASSIGNMENTS_N3 |     1 |
  |* 32 |               INDEX RANGE SCAN                  | WSH_DELIVERY_ASSIGNMENTS_N3 |     1 |
  |  33 |              TABLE ACCESS BY INDEX ROWID        | WSH_DELIVERY_DETAILS        |     1 |
  |* 34 |               INDEX UNIQUE SCAN                 | WSH_DELIVERY_DETAILS_U1     |     1 |
  |  35 |             TABLE ACCESS BY INDEX ROWID         | HZ_PARTIES                  |     1 |
  |* 36 |              INDEX UNIQUE SCAN                  | HZ_PARTIES_U1               |     1 |
  |* 37 |            INDEX RANGE SCAN                     | WSH_DELIVERY_ASSIGNMENTS_N1 |     4 |
  |  38 |           TABLE ACCESS BY INDEX ROWID           | WSH_DELIVERY_LEGS           |     1 |
  |* 39 |            INDEX RANGE SCAN                     | WSH_DELIVERY_LEGS_N1        |     1 |
  |* 40 |          TABLE ACCESS BY INDEX ROWID            | WSH_TRIP_STOPS              |     1 |
  |* 41 |           INDEX UNIQUE SCAN                     | WSH_TRIP_STOPS_U1           |     1 |
  |* 42 |         INDEX UNIQUE SCAN                       | WSH_TRIPS_U1                |     1 |
  |  43 |        TABLE ACCESS BY INDEX ROWID              | WSH_TRIPS                   |     1 |
  -----------------------------------------------------------------------------------------------

Since the investigation is around the cardinality of each row source, I have removed other information (irrelevant for this discussion) from the execution plans. The query contains a subquery with the alias TRP. The tables queried in the subquery drive the good plan, whereas the subquery is unnested and joined at a later stage in the problematic plan (steps 37-43). In the problematic plan, the driving table is HR_LOCATIONS_ALL, which is hash joined to WSH_DELIVERY_DETAILS. The investigation was therefore around this change in the execution path. One of the primary goals of the Optimizer is to calculate the number of rows returned by each filter predicate and then evaluate the access path, including the driving table, based on the predicate that fetches the fewest rows. It then also evaluates the join method, which is to consider a Nested Loop join for a small set of rows and a Hash Join / Sort Merge Join for a larger set of rows.

The text of the TRP subquery is pasted below:

    	(SELECT DISTINCT  T.TRIP_ID, WDA.DELIVERY_ID 
    	FROM	apps.WSH_DELIVERY_ASSIGNMENTS WDA, 
          	apps.WSH_DELIVERY_LEGS DL,
          	apps.WSH_TRIP_STOPS PICKUP_STOP,  
             	apps.WSH_TRIPS T
	WHERE	WDA.DELIVERY_ID     = DL.DELIVERY_ID 
        AND     DL.PICK_UP_STOP_ID  = PICKUP_STOP.STOP_ID 
        AND     PICKUP_STOP.TRIP_ID = T.TRIP_ID
        AND     T.TRIP_ID  >=  : P_TRIPID_FR
        AND     T.TRIP_ID  <=  : P_TRIPID_TO )  TRP

In the good plan, this subquery is driven by a Sort Merge Join between WSH_TRIPS and WSH_TRIP_STOPS. Both tables are accessed via an index on TRIP_ID. Note that although the range predicate is on TRIP_ID of WSH_TRIPS, the column is joined to TRIP_ID of WSH_TRIP_STOPS, and therefore the optimizer scans an index on these columns to get the relevant rows from each table. The Optimizer's calculation says 72 rows from WSH_TRIPS and 187 rows from WSH_TRIP_STOPS, which are merge joined to arrive at the final 72 rows from the two tables. This calculation is based on the input values the Optimizer peeked at during the hard parse. For the same set of input values we have a 10053 trace of the bad plan, so I immediately concentrated on the cardinality calculation for these two tables. Surprisingly, the first caveat I observed in the 10053 trace was the missing peeked bind data, and this was enough to explain the Optimizer's calculations.

The Peeked Bind Values section of the 10053 trace shows the following information.

*******************************************
Peeked values of the binds in SQL statement
*******************************************
----- Bind Info (kkscoacd) -----
 Bind#0
  oacdty=02 mxl=22(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000000 frm=00 csi=00 siz=72 off=0
  No bind buffers allocated
 Bind#1
  oacdty=02 mxl=22(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000000 frm=00 csi=00 siz=0 off=24
  No bind buffers allocated
 Bind#2
  oacdty=02 mxl=22(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=00 fl2=1000000 frm=00 csi=00 siz=0 off=48
  No bind buffers allocated
 Bind#3
  No oacdef for this bind.
 Bind#4
  No oacdef for this bind.

When the query was executed from SQL*Plus, the values for the binds were populated and visible in the Peeked Bind Values section of the 10053 trace. The missing information here means the Optimizer falls back to its default calculation for a range predicate. From the trace, the computed cardinalities for the two tables, WSH_TRIPS and WSH_TRIP_STOPS, are as under:

Access path analysis for WSH_TRIP_STOPS
***************************************
Access path analysis for WSH_TRIP_STOPS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for WSH_TRIP_STOPS[PICKUP_STOP] 

  Table: WSH_TRIP_STOPS  Alias: PICKUP_STOP
    Card: Original: 4666397.000000  Rounded: 11666  Computed: 11665.99  Non Adjusted: 11665.99

Access path analysis for WSH_TRIPS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for WSH_TRIPS[T] 

  Table: WSH_TRIPS  Alias: T
    Card: Original: 1802653.000000  Rounded: 4507  Computed: 4506.63  Non Adjusted: 4506.63

In the absence of actual values for the bind variables, the Optimizer computed the cardinality using defaults, which for a bounded range is 0.25% (5% for the >= predicate times 5% for the <= predicate). With these defaults, the computed cardinality of 4507 rows from WSH_TRIPS and 11666 rows from WSH_TRIP_STOPS for the predicates TRIP_ID >= :P_TRIPID_FR and TRIP_ID <= :P_TRIPID_TO is far too high. Another predicate in the outer query is on ORGANIZATION_ID, and based on this predicate the optimizer generates a plan with the smallest row source first. The smallest row source is HR_LOCATIONS_ALL with 919 rows, which is joined to the WSH_DELIVERY_DETAILS table accessed via the hinted index WSH_DELIVERY_DETAILS_TI4.
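The 0.25% default can be cross-checked against the original cardinalities reported in the trace (simple arithmetic, using the Card: Original figures above):

SQL> select round(1802653 * 0.05 * 0.05) wsh_trips_card,
  2         round(4666397 * 0.05 * 0.05) wsh_trip_stops_card
  3  from dual;

WSH_TRIPS_CARD WSH_TRIP_STOPS_CARD
-------------- -------------------
          4507               11666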

To check and confirm this behaviour, I ran an EXPLAIN PLAN for this query from SQL*Plus without setting values for the bind variables. With no values for the binds, EXPLAIN PLAN produced the BAD plan from SQL*Plus as well, so I was now sure that the problem occurs when the peeked bind values are not available. The comparison is easy to reproduce, as sketched below.
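A sketch of the comparison (the bind values are hypothetical and the angle-bracket placeholders stand for the full report SQL; with SERVEROUTPUT off, DISPLAY_CURSOR picks up the report query as the last executed statement). EXPLAIN PLAN never peeks at bind values, so it shows the default-selectivity plan, whereas an actual execution followed by DBMS_XPLAN.DISPLAY_CURSOR with the ADVANCED format shows the plan really used, along with the Peeked Binds section:

SQL> variable P_TRIPID_FR number
SQL> variable P_TRIPID_TO number
SQL> explain plan for <the report query>;        -- no peeking: reproduces the bad plan
SQL> select * from table(dbms_xplan.display);

SQL> exec :P_TRIPID_FR := 1000;                  -- hypothetical values
SQL> exec :P_TRIPID_TO := 1100;
SQL> <run the report query>
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ADVANCED'));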

The next question was, “Why does Oracle Reports not peek at the bind variables before generating the plan?” I assume this to be default behaviour, where the query is parsed first and the binds are attached later. Unfortunately, there is not much written on this. A better option is to write the query in a way that does not depend on bind peeking, or to hint it. To resolve the issue, I modified the query, and the modified version worked perfectly well in both environments, i.e. SQL*Plus and Oracle Reports. The modification was in the subquery TRP. The change is pasted below.

    (SELECT TRIP_ID, DELIVERY_ID FROM 
    (SELECT T.TRIP_ID, WDA.DELIVERY_ID, row_number() over(partition by t.trip_id, wda.delivery_id order by t.trip_id) rnum 
                FROM   apps.WSH_DELIVERY_ASSIGNMENTS WDA, 
                                apps.WSH_DELIVERY_LEGS DL,
                                apps.WSH_TRIP_STOPS PICKUP_STOP,  
                                apps.WSH_TRIPS T
                WHERE   WDA.DELIVERY_ID     = DL.DELIVERY_ID 
        AND     DL.PICK_UP_STOP_ID  = PICKUP_STOP.STOP_ID 
        AND     PICKUP_STOP.TRIP_ID = T.TRIP_ID
        AND     T.TRIP_ID  >=  : P_TRIPID_FR
        AND     T.TRIP_ID  <=  : P_TRIPID_TO)
     WHERE RNUM=1)  TRP

The modification ensures that the TRP subquery becomes the driving query. The relevant portion of the execution plan after the modification is attached below. The calculation for WSH_TRIPS is still at 0.25% and is therefore computed as 4507 rows; even then, the Optimizer takes it as the driving table, which is what we wanted. This again leads to a common argument: the Optimizer is a piece of code that works on statistics, and it is the way we write a query that dominates the Optimizer.


|* 19 |                  WINDOW SORT PUSHED RANK         |                             | 26689 |
|* 20 |                   FILTER                         |                             |       |
|  21 |                    NESTED LOOPS                  |                             | 26689 |
|  22 |                     NESTED LOOPS                 |                             |  7968 |
|  23 |                      NESTED LOOPS                |                             |  4516 |
|* 24 |                       INDEX RANGE SCAN           | WSH_TRIPS_U1                |  4507 |
|  25 |                       TABLE ACCESS BY INDEX ROWID| WSH_TRIP_STOPS              |     1 |
|* 26 |                        INDEX RANGE SCAN          | WSH_TRIP_STOPS_N1           |     1 |
|  27 |                      TABLE ACCESS BY INDEX ROWID | WSH_DELIVERY_LEGS           |     2 |
|* 28 |                       INDEX RANGE SCAN           | WSH_DELIVERY_LEGS_N2        |     2 |
|* 29 |                     INDEX RANGE SCAN             | WSH_DELIVERY_ASSIGNMENTS_N1 |     3 |

  20 - filter(:P_TRIPID_FR<=:P_TRIPID_TO)
  24 - access("T"."TRIP_ID">=:P_TRIPID_FR AND "T"."TRIP_ID"<=:P_TRIPID_TO)
  26 - access("PICKUP_STOP"."TRIP_ID"="T"."TRIP_ID")
       filter(("PICKUP_STOP"."TRIP_ID">=:P_TRIPID_FR AND "PICKUP_STOP"."TRIP_ID"<=:P_TRIPID_TO))
  28 - access("DL"."PICK_UP_STOP_ID"="PICKUP_STOP"."STOP_ID")
  29 - access("WDA"."DELIVERY_ID"="DL"."DELIVERY_ID")
       filter("WDA"."DELIVERY_ID" IS NOT NULL)

Yet another Query Plan Change ! Cost Based Optimization…

The Optimizer relies on statistics to compute an optimal plan, which includes the choice between an index scan and a table scan, in the case of an index scan, which index to choose, and more importantly, the join order and join type. It is therefore quite evident that accurate statistics mean a good and accurate estimation of cardinality, which in turn results in an optimal execution plan. But there are cases when accurate statistics are not good enough for the Optimizer to come out with an optimal plan. In this blog, I will post a real-life issue, an interesting case in which the plan generated by the Optimizer impacted the performance of a query even though the underlying object statistics were accurate. Accurate statistics here means statistics that reflect an accurate picture of the data in the underlying table.

In many of the events that I have presented, one of the common statements I emphasize is “Differentiate between the Performance Symptom and the Performance Problem”. In this case too, the in-house team was investigating and focusing more on the symptom while the problem was somewhere else. We will walk through this as well.

Recently, at a customer site, the performance of one critical process, which runs every few minutes, got impacted. The in-house administration team could establish that the execution time of one particular query, which usually takes less than a second, had changed to 30-40 minutes. Fortunately, the customer maintains query baselines, and this helped them delve further into the issue: they observed that the plan had changed. While they were investigating the issue and the reason for this plan change, the issue was temporarily fixed using SQL Plan Management. The customer wanted to know the reason for this behaviour, so as to avoid it in the future.

This was a custom process. The original query and the good / bad plans are pasted below.

SELECT WIP_ENTITY_NAME FROM WIP_DJ_CLOSE_TEMP 
WHERE	WIP_ENTITY_ID IN
	(SELECT wdct.WIP_ENTITY_ID            
	FROM	WIP_TRANSACTIONS wt,
		WIP_DJ_CLOSE_TEMP wdct             
	WHERE	wdct.GROUP_ID = :b1             
	AND	wdct.ORGANIZATION_ID = :b2
	AND	wdct.WIP_ENTITY_ID = wt.WIP_ENTITY_ID             
	AND	wt.ORGANIZATION_ID = :b3             
	AND	wt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE   
	UNION    
	SELECT wdct.WIP_ENTITY_ID            
	FROM	MTL_MATERIAL_TRANSACTIONS mmt,
		WIP_DJ_CLOSE_TEMP wdct             
	WHERE	wdct.GROUP_ID = :b1             
	AND	wdct.ORGANIZATION_ID = :b2
	AND	wdct.WIP_ENTITY_ID = mmt.TRANSACTION_SOURCE_ID            
	AND	mmt.TRANSACTION_SOURCE_TYPE_ID = 5             
	AND	mmt.ORGANIZATION_ID= :b3
	AND mmt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE)

Good Plan

Plan hash value: 2500423228

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                              |       |       |  6017 (100)|          |       |       |
|   1 |  NESTED LOOPS                           |                              |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                          |                              |    70 |  1960 |  6017   (1)| 00:00:36 |       |       |
|   3 |    VIEW                                 | VW_NSO_1                     |    70 |   910 |  5947   (1)| 00:00:35 |       |       |
|   4 |     SORT UNIQUE                         |                              |    70 |  3040 |  5947  (92)| 00:00:35 |       |       |
|   5 |      UNION-ALL                          |                              |       |       |            |          |       |       |
|   6 |       NESTED LOOPS                      |                              |       |       |            |          |       |       |
|   7 |        NESTED LOOPS                     |                              |    45 |  1890 |   490   (1)| 00:00:03 |       |       |
|*  8 |         TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |  1920 |     9   (0)| 00:00:01 |       |       |
|*  9 |          INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   797 |       |     2   (0)| 00:00:01 |       |       |
|* 10 |         INDEX RANGE SCAN                | WIP_TRANSACTIONS_N1          |    24 |       |     0   (0)|          |       |       |
|* 11 |        TABLE ACCESS BY INDEX ROWID      | WIP_TRANSACTIONS             |     1 |    18 |     6   (0)| 00:00:01 |       |       |
|  12 |       NESTED LOOPS                      |                              |       |       |            |          |       |       |
|  13 |        NESTED LOOPS                     |                              |    25 |  1150 |  5455   (1)| 00:00:32 |       |       |
|* 14 |         TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |  1920 |     9   (0)| 00:00:01 |       |       |
|* 15 |          INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   797 |       |     2   (0)| 00:00:01 |       |       |
|  16 |         PARTITION RANGE ITERATOR        |                              |     2 |       |    67   (0)| 00:00:01 |   KEY |    36 |
|* 17 |          INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |       |    67   (0)| 00:00:01 |   KEY |    36 |
|* 18 |        TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |    22 |    68   (0)| 00:00:01 |     1 |     1 |
|* 19 |    INDEX UNIQUE SCAN                    | WIP_DJ_CLOSE_TEMP_U1         |     1 |       |     0   (0)|          |       |       |
|  20 |   TABLE ACCESS BY INDEX ROWID           | WIP_DJ_CLOSE_TEMP            |     1 |    15 |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------------

Bad Plan

Plan hash value: 821054275

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name                         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                              |       |       |  7295 (100)|          |       |       |
|   1 |  NESTED LOOPS                           |                              |    19 |   532 |  7295   (1)| 00:00:43 |       |       |
|   2 |   VIEW                                  | VW_NSO_1                     |    19 |   247 |  7276   (1)| 00:00:43 |       |       |
|   3 |    SORT UNIQUE                          |                              |    19 |   826 |  7276  (88)| 00:00:43 |       |       |
|   4 |     UNION-ALL                           |                              |       |       |            |          |       |       |
|*  5 |      TABLE ACCESS BY INDEX ROWID        | WIP_TRANSACTIONS             |     1 |    18 |     6   (0)| 00:00:01 |       |       |
|   6 |       NESTED LOOPS                      |                              |    12 |   504 |   930   (1)| 00:00:06 |       |       |
|*  7 |        TABLE ACCESS BY INDEX ROWID      | WIP_DJ_CLOSE_TEMP            |   152 |  3648 |    16   (0)| 00:00:01 |       |       |
|*  8 |         INDEX RANGE SCAN                | WIP_DJ_CLOSE_TEMP_N1         |  1522 |       |     4   (0)| 00:00:01 |       |       |
|*  9 |        INDEX RANGE SCAN                 | WIP_TRANSACTIONS_N1          |    24 |       |     0   (0)|          |       |       |
|* 10 |      HASH JOIN                          |                              |     7 |   322 |  6344   (1)| 00:00:37 |       |       |
|* 11 |       TABLE ACCESS BY INDEX ROWID       | WIP_DJ_CLOSE_TEMP            |   152 |  3648 |    16   (0)| 00:00:01 |       |       |
|* 12 |        INDEX RANGE SCAN                 | WIP_DJ_CLOSE_TEMP_N1         |  1522 |       |     4   (0)| 00:00:01 |       |       |
|  13 |       PARTITION RANGE ALL               |                              | 25149 |   540K|  6327   (1)| 00:00:37 |     1 |    36 |
|  14 |        TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    | 25149 |   540K|  6327   (1)| 00:00:37 |     1 |    36 |
|* 15 |         INDEX RANGE SCAN                | MTL_MATERIAL_TRANSACTIONS_N8 | 25149 |       |   207   (1)| 00:00:02 |     1 |    36 |
|  16 |   TABLE ACCESS BY INDEX ROWID           | WIP_DJ_CLOSE_TEMP            |     1 |    15 |     1   (0)| 00:00:01 |       |       |
|* 17 |    INDEX UNIQUE SCAN                    | WIP_DJ_CLOSE_TEMP_U1         |     1 |       |     0   (0)|          |       |       |
----------------------------------------------------------------------------------------------------------------------------------------

Before diving into the analysis, let us compare the two plans from the customer's perspective. They compared the plans and noticed that the tables and indexes selected by the optimizer are the same in both cases, except for the MTL_MATERIAL_TRANSACTIONS table: the good plan uses the N2 index while the bad plan uses N8. Therefore, the customer hinted the query with the N2 index and fixed the plan using SPM, as sketched below. Once fixed, they started their investigation into the reason for this change from N2 to N8. As mentioned earlier, this was just a symptom, while the underlying problem was something else. That something else triggered the change in index selection, and we will investigate it next.
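For reference, pinning a known-good plan from the cursor cache into a SQL Plan Baseline can be done with DBMS_SPM (a sketch; the SQL_ID and plan hash value are placeholders to be taken from V$SQL after the hinted run):

declare
  l_plans pls_integer;
begin
  l_plans := dbms_spm.load_plans_from_cursor_cache(
               sql_id          => '&good_sql_id',
               plan_hash_value => &good_plan_hash_value,
               fixed           => 'YES');
  dbms_output.put_line('Plans loaded: ' || l_plans);
end;
/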

Analysis

Comparing both the plans, it was now evident that the problem is with the second part of the query, so I took this part out for investigation. This is now a simple and small query, as shown below:

SELECT wdct.WIP_ENTITY_ID            
FROM	apps.MTL_MATERIAL_TRANSACTIONS mmt,
	wip.WIP_DJ_CLOSE_TEMP wdct             
WHERE	wdct.GROUP_ID = :b1             
AND	wdct.ORGANIZATION_ID = :b2
AND	wdct.WIP_ENTITY_ID = mmt.TRANSACTION_SOURCE_ID            
AND	mmt.TRANSACTION_SOURCE_TYPE_ID = 5             
AND	mmt.ORGANIZATION_ID= :b3             
AND	mmt.TRANSACTION_DATE > wdct.ACTUAL_CLOSE_DATE

Another change that I noticed here was the change from a Nested Loop Join to a Hash Join. The customer had a clone of this database that was 2 months old, and I executed the query on the clone environment. The query executed in less than a second there, which made the investigation simpler for me. The query plans on the clone and on production are pasted below:

Production Plan : BAD
Plan hash value: 3088015387

------------------------------------------------------------------------------------
| Id  | Operation                           | Name                         | Rows  |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                              |       |
|*  1 |  HASH JOIN                          |                              |    49 |
|*  2 |   TABLE ACCESS BY INDEX ROWID       | WIP_DJ_CLOSE_TEMP            |   155 |
|*  3 |    INDEX RANGE SCAN                 | WIP_DJ_CLOSE_TEMP_N1         |  1552 |
|   4 |   PARTITION RANGE ALL               |                              | 24662 |
|   5 |    TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    | 24662 |
|*  6 |     INDEX RANGE SCAN                | MTL_MATERIAL_TRANSACTIONS_N8 | 24662 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID")
       filter("MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   2 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   3 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5 AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID)

Clone DB Plan : GOOD
Plan hash value: 569619275

-----------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |       |
|   1 |  NESTED LOOPS                      |                              |       |
|   2 |   NESTED LOOPS                     |                              |    26 |
|*  3 |    TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    80 |
|*  4 |     INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   798 |
|   5 |    PARTITION RANGE ITERATOR        |                              |     2 |
|*  6 |     INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |
|*  7 |   TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   4 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID" AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID AND
              "MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   7 - filter("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5)

Notice the estimated rows calculated by the optimizer for the driving index and table. On the clone db, the optimizer estimates 798 rows from the index TEMP_N1 and then 80 rows from the table access step. On production, the optimizer computes 1552 rows from the driving index TEMP_N1 and then 155 rows from the table access. The optimizer further computes that 155 iterations against MTL_MATERIAL_TRANSACTIONS (Nested Loop) would be a costly affair and therefore opts for a Hash Join. Since the clone db was 2 months old, the statistics were also old. Once I was convinced that the estimated cardinality for the driving table was the issue, I imported the production stats of only the WIP_DJ_CLOSE_TEMP table to the clone (sketched below) and could reproduce the performance impact. The table and relevant column-level statistics of WIP_DJ_CLOSE_TEMP, from both the clone and production, are pasted after the sketch; I have removed the irrelevant column stats from this post.
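The statistics copy itself was a standard DBMS_STATS export/import (a minimal sketch; the staging table name and owner are illustrative):

-- On production: stage the WIP_DJ_CLOSE_TEMP statistics in a transportable stats table
begin
  dbms_stats.create_stat_table(ownname => 'APPS', stattab => 'STATTAB_WDCT');
  dbms_stats.export_table_stats(ownname => 'WIP', tabname => 'WIP_DJ_CLOSE_TEMP',
                                stattab => 'STATTAB_WDCT', statown => 'APPS', cascade => true);
end;
/
-- Move APPS.STATTAB_WDCT to the clone (e.g. export/import), then on the clone:
begin
  dbms_stats.import_table_stats(ownname => 'WIP', tabname => 'WIP_DJ_CLOSE_TEMP',
                                stattab => 'STATTAB_WDCT', statown => 'APPS', cascade => true);
end;
/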

Clone Stats - Performance Optimal

SQL> @table_stats
Enter value for 1: WIP_DJ_CLOSE_TEMP
old   2: where table_name='&1'
new   2: where table_name='WIP_DJ_CLOSE_TEMP'

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO T
------------------------------ --- ---------- ---------- --------- --- -
WIP                            NO      118836        829 26-SEP-10 YES N

Elapsed: 00:00:00.23
SQL> @column_stats
Enter value for 1: WIP
old   3: where owner='&1'
new   3: where owner='WIP'
Enter value for 2: WIP_DJ_CLOSE_TEMP
old   4: and   table_name='&2'
new   4: and   table_name='WIP_DJ_CLOSE_TEMP'

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                149          0 .006711409 NONE
ORGANIZATION_ID                          10          0         .1 NONE

Production Stats - Performance sub-optimal

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
WIP                            NO      116383        829 28-NOV-10 YES

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                 75          0 .013333333 NONE
ORGANIZATION_ID                          10          0         .1 NONE

Index Details of MTL_MATERIAL_TRANSACTIONS

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------
MTL_MATERIAL_TRANSACTIONS_N2   TRANSACTION_SOURCE_ID                        1
                               ORGANIZATION_ID                              2
                               TRANSACTION_DATE                             3

MTL_MATERIAL_TRANSACTIONS_N8   TRANSACTION_SOURCE_TYPE_ID                   1
                               ORGANIZATION_ID                              2
                               TRANSACTION_DATE                             3
                               TRANSACTION_SOURCE_NAME                      4

Issue

As mentioned at the start of this post, while the statistics are relevant for the selection between an index scan and a table scan (and, in the case of an index scan, which index to select), the most important factor is the computed cardinality of the driving table, as this has an impact on the join order and join type. Further, the join type dictates the selection of an index for the other tables. The difference between a Nested Loop Join and a Hash Join, in terms of index selection, is this: a Nested Loop Join selects rows from the driving table and then uses an index on the join column of the second table, if one exists (otherwise a hash join is likely). Therefore, in the good plan, with the Nested Loop Join in place, the optimizer opted for the N2 index of the MTL_MATERIAL_TRANSACTIONS table, which is an index on the join column. In a Hash Join, the Optimizer tries to filter out as many rows as possible from both tables using indexes on the filter predicates and, once the rows are filtered, joins them using a hash function on the join columns. Therefore, in the Hash Join plan, the N8 index of MTL_MATERIAL_TRANSACTIONS is on the filter columns, and the filtered rows are then joined on the join column. See the Predicate Information for both the good and bad plans above and the index column details for MTL_MATERIAL_TRANSACTIONS pasted above.

The Calculation : Clone Old Stats

Let us get into the computed cardinality calculation for the driving table, as this had an impact on the join type. On the clone, the number of rows is 118836 and the WHERE predicate for this table is as under:

WHERE	wdct.GROUP_ID = :b1             
AND	wdct.ORGANIZATION_ID = :b2

The table has an index on the GROUP_ID column and the calculation says 118836*1/149 = 797.55, rounded off to 798. It then has to filter on a non-indexed column, ORGANIZATION_ID, and the calculation says 798*1/10 = 79.8, rounded off to 80. The clone db (good) plan shows the same.

The Calculation : Production Stats

On production, the computed cardinality for the index scan step is 116383*1/75 = 1551.77, rounded off to 1552. Further, the filter on the non-indexed column gives 1552*1/10 = 155.2, rounded off to 155. The production plan shows the same.
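Both sets of figures can be verified straight from the dictionary (a quick check; run against either database):

select round(t.num_rows / c.num_distinct)      index_card,   -- GROUP_ID = :b1
       round(t.num_rows / c.num_distinct / 10) table_card    -- then ORGANIZATION_ID = :b2 (10 distinct values)
from   dba_tables t, dba_tab_col_statistics c
where  t.owner = 'WIP' and t.table_name = 'WIP_DJ_CLOSE_TEMP'
and    c.owner = t.owner and c.table_name = t.table_name
and    c.column_name = 'GROUP_ID';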

In the case of the production statistics, the statistics had been gathered on the Sunday, and this triggered the change in plan. Further, these stats were accurate; even then, the optimizer came out with a sub-optimal plan. As mentioned, we could reproduce the issue on the clone db after importing the statistics of the driving table, WIP_DJ_CLOSE_TEMP. As a resolution, and to check the impact, we modified the column-level statistics of the GROUP_ID column on the clone db, setting NUM_DISTINCT to 200, and the plan changed to the good one.
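The change on the clone was a single DBMS_STATS call (a sketch; only NUM_DISTINCT and density of GROUP_ID are touched):

begin
  dbms_stats.set_column_stats(ownname       => 'WIP',
                              tabname       => 'WIP_DJ_CLOSE_TEMP',
                              colname       => 'GROUP_ID',
                              distcnt       => 200,
                              density       => 1/200,
                              no_invalidate => false);
end;
/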

As the name of the driving table implies, WIP_DJ_CLOSE_TEMP is an interface table: rows are constantly deleted from and inserted into it, and it usually contains more or less 0.1 million rows. Therefore, as a final action, once the column-level statistics for the GROUP_ID column were fixed on production, we locked the statistics of this table. This ensured that the plan of this query would not be impacted by periodic statistics gathering and would remain stable.
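Locking the statistics so that the scheduled gather job no longer overwrites them is equally simple (a sketch; a deliberate re-gather later would need force => true or an explicit unlock):

exec dbms_stats.lock_table_stats(ownname => 'WIP', tabname => 'WIP_DJ_CLOSE_TEMP');
-- exec dbms_stats.unlock_table_stats(ownname => 'WIP', tabname => 'WIP_DJ_CLOSE_TEMP');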

After manually setting the column-level statistics (NUM_DISTINCT) for the GROUP_ID column to 200, the plan is as under:

Plan hash value: 569619275

-----------------------------------------------------------------------------------
| Id  | Operation                          | Name                         | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                              |       |
|   1 |  NESTED LOOPS                      |                              |       |
|   2 |   NESTED LOOPS                     |                              |    18 |
|*  3 |    TABLE ACCESS BY INDEX ROWID     | WIP_DJ_CLOSE_TEMP            |    58 |
|*  4 |     INDEX RANGE SCAN               | WIP_DJ_CLOSE_TEMP_N1         |   582 |
|   5 |    PARTITION RANGE ITERATOR        |                              |     2 |
|*  6 |     INDEX RANGE SCAN               | MTL_MATERIAL_TRANSACTIONS_N2 |     2 |
|*  7 |   TABLE ACCESS BY LOCAL INDEX ROWID| MTL_MATERIAL_TRANSACTIONS    |     1 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("WDCT"."ORGANIZATION_ID"=:ORGANIZATION_ID)
   4 - access("WDCT"."GROUP_ID"=:GROUP_ID)
   6 - access("WDCT"."WIP_ENTITY_ID"="MMT"."TRANSACTION_SOURCE_ID" AND "MMT"."ORGANIZATION_ID"=:ORGANIZATION_ID AND
              "MMT"."TRANSACTION_DATE">"WDCT"."ACTUAL_CLOSE_DATE")
   7 - filter("MMT"."TRANSACTION_SOURCE_TYPE_ID"=5)

Production Stats - After setting GROUP_ID NUM_DISTINCT to 200

OWNER                          PAR   NUM_ROWS     BLOCKS LAST_ANAL GLO
------------------------------ --- ---------- ---------- --------- ---
WIP                            NO      116383        829 28-NOV-10 YES

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------- ---------------
GROUP_ID                                200          0 .013333333 NONE
ORGANIZATION_ID                          10          0         .1 NONE

The calculation now stands at: index step 116383*1/200 = 581.91, rounded off to 582; table access 582*1/10 = 58.2, rounded off to 58. This resolved the issue.

In this scenario, investigating the statistics on the MTL_MATERIAL_TRANSACTIONS table would have been futile, as that change was just a symptom of the problem with the stats on the driving table.

Cost based Optimizer ! Inefficient input yields inefficient Output…

Since the time my old blog went inaccessible (and it still is), I have been receiving several mails about previous posts that helped some of my readers relate to and resolve their performance issues. They wanted access to those old posts, and I have been requesting them to let me know which particular post was of interest so that I can republish it right here. There were many requests for the posts on the Cost Based Optimizer, and therefore I thought of publishing some of them. While these posts may not be relevant for current versions of Oracle, there are still production systems running Oracle 8i and 9i, and the posts may be of help for those instances. Wherever a Cost Based Optimization post is relevant only for certain versions of Oracle, I will specifically mention this for the benefit of the readers.

This blog is relevant for Oracle 8i and 9i (without System Statistics)

The Cost Based Optimizer has always been a mystery for most Database Administrators and Developers. The calculations made by the Optimizer to come up with an Execution Plan have been doubted by many Oracle users, especially when (to name a few):

  1. The Application Performance degrades after a Version Upgrade
  2. The Plan of a Query changes after gathering fresh Statistics, and
  3. The Performance of a Query with a higher cost is much better than that of the plan with the lowest cost

The Cost Based Optimizer has improved a lot over the last few versions, and it can be said that the CBO is still undergoing enhancements. With these enhancements, Oracle users have accepted the fact that application performance might degrade after a version upgrade, and they leave no room for surprises by rigorously testing the application before the production upgrade. It is facts no. 2 and 3 that raise doubts over Cost Based Optimization.

The Cost Based Optimizer is a piece of code: you provide inputs, by way of optimizer parameters and statistics, and it computes and generates an output, which is seen as an execution plan. But it is not only the optimizer parameters and statistics that are passed as input. There is one more component, which is mostly overlooked: the QUERY. The first input passed is the query itself, and based on the way it is written the optimizer does its cost calculation taking into account the other two, parameters and statistics. In this blog, I will address the last two facts stated above, plus a common misconception that an index scan is better than a full table scan, and finally the issue with the critical first component passed as an input to the CBO, the query.

Recently, I was working on a performance issue related to a critical process. The overall performance of the system was OK, but one critical process had started taking 18-20 hours, failing several times with ORA-01555. This process usually takes 1 hour 30 minutes. Why this sudden degradation? That was the question raised by the customer.

The database version is 8.1.7.4. This is a critical process and part of the seeded application. It connects as a user dedicated to this process, so when the process is running it is very easy to get the SID of the session executing it. While monitoring the performance of this process by querying v$session_wait for that SID, I could see one particular query doing an exorbitant amount of logical reads. The monitoring queries are sketched below; the text of the problem query and its execution plan follow.
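A sketch of the monitoring queries, assuming :sid is the SID of the session running the process (standard v$ views):

select sid, event, state, seconds_in_wait
from   v$session_wait
where  sid = :sid;

select t.piece, t.sql_text, a.buffer_gets, a.executions
from   v$session s, v$sqltext t, v$sqlarea a
where  s.sid = :sid
and    t.address = s.sql_address and t.hash_value = s.sql_hash_value
and    a.address = s.sql_address and a.hash_value = s.sql_hash_value
order  by t.piece;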

explain plan for
select CS.CO_ID ,CS.SPCODE ,CS.SNCODE ,CS.TMCODE ,CS.CS_SEQNO ,CS.CS_STAT_CHNG ,
to_char(CS.TRIAL_END_DATE,'YYYYMMDDHH24MISS')
from       CR_SERVICES CS
where      (CS.CO_ID not  in (select CO_ID
                             from CO_ALL
                             where (CO_ID=CS.CO_ID and CUSTOMER_ID=DEALER_ID))
and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (SNCODE=CS.SNCODE and RATING_IND='Y')));

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 141575 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   NESTED LOOPS            |                    |     1K|   77K| 141575 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS BY INDEX R|CR_SERVICES         |     3M|  120M|   1913 |       |       |
|     INDEX RANGE SCAN      |I1_CR_SERVICES      |     3M|      |   1616 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

There are three tables in the query. The driving table per the explain plan, MPV, is a small table with 220 rows and 80 blocks, while the other two are huge tables with 69 million and 30 million rows. Usually, such a situation can crop up due to the following reasons:

  1. Change in Optimizer Parameters
  2. Change in Statistics
  3. Creation of New Index

Initial investigation revealed that the statistics on the CR_SERVICES table and its indexes had been gathered the previous night. Once this was confirmed, Cost Based Optimization was at the receiving end. This is quite natural: without any change in the application query, the plan changed for the worse, due to the CBO. Therefore, a thorough investigation was required into the cause of this plan change. Looking at the query and the optimizer statistics, a full table scan on CR_SERVICES would have been a better choice, and my investigation was therefore around this. Let us revisit the relevant portion of the execution plan, the statistics and the query.

## Table Statistics of CR_SERVICES

OWNER                          PAR   NUM_ROWS     BLOCKS 
------------------------------ --- ---------- ---------- 
SM                             NO    69875510    1107733 

## Column Stats of the Columns used on the Query

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS    DENSITY
------------------------------ ------------ ---------- ----------
CO_ID                               6863377          0 1.4570E-07
SNCODE                                  370          0 .002702703

## Relevant portion of the Execution Plan

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
|   NESTED LOOPS            |                    |     1K|   77K| 141575 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS BY INDEX R|CR_SERVICES         |     3M|  120M|   1913 |       |       |
|     INDEX RANGE SCAN      |I1_CR_SERVICES      |     3M|      |   1616 |       |       |

The query is using the index on the SNCODE column, which is a very low cardinality column. Therefore, this was my primary target, and I worked through the optimizer's calculation to find the possible cause of this index scan.

Optimizer Calculation


The Cost Calculation of an Index Scan and Table Scan via Index is

Index Cost = Blevel+ceil(leaf_blocks*ix_selectivity) 
Table Scan = ceil(clustering_factor*table_selectivity) 

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370) from dual;
Index Cost = 1616 ## This matches the execution plan
Table Scan = 109533
Final Cost = 1616+109533=111149

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370*1/370) from dual;
Index Cost = 1616
Table Scan = 297
Final Cost = 1616+297=1913 ## This Matches with the Cost calculated by the Optimizer

This additional 1/370 is due to the fact that the SNCODE column is used twice in the query: once as an index filter, so it is considered only once for the index cost, and twice for the table access cost because of the SNCODE=CS.SNCODE join. The final Nested Loop cost is 141575, calculated as

NL Cost = Cost of an Outer Table + (Cost of Inner Table * Cardinality of Outer Table)
NL Cost = 13 + (1913*74)
NL Cost = 141575

Cost of Full Table Scan = Num Blocks / 6.59 (8k block size)
Cost of Full Table Scan = 1107733 / 6.59
Cost of Full Table Scan = 168093.02
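
For reference, the raw inputs plugged into these manual formulas (BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR and the table BLOCKS) can be pulled from the dictionary. A quick sketch, using the owner and object names that appear later in this post:

-- index statistics used in the index and table-access cost formulas
select blevel, leaf_blocks, clustering_factor
from   dba_indexes
where  owner = 'SM' and index_name = 'I1_CR_SERVICES';

-- table blocks used in the full table scan cost
select blocks
from   dba_tables
where  owner = 'SM' and table_name = 'CR_SERVICES';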

Clearly, the cost of a Full Table Scan is higher than that of the Nested Loop join, so the optimizer opted for the cheaper plan, a Nested Loop join, which requires an index scan of the inner table. When I checked the execution plan of this query with a FULL hint on CR_SERVICES, the FTS cost was 168200. I am yet to figure out the difference between my calculation and the optimizer's, but for investigation purposes I used the cost calculated by the optimizer. Based on this, I came up with three possible solutions:

  1. Modify the Query (Immediately Not possible, as it is a seeded query)
  2. Change Optimizer Statistics
  3. Change Optimizer Parameters

The first option was the better and safer approach, but was not possible immediately. The other two required some testing: a change in optimizer statistics can have a negative impact on other queries using these objects, and a change in optimizer parameters at the database level can have a global impact on overall performance. As mentioned earlier, this process connects as a user dedicated to it, so an ON-LOGON trigger changing an optimizer parameter at session level was thought to be the better option, without impacting overall application performance. The parameter I could immediately think of was OPTIMIZER_INDEX_COST_ADJ, which was at its default of 100. For this query, to make the index-based plan costlier than the FTS, the calculation is OICA = (FTS Cost/NL Cost)*100 = (168200/141575)*100 = 118.80. Any value above this would make the NL cost higher than the Full Table Scan cost, so the optimizer would opt for an FTS and a Hash Join. Initially, the change was done at session level and the plan changed, as shown after the sketch below.
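
At session level, the test amounted to something like this (a minimal sketch; 120 is the value derived above, and any value above 118.80 should flip the plan):

-- session-level test of OPTIMIZER_INDEX_COST_ADJ before any permanent change
alter session set optimizer_index_cost_adj = 120;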

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 168219 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   HASH JOIN               |                    |     1K|   77K| 168219 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS FULL      |CR_SERVICES         |     3M|  120M| 168200 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

Finally, an ON-LOGON trigger was created to set this parameter to 120 whenever this user logged on to the database. The process, when re-executed, completed in 1 hour 15 minutes and the desired performance was achieved.
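
A minimal sketch of such a trigger is shown below. The trigger name and the application user name (PROC_USER) are hypothetical, as the actual user is not relevant here; only the parameter value is from the change described above.

create or replace trigger set_oica_on_logon
after logon on database
begin
  -- restrict the change to the dedicated application user (hypothetical name)
  if user = 'PROC_USER' then
    execute immediate 'alter session set optimizer_index_cost_adj = 120';
  end if;
end;
/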

Optimizer Calculation with OICA as 120


NL Table Scan Cost = (Current Cost of Inner Table Scan * Outer Table Card)*OICA/100
NL Table Scan Cost = (1913*74)*120/100
NL Table Scan Cost = 169874.4 > 168200 (FTS)

With this setting, the plan changed. Now for the reverse engineering: I wanted to check whether changing any of the index statistics, say the Clustering Factor, would have made the NL join costlier than the Full Table Scan. The cost of the NL join with OICA is 169874.4, so the cost of the table access via the index should be round(169874.4/74), which is 2296. The index scan cost is 1616, so the table access cost component needs to be 2296-1616 = 680. Based on this, I calculated that if the Clustering Factor were changed to 93092000, then without any parameter setting the query would go for a Full Table Scan.
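
A quick arithmetic check of that threshold (a sketch; 680 is the table-access component derived above, and the 1/370 selectivity is applied twice, as explained earlier):

-- the clustering factor must be at least 680 * 370 * 370
select 680*370*370 as required_clustering_factor from dual;  -- 93092000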

SQL> exec dbms_stats.SET_INDEX_STATS('SM','I1_CR_SERVICES',CLSTFCT=>93092000);

PL/SQL procedure successfully completed.

SQL> @index_stats
Enter value for o: SM
old   3: where table_owner='&O'
new   3: where table_owner='SM'
Enter value for t: CR_SERVICES
old   4: and   table_name='&T'
new   4: and   table_name='CR_SERVICES'

INDEX_NAME                       NUM_ROWS     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------------------------ ---------- ---------- ----------- ------------- -----------------
I1_CR_SERVICES                   70724121          3      596745      11207514          93092000 

12 rows selected.

explain plan for
select CS.CO_ID ,CS.SPCODE ,CS.SNCODE ,CS.TMCODE ,CS.CS_SEQNO ,CS.CS_STAT_CHNG ,
to_char(CS.TRIAL_END_DATE,'YYYYMMDDHH24MISS')
from       CR_SERVICES CS
where      (CS.CO_ID not  in (select CO_ID
                             from CO_ALL
                             where (CO_ID=CS.CO_ID and CUSTOMER_ID=DEALER_ID))
and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (SNCODE=CS.SNCODE and RATING_IND='Y')));

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 168219 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   HASH JOIN               |                    |     1K|   77K| 168219 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS FULL      |CR_SERVICES         |     3M|  120M| 168200 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

11 rows selected.

Back to the problem: does it look like an optimizer issue? This table has a very high growth rate. While the num_blocks value of the table grows fast, the Leaf Blocks, BLevel and Clustering Factor statistics do not change significantly, and this is enough to blow up the cost of a Full Table Scan. If you look at the query and its relevant portion (repeated below), the join on SNCODE is really not required, as the subquery already uses an IN clause. This unwanted join forced the optimizer to consider the density of SNCODE twice in the table access cost calculation.

## Relevant portion of the Query and Plan

and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (SNCODE=CS.SNCODE and RATING_IND='Y')));

## Optimizer Calculation (1/370 is considered twice to get Table Scan cost of 297, thus making it cheaper).
Index Cost = Blevel+ceil(leaf_blocks*ix_selectivity) 
Table Scan = ceil(clustering_factor*table_selectivity) 

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370*1/370) from dual;
Index Cost = 1616
Table Scan = 297
Final Cost = 1616+297=1913 

If this portion of the query is modified as :


and        CS.SNCODE in (select distinct SNCODE  from MPV
                     where (RATING_IND='Y')));

## Optimizer Calculation, with this change
Index Cost = Blevel+ceil(leaf_blocks*ix_selectivity) 
Table Scan = ceil(clustering_factor*table_selectivity) 

select 3+ceil(596745*1/370) "Index Scan Cost", ceil(40527179*1/370) from dual;
Index Cost = 1616
Table Scan = 109533
Final Cost = 1616+109533= 111149

## With this Cost, the cost of NL Join would have been

NL Cost = Cost of an Outer Table + (Cost of Inner Table * Cardinality of Outer Table)
NL Cost = 13 + (111149*74)
NL Cost = 8225039 > 168200 (FTS)

One unwanted join condition made the Cost Based Optimizer come up with an inefficient plan. Therefore, as mentioned earlier, the initial and crucial input to the optimizer is the Query, and if queries are not optimally written, the Cost Based Optimizer is bound to compute a sub-optimal plan. The query, modified to remove the unwanted join and with no parameter or statistics change, gave the desired and better plan (see below).

explain plan for
select CS.CO_ID ,CS.SPCODE ,CS.SNCODE ,CS.TMCODE ,CS.CS_SEQNO ,CS.CS_STAT_CHNG ,
to_char(CS.TRIAL_END_DATE,'YYYYMMDDHH24MISS')
from       CR_SERVICES CS  
where      (CS.CO_ID not  in (select CO_ID
                               from CO_ALL
                               where (CO_ID=CS.CO_ID and CUSTOMER_ID=DEALER_ID))
and        CS.SNCODE in (select distinct SNCODE  from MPV
                       where (RATING_IND='Y')));

Explained.

SQL> @?/rdbms/admin/utlxpls

Plan Table
------------------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |                    |     1K|   77K| 168219 |       |       |
|  FILTER                   |                    |       |      |        |       |       |
|   HASH JOIN               |                    |     1K|   77K| 168219 |       |       |
|    TABLE ACCESS FULL      |MPV                 |    74 |  444 |     13 |       |       |
|    TABLE ACCESS FULL      |CR_SERVICES         |     3M|  120M| 168200 |       |       |
|   FILTER                  |                    |       |      |        |       |       |
|    INDEX RANGE SCAN       |IDX_CUST_CO_DEALER  |     1 |   14 |      2 |       |       |
--------------------------------------------------------------------------------

Many developers are of the opinion that an execution plan with a cheaper cost is better than a plan with a higher cost, and that index scans are better than Full Table Scans. This blog demonstrates that these are just misconceptions, and that it is application design and knowledge of the data that should be taken into account while writing efficient code.

To sum up, it is not just Optimizer Statistics and Parameters that control the Cost Based Optimizer; queries play an important role too. Well written code should never need to second-guess Cost Based Optimization.

Cost Based Optimization ! Query Performance Issue after creating a new Index…

Creating an index to optimize the performance of a query is a strategy commonly adopted by database administrators and application vendors. While it may optimize the performance of the query for which it was created, it can have an impact on other queries, as these may start picking up the newly created index even though it is not optimal for them. This is especially true where the indexing strategy is not up to the mark. In my first AIOUG session at Hyderabad, I presented on Proactive and Reactive Tuning. If optimization is ignored during the application development life cycle, these efforts end up being done in reactive mode, and this is when database administrators or developers create unwanted or redundant indexes. With these redundant indexes in place, there is always a threat of a performance impact after creating a new index, and this is what transpired at one of my customer sites.

Recently, the application team at one of my customer sites introduced a new batch process that runs 2-3 times a day. After implementation, they found the performance of this process not up to the mark and created a new index to optimize one of the resource-intensive queries of the process. After creating this new index, the performance of the batch process improved, but a few other queries started using the newly created index, with the sessions waiting on “read by other session”. Now the customer was in a dilemma: if they dropped the new index, the batch process would be impacted, and if they kept it, 2-3 other queries would be impacted. When this was escalated to the Application Vendor, they washed their hands of it, stating it to be an Oracle issue. In my opinion this is an application issue, and as you read further you will see why, but the issue can also be attributed to the Oracle Optimizer: why did the optimizer choose a newly created index when the performance of the query was optimal with the existing index? Let us walk through the issue and the analysis in detail.

After implementing the new batch process, Query No.1 below was doing a Full Table Scan of an 80 GB table. To optimize this query and improve the completion time of the batch process, an index BLINE_INDX was created. There was already an index, N_T_ADVICE_DTL_IDX3, with CASEID as its leading column, but the optimizer chose not to use it, possibly because this column is used as a range predicate. Instead of investigating the reason for not using this index, the Application Vendor reverted with a new index, BLINE_INDX, and this index was created. Once the new index was in place, the execution plan of Query No.2, which had been optimally using N_T_ADVICE_DTL_IDX3, ceased to use that index and started using the new BLINE_INDX. While there are several more indexes on this table, I have considered only the two that are relevant for this posting.

Query No.1 (BLINE PROCESS)
--------------------------

SELECT ROWID,CASEID,TO_CHAR(ADVICEDATE,'YYYYMMDD'),TXNADVICEID,CHARGEID,TXNID,
TXNTYPE,ADVICETYPE,ADVICEAMT,ORIGNALAMT,STATUS,DESCRIPTION,REMARKS,BPTYPE 
FROM 	flea.N_T_ADVICE_DTL 
WHERE 	caseid BETWEEN :1 AND :2 
AND 	ADVICEDATE BETWEEN :3 AND :4 
AND 	status='A' 
ORDER BY CASEID, ADVICEDATE,  TXNADVICEID;

Query No.2
----------

SELECT SUM(txnadjustedamt) 
FROM	flea.n_t_advice_dtl 
WHERE	caseid = :b1 
AND	chargeid =93 
AND	status ='A';

Index Details
-------------

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------
BLINE_INDX                     CASEID                                       1 <== Local
                               ADVICEDATE                                   2
                               STATUS                                       3

N_T_ADVICE_DTL_IDX3            CASEID                                       1 <== Local
                               CHARGEID                                     2

My services were sought when the customer wanted an analysis of why Query 2 had started picking up the new index, when both columns of the IDX3 index are present in Query 2 and it should therefore have been the obvious choice for the optimizer. This seemed to be true, so I started my analysis with a query against v$sql_bind_capture to get the actual production runtime value passed to bind :1 by the application user. I then executed the query, passing the captured value to a bind variable b1. The runtime plan shows the BLINE_INDX index being used, which matches the production issue.
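
The @bind_capture script used below is not reproduced in this post; it is presumably a query along these lines against v$sql_bind_capture (a sketch, with the column list trimmed to what is shown in the output):

-- captured bind values for the problem statement
select name, position, datatype_string, value_string
from   v$sql_bind_capture
where  sql_id = 'b7nkg6m692jgq';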

SQL>@bind_capture
new   2: where sql_id='b7nkg6m692jgq'

NAME                             POSITION DATATYPE_STRING      VALUE_STRING
------------------------------ ---------- -------------------- ------------------------------
:1                                      1 NUMBER               2568494

SQL>variable b1 number;
SQL>exec :b1:=2568494;

SELECT /*+ VIVEK_2568494 */ SUM(txnadjustedamt) 
FROM	flea.n_t_advice_dtl 
WHERE	caseid = :b1 
AND	chargeid =93 
AND	status ='A';

Plan Table
============
------------------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                            | Name               | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
------------------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                     |                    |       |       |     4 |           |       |       |
| 1   |  SORT AGGREGATE                      |                    |     1 |    16 |       |           |       |       |
| 2   |   PARTITION RANGE SINGLE             |                    |     1 |    16 |     4 |  00:00:01 | KEY   | KEY   |
| 3   |    TABLE ACCESS BY LOCAL INDEX ROWID | N_T_ADVICE_DTL     |     1 |    16 |     4 |  00:00:01 | KEY   | KEY   |
| 4   |     INDEX RANGE SCAN                 | BLINE_INDX         |    47 |       |     1 |  00:00:01 | KEY   | KEY   |
------------------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
3 - filter("CHARGEID"=:SYS_B_0)
4 - access("CASEID"=:B1 AND "STATUS"=:SYS_B_1)
4 - filter("STATUS"=:SYS_B_1)

This being a partitioned table, the value 2568494 falls under the high_value of Partition 3 of the table. Therefore, the statistics of this partition were used for the calculation and analysis. Shown below are the optimizer statistics for Partition 3 and the calculation the optimizer does to derive the cost of an index scan.


## Calculation for Partition 3
-------------------------------
TABLE_OWNER  PARTITION_NAME           HIGH_VALUE      NUM_ROWS     BLOCKS PARTITION_POSITION
------------ ------------------------ ------------- ---------- ---------- ------------------
FLEA         N_T_ADVICE_DTL_FO_P3     3000000         90588615    2019809                  3

Partition Column Stats
-----------------------
## Stats for Partition 3 based on Bind Capture : 2568494

COLUMN_NAME     NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
--------------- ------------ ---------- ---------- ---------------
CASEID                954287          0 1.0479E-06 NONE
CHARGEID                 105          0  .00952381 NONE
STATUS                     2          0         .5 NONE

Partition Index Stats
---------------------
INDEX_NAME                PARTITION_NAME                     BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   AVDPK      ALBKP
------------------------- ------------------------------ ---------- ----------- ------------- ----------------- ------- ----------
BLINE_INDX                N_T_ADVICE_DTL_FO_P3                    3      335600       8818906          46069060       5          1

N_T_ADVICE_DTL_IDX3       N_T_ADVICE_DTL_FO_P3                    3      293115       1638635          48089155      29          1

Once the data was collected, it was time to compute the cost of both indexes. The formulae for the cost of an index scan and of the subsequent table access by index are:

Cost of Index Scan (A) = BLEVEL + CEIL(LEAF_BLOCKS * INDEX SELECTIVITY)
Cost of Table Scan via Index (B) = CEIL(CLUSTERING_FACTOR * INDEX SELECTIVITY WITH FILTERS)
Total Cost = A + B

Analysis & Cost Calculation

In the case of the BLINE_INDX index, only the CASEID column is to be considered for the index selectivity. Revisit the execution plan above and check the Predicate Information: while the CASEID and STATUS columns both exist in this index, only CASEID drives the index range scan, with STATUS further applied as a filter predicate. Therefore, the Num_Distinct of CASEID has to be used to derive the index selectivity, and the product of the Num_Distinct of CASEID and STATUS has to be used to compute the index selectivity with filters.

Similarly, for N_T_ADVICE_DTL_IDX3, since both columns present in the index are used in the query, the selectivity (both the index selectivity and the selectivity with filters) has to be computed from the product of the selectivities of the two columns, i.e. CASEID and CHARGEID.

One point to note is that the parameter OPTIMIZER_INDEX_COST_ADJ has been set to 15 on this database, and this also has to be taken into consideration when computing the final cost.
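
The setting can be confirmed with a quick lookup (a sketch):

-- current value of optimizer_index_cost_adj for this instance
select value
from   v$parameter
where  name = 'optimizer_index_cost_adj';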

Calculation For bline Index
---------------------------
select 3+ceil(335600*1/954287), ceil(46069060*1/954287*1/2) from dual;

Index Cost : 4
Table Access : 25
Total Cost : 4+25 = 29

With OICA = 29*15/100 = 4.35

Rounded to 4
----------------

For IDX3 Index
--------------

select 3+ceil(293115*1/954287*1/105), ceil(48089155*1/954287*1/105) from dual;

Index Cost : 4
Table Access : 1
Total Cost : 4+1 = 5

With OICA = 5*15/100 = 0.75

Rounded up to 1
---------------

Based on this calculation, the cost of IDX3 appears cheaper and should have been the obvious choice. So why is the optimizer computing BLINE_INDX as cheaper than IDX3? There is a better place to compare our analysis with the optimizer's computation: the 10053 trace file. For this, the query needs to be hard parsed, so with a minor change to the query text it was re-executed for the same bind value and a trace file was generated.
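
One common way of capturing such a trace is sketched below; the tracefile identifier is purely to make the file easy to locate, and the query (with its minor text change) is executed between the two event settings.

alter session set tracefile_identifier = 'idx3_10053';
alter session set events '10053 trace name context forever, level 1';
-- ...execute the modified query here with :b1 bound to 2568494...
alter session set events '10053 trace name context off';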

Once the 10053 trace was generated, it was easier to get at the rationale behind the optimizer's choice of the BLINE_INDX index for this query. Pasted below is the relevant content of the 10053 trace file, with the values used for the analysis and comparison discussed underneath.

Table Stats::
  Table: N_T_ADVICE_DTL  Alias: N_T_ADVICE_DTL  Partition [2]
    #Rows: 90588615  #Blks:  2019809  AvgRowLen:  164.00
Index Stats::
  Index: BLINE_INDX  Col#: 28 27 17  PARTITION [2]
    LVLS: 3  #LB: 335600  #DK: 8818906  LB/K: 1.00  DB/K: 5.00  CLUF: 46069060.00
  Index: N_T_ADVICE_DTL_IDX3  Col#: 28 3  PARTITION [2]
    LVLS: 3  #LB: 293115  #DK: 1638635  LB/K: 1.00  DB/K: 29.00  CLUF: 48089155.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#28): CASEID(NUMBER)  Part#: 2
    AvgLen: 6.00 NDV: 954287 Nulls: 0 Density: 1.0479e-06 Min: 2000000 Max: 2999998
  Column (#3): CHARGEID(NUMBER)  Part#: 2
    AvgLen: 4.00 NDV: 105 Nulls: 0 Density: 0.0095238 Min: 2 Max: 1000018
  Column (#17): STATUS(VARCHAR2)  Part#: 2
    AvgLen: 2.00 NDV: 2 Nulls: 0 Density: 0.5
  Table: N_T_ADVICE_DTL  Alias: N_T_ADVICE_DTL
    Card: Original: 90588615  Rounded: 1  Computed: 0.45  Non Adjusted: 0.45
  Access Path: TableScan
    Cost:  441312.32  Resp: 441312.32  Degree: 0
      Cost_io: 425225.00  Cost_cpu: 81419528497
      Resp_io: 425225.00  Resp_cpu: 81419528497
kkofmx: index filter:"N_T_ADVICE_DTL"."STATUS"=:B1
kkofmx: index filter:"N_T_ADVICE_DTL"."CHARGEID"=:B1 AND "N_T_ADVICE_DTL"."STATUS"=:B2
kkofmx: index filter:"N_T_ADVICE_DTL"."CASEID"=:B1 AND "N_T_ADVICE_DTL"."CHARGEID"=:B2 AND "N_T_ADVICE_DTL"."STATUS"=:B3
  Access Path: index (RangeScan)
    Index: BLINE_INDX
    resc_io: 29.00  resc_cpu: 195479
    ix_sel: 1.0703e-06  ix_sel_with_filters: 5.3516e-07
    Cost: 4.36  Resp: 4.36  Degree: 1
  Access Path: index (AllEqRange)
    Index: N_T_ADVICE_DTL_IDX3
    resc_io: 34.00  resc_cpu: 216124
    ix_sel: 6.1026e-07  ix_sel_with_filters: 6.1026e-07
    Cost: 5.11  Resp: 5.11  Degree: 1
  Best:: AccessPath: IndexRange  Index: BLINE_INDX
         Cost: 4.36  Degree: 1  Resp: 4.36  Card: 0.45  Bytes: 0

The cost of BLINE_INDX (resc_io) matches our calculation: RESC_IO for BLINE_INDX is 29, and with OICA set to 15 the final cost is 4.36, rounded off to 4. Surprisingly, the cost of IDX3 (resc_io) as per the optimizer is 34, as against our calculation of 5. The optimizer always chooses the least-costed plan, and in this case 29 against 34 (4.36 against 5.11) is cheaper, so it opted for the BLINE_INDX index. The action plan now was to work out how the optimizer arrived at a cost of 34 for the IDX3 index. The BLEVEL and CLUSTERING_FACTOR of the IDX3 index are unchanged, so I suspected the selectivity part. The ix_sel (index selectivity) and ix_sel_with_filters (index selectivity with filters) for the IDX3 index are both 6.1026e-07, whereas the selectivity as derived by me was 1/NDV * 1/NDV for the two columns. I decided to do some reverse engineering and calculated 1/6.1026e-07, which is 1638645.82, then looked for a value close to it and observed that DISTINCT_KEYS of IDX3 is 1638635, which is very close. Now, if you compute the index selectivity and the selectivity with filters as 1/1638635, it exactly matches the ix_sel and ix_sel_with_filters values in the 10053 trace. Further, note the line Access Path: index (AllEqRange) above Index: N_T_ADVICE_DTL_IDX3 in the trace file. AllEqRange means that all the columns of the index have been used as equality predicates, so instead of multiplying the individual column selectivities, the optimizer has used the DISTINCT_KEYS of the index, and this is what skewed the computation.
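
A quick arithmetic check of that reverse engineering (a sketch):

-- inverting ix_sel lands near DISTINCT_KEYS of IDX3, and 1/DISTINCT_KEYS
-- reproduces the selectivity reported in the 10053 trace
select 1/6.1026e-07 as inverted_ix_sel,
       1/1638635    as sel_from_distinct_keys
from   dual;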

By this time it was clear and evident that the flaw in this calculation was due to AllEqRange, so it was time to implement a change that would resolve the issue. There were two possible solutions:

  1. Modify the Clustering Factor for the problematic partitions.
  2. Add a new Column to IDX3 Index that is not used in the Problematic Query (query no.2)

I computed the cost for each of the partitions and observed that the clustering_factor of almost 70% of the index partitions would have to be modified, which means re-calculating it every time statistics are gathered. Locking the statistics after modifying the clustering_factor would have helped, but the change in optimizer behaviour and the risk of a performance impact on other queries were top of mind, so this option was not implemented. I did test it for one partition, where, after modifying the clustering_factor, the optimizer started picking up the IDX3 index.

The other option was to modify the index by adding a new column. In order to eliminate any potential risk from adding a third column, the option thought of and implemented was to add a column with a single distinct value. Therefore a column, CURRENCYID, with a single distinct value was appended to the index and the change was tested on UAT. Before making the change on UAT, we first made sure that the issue was reproduced there. Once we were sure that the same issue existed on the UAT environment as well, we dropped and recreated the index on (CASEID, CHARGEID, CURRENCYID), and the change had a dramatic impact: AllEqRange changed to RangeScan, and the optimizer computed the cost from the individual selectivities of the two columns rather than from the DISTINCT_KEYS in dba_indexes. A sketch of the change is shown below, followed by the relevant portion of the 10053 trace generated on UAT after the index change.
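
A minimal sketch of the index change tested on UAT; the storage and tablespace clauses are omitted, and ONLINE is only one way to limit the outage while rebuilding:

drop index flea.n_t_advice_dtl_idx3;

-- recreate the local index with the single-distinct-value column appended
create index flea.n_t_advice_dtl_idx3
  on flea.n_t_advice_dtl (caseid, chargeid, currencyid)
  local online;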

Table Stats::
  Table: N_T_ADVICE_DTL  Alias: N_T_ADVICE_DTL Partition [2]
    #Rows: 25494530  #Blks:  10874  AvgRowLen:  165.00
Index Stats::
  Index: BLINE_INDX  Col#: 28 27 17  PARTITION [2]
    LVLS: 2  #LB: 92720  #DK: 8843108  LB/K: 1.00  DB/K: 1.00  CLUF: 11966280.00
  Index: N_T_ADVICE_DTL_IDX3  Col#: 28 3 18 PARTITION [2]
    LVLS: 2  #LB: 86050  #DK: 2764430  LB/K: 1.00  DB/K: 4.00  CLUF: 12732735.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#28): CASEID(NUMBER)
    AvgLen: 6.00 NDV: 406027 Nulls: 240 Density: 2.4629e-06 Min: 0 Max: 51734139
  Column (#3): CHARGEID(NUMBER)
    AvgLen: 4.00 NDV: 165 Nulls: 0 Density: 0.0060606 Min: 2 Max: 10000048
  Column (#17): STATUS(VARCHAR2)
    AvgLen: 2.00 NDV: 3 Nulls: 0 Density: 0.33333
  Table: N_T_ADVICE_DTL  Alias: N_T_ADVICE_DTL     
    Card: Original: 25494530  Rounded: 1  Computed: 0.13  Non Adjusted: 0.13
  Access Path: TableScan
    Cost:  2544.46  Resp: 2544.46  Degree: 0
      Cost_io: 2380.00  Cost_cpu: 414356148
      Resp_io: 2380.00  Resp_cpu: 414356148
kkofmx: index filter:"N_T_ADVICE_DTL"."STATUS"=:B1
kkofmx: index filter:"N_T_ADVICE_DTL"."CHARGEID"=:B1 AND "N_T_ADVICE_DTL"."STATUS"=:B2
kkofmx: index filter:"N_T_ADVICE_DTL"."CASEID"=:B1 AND "N_T_ADVICE_DTL"."CHARGEID"=:B2 AND "N_T_ADVICE_DTL"."STATUS"=:B3
  Access Path: index (RangeScan)
    Index: BLINE_INDX
    resc_io: 14.00  resc_cpu: 127837
    ix_sel: 2.5335e-06  ix_sel_with_filters: 8.4450e-07
    Cost: 2.11  Resp: 2.11  Degree: 1
  Access Path: index (RangeScan)
    Index: N_T_ADVICE_DTL_IDX3
    resc_io: 4.00  resc_cpu: 29395
    ix_sel: 1.4926e-08  ix_sel_with_filters: 1.4926e-08
    Cost: 1.00  Resp: 1.00  Degree: 1
  Best:: AccessPath: IndexRange  Index: N_T_ADVICE_DTL_IDX3
         Cost: 1.00  Degree: 1  Resp: 1.00  Card: 0.13  Bytes: 0

============
Plan Table
============
-----------------------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                            | Name                    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
-----------------------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                     |                         |       |       |     1 |           |       |       |
| 1   |  SORT AGGREGATE                      |                         |     1 |    15 |       |           |       |       |
| 2   |   PARTITION RANGE SINGLE             |                         |     1 |    15 |     1 |  00:00:01 | KEY   | KEY   |
| 3   |    TABLE ACCESS BY LOCAL INDEX ROWID | N_T_ADVICE_DTL          |     1 |    15 |     1 |  00:00:01 | KEY   | KEY   |
| 4   |     INDEX RANGE SCAN                 | N_T_ADVICE_DTL_IDX3     |     1 |       |     1 |  00:00:01 | KEY   | KEY   |
-----------------------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
3 - filter("STATUS"=:SYS_B_1)
4 - access("CASEID"=:B1 AND "CHARGEID"=:SYS_B_0)

The change is still pending implementation on production and will be done post month-end. Based on the analysis, it appears the issue should be resolved once this change is in. If you have come across any such issues, please do share your findings. For the other two queries, I do have an interesting observation about the Cost Based Optimizer, but unfortunately I have not collected the artifacts and data needed to post it on the blog. While the change in the BLINE_INDX index suggested by me has resolved those issues as well, for posting purposes I am doing further study on this and will share the findings as and when I am ready.
