Query Optimization ! An example of a better rewrite…
March 24, 2013 3 Comments
In January 2012, I wrote about a Query Performance Issue due to NVL(:b1,column_name) predicates in the WHERE clause. Due to these predicates, the Optimizer computed Incorrect Cardinality and came out with a Sub-Optimal Plan. In Sangam 2012, I demostrated an issue because of these predicates. This blog is again on one of the issue that a customer reported and was solved by way of a workaround. Since, this query is a seeded query, the permanent fix (suggested in this blog) is expected by way of an Application Patch.
I will demonstrate this on my SCOTT Schema and a Query on EMP table. This will be easier for me to explain as well. Based on the EMP table, the requirement is to write a report that takes 2 Inputs. These are EMPNO and DEPTNO. The users can run this report for any of the following conditions :
- EMPNO and DEPTNO are NOT NULL
- EMPNO IS NULL and DEPTNO IS NOT NULL
- EMPNO IS NOT NULL AND DEPTNO IS NULL
- EMPNO and DEPTNO are both NULL
The way Original Query is written, I assumed the Developers had above 4 requirements in mind. However, at the Customer place, the customer confirmed that only the 1st two conditions are applicable. Out of the total execution, 1st condition contributes to around 70% and 2nd contributes to 30%.
Back to our example on EMP table. With the 4 conditions in mind, any Developer would write a query as mentioned below.
select empno, ename, deptno, sal
from emp
where deptno = nvl(:deptno,deptno)
and empno = nvl(:empno,empno)
order by empno;
## Execution of this query for each of the combination
## Both are NOT NULL (For a Deptno & for an Employee)
exec :deptno:=30; :empno:=7499;
EMPNO ENAME DEPTNO SAL
---------- ---------- ---------- ----------
7499 ALLEN 30 1600
## EMPNO IS NULL (For a Deptno and all Employees)
exec :deptno:=30; :empno:=null;
EMPNO ENAME DEPTNO SAL
---------- ---------- ---------- ----------
7499 ALLEN 30 1600
7521 WARD 30 1250
7654 MARTIN 30 1250
7698 BLAKE 30 2850
7844 TURNER 30 1500
7900 JAMES 30 950
## DEPTNO IS NULL (For an Employee)
exec :deptno:=null; :empno:=7499;
EMPNO ENAME DEPTNO SAL
---------- ---------- ---------- ----------
7499 ALLEN 30 1600
## Both are NULL (for all Deptno and all Employees)
exec :deptno:=null; :empno:=null;
EMPNO ENAME DEPTNO SAL
---------- ---------- ---------- ----------
7369 SMITH 20 800
7499 ALLEN 30 1600
7521 WARD 30 1250
7566 JONES 20 2975
7654 MARTIN 30 1250
7698 BLAKE 30 2850
7782 CLARK 10 2450
7788 SCOTT 20 3000
7839 KING 10 5000
7844 TURNER 30 1500
7876 ADAMS 20 1100
7900 JAMES 30 950
7902 FORD 20 3000
7934 MILLER 10 1300
A single query meets the requirement for all the 4 combinations. The Developer, in this case, has done his job. However, they have not considered the fact that 2 out of 4 of the above combinations would end up doing a Full Table Scan of EMP table. In case of the customer case, since only the 1st two combinations are applicable and with 30% of the executions on combination 2, 30% of the time, the Optimizer would opt for a Full Table Scan. Before, we get into the Original case, let us check the runtime execution plan for the Query on EMP Table.
select empno, ename, deptno, sal from emp where deptno =
nvl(:deptno,deptno) and empno = nvl(:empno,empno) order by empno
Plan hash value: 3923409422
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT ORDER BY | | 7 | 322 | 6 (17)| 00:00:01 |
| 2 | CONCATENATION | | | | | |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL | EMP | 6 | 276 | 3 (0)| 00:00:01 |
|* 5 | FILTER | | | | | |
|* 6 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 46 | 2 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | PK_EMP | 1 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:EMPNO IS NULL)
4 - filter(("EMPNO" IS NOT NULL AND "DEPTNO"=NVL(:DEPTNO,"DEPTNO")))
5 - filter(:EMPNO IS NOT NULL)
6 - filter("DEPTNO"=NVL(:DEPTNO,"DEPTNO"))
7 - access("EMPNO"=:EMPNO)
From the predicate information, the Optimizer choice becomes very clear, which is “IF :EMPNO is NULL then FTS of EMP and IF :EMPNO is NOT NULL then Table Access by Index Rowid and a Unique Scan of PK_EMP”. This means, if a User Inputs the value for an EMPNO (irrespective of what has been passed for DEPNO), then the Optimizer will always do an Index Scan.
Now the Original Case
The Original Query and the Plan is pasted below, before we discuss on this.
SELECT TRANSACTION_ID, TRANSACTION_TYPE, TRANSACTION_DATE, TAX_TRANSACTION_ID, PARENT_TRANSACTION_TYPE,
CURRENCY_CONVERSION_RATE, QUANTITY,
DECODE(TRANSACTION_TYPE, 'RECEIVE', 1, 'RETURN TO VENDOR', -1, 'CORRECT',
DECODE(PARENT_TRANSACTION_TYPE, 'RECEIVE', SIGN(QUANTITY), 'RETURN TO VENDOR',
SIGN(QUANTITY)*-1)) QUANTITY_MULTIPLIER
FROM JAI_RCV_TRANSACTIONS
WHERE SHIPMENT_LINE_ID = NVL(:B2 , SHIPMENT_LINE_ID)
AND TRANSACTION_ID = NVL(:B1 , TRANSACTION_ID)
AND ( TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') OR ( TRANSACTION_TYPE = 'CORRECT' AND PARENT_TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') ) )
ORDER BY TRANSACTION_ID;
Plan hash value: 4081860556
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 621K(100)| |
| 1 | SORT ORDER BY | | 2 | 94 | 621K (11)| 00:02:24 |
| 2 | CONCATENATION | | | | | |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL | JAI_RCV_TRANSACTIONS | 1 | 47 | 621K (11)| 00:02:24 |
|* 5 | FILTER | | | | | |
|* 6 | TABLE ACCESS BY INDEX ROWID| JAI_RCV_TRANSACTIONS | 1 | 47 | 3 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | JAI_RCV_TRANSACTIONS_PK | 1 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:B1 IS NULL)
4 - filter(((INTERNAL_FUNCTION("TRANSACTION_TYPE") OR ("TRANSACTION_TYPE"='CORRECT' AND
INTERNAL_FUNCTION("PARENT_TRANSACTION_TYPE"))) AND "SHIPMENT_LINE_ID"=NVL(:B2,"SHIPMENT_LINE_ID")
AND "TRANSACTION_ID" IS NOT NULL))
5 - filter(:B1 IS NOT NULL)
6 - filter(((INTERNAL_FUNCTION("TRANSACTION_TYPE") OR ("TRANSACTION_TYPE"='CORRECT' AND
INTERNAL_FUNCTION("PARENT_TRANSACTION_TYPE"))) AND
"SHIPMENT_LINE_ID"=NVL(:B2,"SHIPMENT_LINE_ID")))
7 - access("TRANSACTION_ID"=:B1)
As mentioned earlier, in case of the Customer case, only the 1st two combinations are applicable. These are (columns from Original Query) :
- TRANSACTION_ID and SHIPMENT_LINE_ID are NOT NULL
- TRANSACTION_ID IS NULL and SHIPMENT_LINE_ID IS NOT NULL
There is a Unique Index on TRANSACTION_ID and another Unique Index on (SHIPMENT_LINE_ID,TRANSACTION_ID). This query is a part of a Transaction and for the cases, where the value to :B1 (Transaction_Id) is null, the transaction takes around 20 minutes to complete. The table has around 9 Million Rows and does a Full Table Scan for such cases.
Since the customer mentioned that in their case, value for TRANSACTION_ID column can only be passed as NULL, an option of changing “:B1 is NULL” to “:B2 is NULL” (see PREDICATE INFORMATION) seemed to be a better choice. Remember, :B1 is for TRANSACTION_ID and :B2 is for SHIPMENT_LINE_ID. This change will check, if :B2 is NULL then FTS on the Table and if :B2 is NOT NULL then an Index Scan on SHIPMENT_LINE_ID. The Uniqueness of TRANSACTION_ID is more than SHIPMENT_LINE_ID, and therefore, optimizer picks TRANSACTION_ID as more cheaper and opts for an Index Scan on this column. Below are the column statistics for the two columns.
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM ------------------------------ ------------ ---------- ---------- --------------- SHIPMENT_LINE_ID 4351257 0 2.2982E-07 NONE TRANSACTION_ID 9092033 0 1.0999E-07 NONE
As a temporary workaround, the column statistics for the two columns were swapped. Before doing that, it was critically important to actually check the number of queries with TRANSACTION_ID column in the WHERE predicate. This is ensure that this temporary fix should not impact other queries. Once it was confirmed that this change will not impact, the column stats were swapped. This swapping was intended to see below mentioned change in the plan :
Original – if :B1 is NULL then FTS. If :B1 is NOT NULL then Unique Index Scan on TRANSACTION_ID
Changed – if :B2 is NULL then FTS. If :B2 is NOT NULL then Range Scan on SHIPMENT_LINE_ID
Since the value to :B2 is never passed as NULL, this change will also make sure this 9 Million Row table is never Full Scanned. The swapping of statistics worked and the plan changed for better. Some of the transaction that took around 20 Minutes started completing in a second.
exec dbms_stats.set_column_stats(ownname=>'JA',tabname=>'JAI_RCV_TRANSACTIONS',colname=>'TRANSACTION_ID',DISTCNT=>4351257);
exec dbms_stats.set_column_stats(ownname=>'JA',tabname=>'JAI_RCV_TRANSACTIONS',colname=>'SHIPMENT_LINE_ID',DISTCNT=>9092033);
Plan hash value: 3580471929
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 621K(100)| |
| 1 | SORT ORDER BY | | 2 | 94 | 621K (11)| 00:02:24 |
| 2 | CONCATENATION | | | | | |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL | JAI_RCV_TRANSACTIONS | 1 | 47 | 621K (11)| 00:02:24 |
|* 5 | FILTER | | | | | |
|* 6 | TABLE ACCESS BY INDEX ROWID| JAI_RCV_TRANSACTIONS | 1 | 47 | 4 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | JA_IN_RCV_TRANSACTIONS_N5 | 1 | | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(:B2 IS NULL)
4 - filter(((INTERNAL_FUNCTION("TRANSACTION_TYPE") OR ("TRANSACTION_TYPE"='CORRECT' AND
INTERNAL_FUNCTION("PARENT_TRANSACTION_TYPE"))) AND "TRANSACTION_ID"=NVL(:B1,"TRANSACTION_ID")
"SHIPMENT_LINE_ID" IS NOT NULL))
5 - filter(:B2 IS NOT NULL)
6 - filter(((INTERNAL_FUNCTION("TRANSACTION_TYPE") OR ("TRANSACTION_TYPE"='CORRECT' AND
INTERNAL_FUNCTION("PARENT_TRANSACTION_TYPE"))) AND "TRANSACTION_ID"=NVL(:B1,"TRANSACTION_ID")
7 - access("SHIPMENT_LINE_ID"=:B2)
While this was a temporary workaround to fix the problem, a better strategy would have been to write a query that take the advantage of the available indexes and works optimally for any of the 4 combinations mentioned above. A better query is as under :
select * from (
SELECT /*+ WITH BOTH BINDS AS NOT NULL */ TRANSACTION_ID, TRANSACTION_TYPE, TRANSACTION_DATE, TAX_TRANSACTION_ID, PARENT_TRANSACTION_TYPE,
CURRENCY_CONVERSION_RATE, QUANTITY,
DECODE(TRANSACTION_TYPE, 'RECEIVE', 1, 'RETURN TO VENDOR', -1, 'CORRECT',
DECODE(PARENT_TRANSACTION_TYPE, 'RECEIVE', SIGN(QUANTITY), 'RETURN TO VENDOR',
SIGN(QUANTITY)*-1)) QUANTITY_MULTIPLIER
FROM JAI_RCV_TRANSACTIONS
WHERE SHIPMENT_LINE_ID = :B2
AND TRANSACTION_ID = :B1
AND ( TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') OR ( TRANSACTION_TYPE = 'CORRECT' AND PARENT_TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') ) )
union all
SELECT /*+ WITH B1 AS NULL */ TRANSACTION_ID, TRANSACTION_TYPE, TRANSACTION_DATE, TAX_TRANSACTION_ID, PARENT_TRANSACTION_TYPE,
CURRENCY_CONVERSION_RATE, QUANTITY,
DECODE(TRANSACTION_TYPE, 'RECEIVE', 1, 'RETURN TO VENDOR', -1, 'CORRECT',
DECODE(PARENT_TRANSACTION_TYPE, 'RECEIVE', SIGN(QUANTITY), 'RETURN TO VENDOR',
SIGN(QUANTITY)*-1)) QUANTITY_MULTIPLIER
FROM JAI_RCV_TRANSACTIONS
WHERE SHIPMENT_LINE_ID = :B2 AND :B1 is NULL
AND ( TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') OR ( TRANSACTION_TYPE = 'CORRECT' AND PARENT_TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') ) )
union all
SELECT /*+ WITH B2 AS NULL */ TRANSACTION_ID, TRANSACTION_TYPE, TRANSACTION_DATE, TAX_TRANSACTION_ID, PARENT_TRANSACTION_TYPE,
CURRENCY_CONVERSION_RATE, QUANTITY,
DECODE(TRANSACTION_TYPE, 'RECEIVE', 1, 'RETURN TO VENDOR', -1, 'CORRECT',
DECODE(PARENT_TRANSACTION_TYPE, 'RECEIVE', SIGN(QUANTITY), 'RETURN TO VENDOR',
SIGN(QUANTITY)*-1)) QUANTITY_MULTIPLIER
FROM JAI_RCV_TRANSACTIONS
WHERE TRANSACTION_ID = :B1 AND :B2 IS NULL
AND ( TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') OR ( TRANSACTION_TYPE = 'CORRECT' AND PARENT_TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') ) )
UNION ALL
SELECT /*+ WITH BOTH BINDS AS NULL */ TRANSACTION_ID, TRANSACTION_TYPE, TRANSACTION_DATE, TAX_TRANSACTION_ID, PARENT_TRANSACTION_TYPE,
CURRENCY_CONVERSION_RATE, QUANTITY,
DECODE(TRANSACTION_TYPE, 'RECEIVE', 1, 'RETURN TO VENDOR', -1, 'CORRECT',
DECODE(PARENT_TRANSACTION_TYPE, 'RECEIVE', SIGN(QUANTITY), 'RETURN TO VENDOR',
SIGN(QUANTITY)*-1)) QUANTITY_MULTIPLIER
FROM JAI_RCV_TRANSACTIONS
WHERE :B1 IS NULL AND :B2 IS NULL
AND ( TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') OR ( TRANSACTION_TYPE = 'CORRECT' AND PARENT_TRANSACTION_TYPE IN ('RECEIVE', 'RETURN TO VENDOR') ) ) )
ORDER BY TRANSACTION_ID;
The benefit of this modified query is :<
- If both BINDS are NOT NULL, use a Composite Unique Index on SHIPMENT_LINE_ID, TRANSACTION_ID
- If BIND :B1 on TRANSACTION_ID is NULL then a Range Scan on SHIPMENT_LINE_ID
- If Bind :B2 on SHIPMENT_LINE_ID is NULL then a Unique Scan on TRANSACTION_ID
- If Both Binds are NULL, then a Full Table Scan (this is unavoidable).