Importance of Constraints ! Know your Reporting Tools

Recently, I was working on a customer issue. The issue was a performance comparison between a Competition and an Oracle Database. The performance of Competition was reported to be better than Oracle. Now, this is a classic case of Bad Schema Design and a Badly Written Query. Working on this issue reminded me of a great learning that I had after reading “Expert Oracle Database Architecture” by Thomas Kyte. He wrote about a classic issue with a Pl/SQL Block running in SQL Server and generating wrong results when ported to Oracle Database due to the way these 2 databases compare NULL values. Each of these databases are different. It also reminded me of one of my response to a query from a customer on “A count(*) from a Table is doing a Full Table Scan, even though it has a Unique Index on it”.

As Tom mentioned in his book, every database is different and implement the features differently. Now, the third party tools that connect to each of these different data sources generate queries that syntactically work on all but may not run optimally. Therefore, it is important to understand our Data and design the Schema with all the required constraints and indexes in place.

In this case, customer was running few analytical reports from a Third Party Reporting Tool (Tableau). Customer selected few columns with couple of predicates with Date Range and some product name. The Queries were around 40-50 Lines. Surprisingly, for each of the Predicates, Tableau added few additional predicates and due to these additional predicates, the run time plan was around 2700+ lines long. As an example, I have a table T1 and have to generate a report selecting object id’s and names for any of the 2 TEMPORARY values (Y or N). My query will look like :

select object_id, object_name from t1 where temporary=:b1;

If I have an Index on TEMPORARY Column, Optimizer will come out with an optimal plan based on it’s cost calculation. However, when run from Tableau, it came out with the following query:

select object_id, object_name from t1
where ((temporary=:b1) or (temporary is NULL and :b1 is NULL));

What will be the implication of this added OR conditions ? Let’s see.

SQL> create table t1 as
select * from all_objects;  2

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt=>'for all columns size auto for columns temporary size 100');

PL/SQL procedure successfully completed.

SQL> create index t1_idx on t1(temporary);

Index created.

SQL> select num_rows, blocks from dba_tables where table_name='T1';

  NUM_ROWS     BLOCKS
---------- ----------
     68417       1375

SQL> variable l_temporary varchar2(1);
SQL> exec :l_temporary:='Y';

PL/SQL procedure successfully completed.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL)); 

161 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7xsahyu4q594y, child number 0
-------------------------------------
select object_id, object_name from t1 where ((temporary=:l_temporary)
or (temporary is NULL and :l_temporary is NULL))

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   375 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   161 |  6923 |   375   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEMPORARY"=:L_TEMPORARY OR ("TEMPORARY" IS NULL AND
              :L_TEMPORARY IS NULL)))

161 Rows out of 68000, which is less than 1%. Index Access would have been a better option. At customer end, the table was huge with Billions of Rows and with multiple such OR Predicates, the very first observation was the amount of time the Optimizer took to Parse the Query. The Query took around 184 Seconds to run and the observation was that out of 184 Seconds, around 175 Seconds were spent on Parsing. This was identified as a BUG in 18c (BUG#28725660) and the primary cause identified as the change in OR Expansion behaviour in 18c. This BUG is fixed in 19c. Backporting it to 18c would have taken some time, so the other fix that we applied was to add NOT NULL Constraints to some of the columns. From the Data, we could see that none of the columns had NULL values. We checked with the developers and they mentioned that NULL values are not stored in this column. Therefore, it was safe to add these constraints. Continuing with our example above, let’s add a NOT NULL constraint to our Table T1.

SQL> alter table t1 modify temporary not null;

Table altered.

select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));

select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |       |     5 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |   161 |  6923 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |   161 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:L_TEMPORARY)

The run time plan of the same query is now an Index Scan and is much better than previous. This additional input to the Optimizer was enough to transform the query. You can generate 10053 trace to see the transformation.

NOT NULL constraint combined with Unique Index is required to answer a COUNT(*) query from an Index, which one of my customer asked long time back. This is another case, where NOT NULL constraint helped optimizer come up with an optimal plan.

It is important to optimize the Schema and provide all the critical inputs to the Optimizer. Generic reporting tools would come out with queries that work on all the databases. These generated queries may work on some database and may not work on other. Therefore, the title of this blog “Know your Reporting Tool” :).

On the real customer issue, post adding the constraints to few of the columns, the query response time reduced drastically from 184 Seconds to less than 10 seconds far better than the competition.

Effective SQL: 61 specific ways of writing effective SQL

Few months back, I completed a technical review of an upcoming book titled “Effective SQL: 61 specific ways of writing effective SQL”. This book focuses on widely used Relational Databases and, no doubt, Oracle is one of these. My review was specifically on Oracle SQL. Got to know from the publisher that this book is scheduled to be out in the market by Friday, 23rd December 2016. The book can be ordered from the following link:

effective-sql-61-specific-ways-to-write-better-sql-9780134578897

 effectivesql

SQL Optimization ! Yet another example ……

Edited on 30th October 2014, as the Query recommendation had a flaw, which I could test during my preparation for an Event.

This is another classic example where the Developers needs to know a better way of writing a code. I wrote about one such example in my earlier post. There are many ways of achieving a target, but very few to achieve this in an efficient manner.

I came across a performance issue, which triggered acute slowness during peak load. This slowness was evident post increase in data volume. A Query, which was doing around 28k Logical I/O’s was not doing around 48K per execution. The concurrency on this query was so high that in a 30 minutes windows (as seen from the AWR reports), the total number of logical I/O’s, cummulative for all executions, was around 1 Billion. In terms of concurrency, I could see that out of 15 Active sessions, 10 sessions were executing this query concurrently, thus consuming high CPU. I will not post the Original query but will demonstrate an example.

The example is from SH Schema on Customers Table. I have created following Indexes :

create index customers_cust_city_idx on customers(cust_city);
create index customers_cust_state_idx on customers(cust_state_province);

The application requirement is to design a screen for users to retrieve Cust_ID and cust_last_name based on queried cust_city and/or cust_state_province. The user has an option to either query on both or either of theses, with no option of leaving both NULL. To get this done, the developers coded the query as mentioned below. I have also pasted the run time execution plan.

## With CUST_CITY as NULL
variable b1 varchar2(32);
variable b2 varchar2(32);
exec :b1:='Maharashtra';
exec :b2:=null;

select cust_id, cust_last_name
from 	customers
where	(cust_city=:b2 or :b2 is null)
and	(cust_state_province=:b1 or :b1 is null);

SQL_ID  554u5htwuan4z, child number 0
-------------------------------------
select cust_id, cust_last_name from  customers where (cust_city=:b2 or
:b2 is null) and (cust_state_province=:b1 or :b1 is null)

Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |   406 (100)|          |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   162 |  5508 |   406   (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(((:B2 IS NULL OR "CUST_CITY"=:B2) AND (:B1 IS NULL OR
              "CUST_STATE_PROVINCE"=:B1)))

## With CUST_STATE_PROVINCE as NULL

exec :b1:=null;
exec :b2:='Mumbai Bombay';

SQL_ID  554u5htwuan4z, child number 0
-------------------------------------
select cust_id, cust_last_name from  customers where (cust_city=:b2 or
:b2 is null) and (cust_state_province=:b1 or :b1 is null)

Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |   406 (100)|          |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   162 |  5508 |   406   (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(((:B2 IS NULL OR "CUST_CITY"=:B2) AND (:B1 IS NULL OR
              "CUST_STATE_PROVINCE"=:B1)))

## With both NON NULL Values

exec :b1:='Maharashtra';
exec :b2:='Mumbai Bombay';

SQL_ID  554u5htwuan4z, child number 0
-------------------------------------
select cust_id, cust_last_name from  customers where (cust_city=:b2 or
:b2 is null) and (cust_state_province=:b1 or :b1 is null)

Plan hash value: 2008213504

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |       |       |   406 (100)|          |
|*  1 |  TABLE ACCESS FULL| CUSTOMERS |   162 |  5508 |   406   (1)| 00:00:05 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(((:B2 IS NULL OR "CUST_CITY"=:B2) AND (:B1 IS NULL OR
              "CUST_STATE_PROVINCE"=:B1)))

For each of these executions, the plan was a Full Table Scan of CUSTOMERS Table. Even if the values for both the bind variables were provided, the query optimizer selected a Full Scan and this is because of the way this query is written. A Full Table Scan will impact the performance and as the volume increases, the impact will gradually increase. One of the better way for writing this query would be to have IF ELSE condition, as I have demonstrated in my previous posts. There is another way as well. See below :


variable b1 varchar2(32);
variable b2 varchar2(32);
exec :b1:='Maharashtra';
exec :b2:=null;

select cust_id, cust_last_name, cust_city, cust_state_province
from 	customers
where	cust_state_province=:b1
and	:b2 is null
union 
select cust_id, cust_last_name, cust_city, cust_state_province
from 	customers
where	cust_city=:b2
and	:b1 is null
union
select cust_id, cust_last_name, cust_city, cust_state_province
from 	customers
where	cust_city=:b2
and	cust_state_province=:b1;

SQL_ID  946qyhrzz882s, child number 0
-------------------------------------
select cust_id, cust_last_name, cust_city, cust_state_province from
customers where cust_state_province=:b1 and :b2 is null union select
cust_id, cust_last_name, cust_city, cust_state_province from  customers
where cust_city=:b2 and :b1 is null union select cust_id,
cust_last_name, cust_city, cust_state_province from  customers where
cust_city=:b2 and cust_state_province=:b1

Plan hash value: 3095663828

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                          |       |       |   217 (100)|          |
|   1 |  SORT UNIQUE                       |                          |   280 |  9520 |   217  (42)| 00:00:05 |
|   2 |   UNION-ALL                        |                          |       |       |            |          |
|*  3 |    FILTER                          |                          |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID    | CUSTOMERS                |   189 |  6426 |   127   (0)| 00:00:03 |
|*  5 |      INDEX RANGE SCAN              | CUSTOMERS_CUST_STATE_IDX |   189 |       |     1   (0)| 00:00:01 |
|*  6 |    FILTER                          |                          |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID    | CUSTOMERS                |    90 |  3060 |    85   (0)| 00:00:02 |
|*  8 |      INDEX RANGE SCAN              | CUSTOMERS_CUST_CITY_IDX  |    90 |       |     1   (0)| 00:00:01 |
|   9 |    TABLE ACCESS BY INDEX ROWID     | CUSTOMERS                |     1 |    34 |     2   (0)| 00:00:01 |
|  10 |     BITMAP CONVERSION TO ROWIDS    |                          |       |       |            |          |
|  11 |      BITMAP AND                    |                          |       |       |            |          |
|  12 |       BITMAP CONVERSION FROM ROWIDS|                          |       |       |            |          |
|* 13 |        INDEX RANGE SCAN            | CUSTOMERS_CUST_CITY_IDX  |    90 |       |     1   (0)| 00:00:01 |
|  14 |       BITMAP CONVERSION FROM ROWIDS|                          |       |       |            |          |
|* 15 |        INDEX RANGE SCAN            | CUSTOMERS_CUST_STATE_IDX |    90 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(:B2 IS NULL)
   5 - access("CUST_STATE_PROVINCE"=:B1)
   6 - filter(:B1 IS NULL)
   8 - access("CUST_CITY"=:B2)
  13 - access("CUST_CITY"=:B2)
  15 - access("CUST_STATE_PROVINCE"=:B1)

The plan changed from Full Table Scans to Index Scans. Another alternative would be to rewrite the code, using IF ELSE, writing three different queries depending upon the user entries for :b1 and :b2. This may mean, a bit of lengthy code.

%d bloggers like this: