All India Oracle User Group Events for 2018

Sangam 2018 is scheduled for 7th and 8th December 2018. I am presenting a session on “SQL Tuning ! Tips, Tools and Techniques” on 7th December 2018, immediately after lunch. A 45-minute session doesn’t do justice to the topic, but I will try my best to cover a few interesting real-life examples. On 8th December, I will be co-hosting a fireside chat on Performance alongside Tirthankar Lahiri (VP of Product Management, Oracle US), Arup Nanda, GP Gongloor and Karan Dodwal. This is scheduled for 2.55 pm. The only issue is that I may have to leave early to catch my flight back to Mumbai.

North India Chapter – Chandigarh. Mark your calendar for 24th November 2018, as I will be speaking for the entire day on Performance Optimization, covering some interesting topics around SQL Optimization, the Optimizer, Indexes, Autonomous Databases and more. This is a full-day event, which gives me a good amount of time to demonstrate some interesting facts. Registration for this event is open; use the following link to register. See you soon.

https://www.meraevents.com/us/previewevent?view=preview&eventId=183767

For 2019, I will publish the calendar once it is finalized 🙂

Importance of Constraints ! Know your Reporting Tools

Recently, I was working on a customer issue. The issue was a performance comparison between a competing database and Oracle Database, and the performance of the competition was reported to be better than Oracle. This was a classic case of bad schema design and a badly written query. Working on it reminded me of a great lesson I learnt from “Expert Oracle Database Architecture” by Thomas Kyte. He wrote about a classic issue where code running in SQL Server generated wrong results when ported to Oracle Database, due to the way these two databases compare NULL values. Every database is different. It also reminded me of one of my responses to a customer query on “A count(*) from a Table is doing a Full Table Scan, even though it has a Unique Index on it”.
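
As a quick illustration of that NULL behaviour (a minimal sketch, not taken from the book): in Oracle, a direct comparison with NULL never evaluates to true, so NULLs have to be tested explicitly with IS NULL.

-- In Oracle, NULL = NULL evaluates to UNKNOWN, so this returns no rows:
select 'matched' from dual where NULL = NULL;

-- NULLs must be tested explicitly; this returns one row:
select 'matched' from dual where NULL is NULL;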

As Tom mentioned in his book, every database is different and implements features differently. Third-party tools that connect to each of these different data sources generate queries that are syntactically valid on all of them but may not run optimally. Therefore, it is important to understand our data and design the schema with all the required constraints and indexes in place.

In this case, the customer was running a few analytical reports from a third-party reporting tool (Tableau). The reports selected a few columns with a couple of predicates on a date range and a product name, and the queries were around 40-50 lines. Surprisingly, for each of the predicates, Tableau added a few additional predicates, and because of these, the run time plan was around 2700+ lines long. As an example, I have a table T1 and have to generate a report selecting object ids and names for either of the two TEMPORARY values (Y or N). My query would look like:

select object_id, object_name from t1 where temporary=:b1;

If I have an index on the TEMPORARY column, the Optimizer will come out with an optimal plan based on its cost calculation. However, when run from Tableau, the tool generated the following query:

select object_id, object_name from t1
where ((temporary=:b1) or (temporary is NULL and :b1 is NULL));

What is the implication of this added OR condition? Let’s see.

SQL> create table t1 as
select * from all_objects;

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt=>'for all columns size auto for columns temporary size 100');

PL/SQL procedure successfully completed.

SQL> create index t1_idx on t1(temporary);

Index created.

SQL> select num_rows, blocks from dba_tables where table_name='T1';

  NUM_ROWS     BLOCKS
---------- ----------
     68417       1375

SQL> variable l_temporary varchar2(1);
SQL> exec :l_temporary:='Y';

PL/SQL procedure successfully completed.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL)); 

161 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7xsahyu4q594y, child number 0
-------------------------------------
select object_id, object_name from t1 where ((temporary=:l_temporary)
or (temporary is NULL and :l_temporary is NULL))

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   375 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   161 |  6923 |   375   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEMPORARY"=:L_TEMPORARY OR ("TEMPORARY" IS NULL AND
              :L_TEMPORARY IS NULL)))

161 rows out of 68000 is less than 1%, so an index access would have been a better option. At the customer's end, the table was huge, with billions of rows, and with multiple such OR predicates the very first observation was the amount of time the Optimizer took to parse the query: out of the roughly 184 seconds of total run time, around 175 seconds were spent on parsing. This was identified as a bug in 18c (BUG#28725660), with the primary cause being a change in OR Expansion behaviour in 18c. This bug is fixed in 19c.

Backporting the fix to 18c would have taken some time, so the other fix that we applied was to add NOT NULL constraints to some of the columns. From the data, we could see that none of these columns had NULL values, and the developers confirmed that NULL values are never stored in them, so it was safe to add the constraints. Continuing with our example above, let's add a NOT NULL constraint to our table T1.
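
Before adding such a constraint, it is worth confirming that the column really has no NULLs (a minimal sketch against our example table; at the customer's end this check was done against the actual columns):

-- the constraint can only be added if the column holds no NULL values
select count(*) from t1 where temporary is null;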

SQL> alter table t1 modify temporary not null;

Table altered.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |       |     5 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |   161 |  6923 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |   161 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:L_TEMPORARY)

The run time plan of the same query is now an index scan, which is much better than the previous one. This additional input was enough for the Optimizer to transform the query. You can generate a 10053 trace to see the transformation.
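
For reference, a 10053 (optimizer) trace can be generated around the problem query roughly as follows (a minimal sketch; the statement must be hard parsed for the trace to capture the optimizer's decisions — the comment in the query text below is just one way to force that — and the trace file is written to the session's trace directory):

SQL> alter session set tracefile_identifier = 'OR_EXPANSION';
SQL> alter session set events '10053 trace name context forever, level 1';
SQL> select /* force_hard_parse */ object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));
SQL> alter session set events '10053 trace name context off';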

A NOT NULL constraint combined with a Unique Index is also what is required to answer a COUNT(*) query from an index, which is what one of my customers had asked about a long time back. That is another case where a NOT NULL constraint helps the Optimizer come up with an optimal plan.
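
A minimal sketch of that scenario, using a hypothetical unique index T1_UK on OBJECT_ID (assuming OBJECT_ID holds unique, non-NULL values in this copy of ALL_OBJECTS): entirely NULL keys are not stored in a B*Tree index, so the Optimizer cannot count all rows from the index until the column is declared NOT NULL.

SQL> create unique index t1_uk on t1(object_id);

-- without NOT NULL on OBJECT_ID, count(*) has to visit the table (full table scan)
SQL> select count(*) from t1;

SQL> alter table t1 modify object_id not null;

-- the Optimizer can now answer count(*) from the index (e.g. an INDEX FAST FULL SCAN on T1_UK)
SQL> select count(*) from t1;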

It is important to optimize the schema and provide all the critical inputs to the Optimizer. Generic reporting tools generate queries that work on all the databases they support; these generated queries may run optimally on some databases and not on others. Hence the title of this blog: “Know your Reporting Tools” :).

On the real customer issue, after adding the constraints to a few of the columns, the query response time dropped drastically from 184 seconds to less than 10 seconds, far better than the competition.
