Unacceptable Performance post Implementing Exadata….

UPDATED ON 17th APRIL 2012.

Tanel Poder, in one of his blog, has demonstrated that the Queries on Sequences are Offloadable. This clearly means, my conclusion was wrong and needs investigation. I had a 10046 trace during the problem time, which says the waits on “cell multiblock physical read”, which means, the query was not offloaded. Post commenting the Sequences, the query response time came down to 3-4 minutes and therefore, I concluded this to be an issue. The Customer also mentioned that this time, he could see “cell smart table scan” and this lead to the wrong conclusion. The learning from this is : Until and Unless, you physically see the impact of the change and are 100% assured, never conclude.

One of the primary reason, I started blogging is to share my analysis. All these are my own views and are not supported or assisted by anyone. Therefore, there are times, when I am wrong. Blogging helps me connect the experts worldwide. Their comments on my writing (be it a positive or negative) serves my purpose of self learning. I appreciate the comments where I am wrong, as it helps me rectifying my mistakes. The latest blog is one such example, which Tanel has helped me understand that my analysis was wrong. I appreciate his comments and the demonstration. Click on the link below to access the blog written by Tanel.

http://blog.tanelpoder.com/2012/04/13/exadata-smart-scan-predicate-offloading-and-sequence-nextval/

As mentioned in one of my previous blog, Exadata has changed the Performance perception and is therefore has generated widespread interest amongst Oracle Community. To be very honest, due to the cost involved in investing Exadata, everyone expects it to deliver more than what is promised. Again, honestly, I have seen a tremendous performance improvement, that too AS-IS, which means, a significant improvement without actually making any changes and exadata features in action. During one of my POC for a banking customer, we just restored the Customer Database onto an Exadata Machine and observed significant improvement without actually making any changes or tweaking it for Exadata. Back to my current topic, as mentioned, Customer expects a significant improvement post implementing Exadata and this blog is on an issue that bothered one of my customer a lot.

The background of the issue is : A Customer supporting a large application invested in an Exadata Machine, with a business justification of providing an SLA for critical business processes. Some of these processes have been violating the SLA’s and therefore, with all the options of Optimization faded, the change was important. During the POC phase, all but one process showed significant improvement. One process took longer than expected and was a threat to the business justification provided before investing.

About the Process

It’s a Data Loading process that creates a new table from another table based on some conditions. This newly created table is then used by other processes. On a Non-Exadata Machine, the table creation takes around 3 hours. Customer expected this to come down to less than 10 minutes and was shocked when it took more than 2 hours. The completion time on Exadata is not known, as the customer (due to sheer frustration) killed the process. Fortunately (atleast for me), when the process did not complete, the administrator enabled 10046 trace for the process and, in this blog, I will share a critical information from the trace that helped optimizing the process.

Before I dive into the issue and the resolution, let us understand a critical aspect of Exadata. A Query (FTS) executed against Exadata Machine, the required blocks can be read from the storage in two different approaches. These are : (1) either read the required columns (column projection) or the required rows (predicate filtering) from the Storage using iDB and/or (2) read the entire block (and finally all the blocks) into the SGA. The former approach is called as Smart Scan (or Query Offloading), whereas, the later is similar to the strategy used by Non-Exadata Oracle Implementation.

Back to the issue : Involvement, post-performance issues, requires analyzing traces and in our case, it was 10046 trace file. From the trace file, I could see that the Query was doing a Full Table Scan for loading the data into another table. Further, the prominent waits associated with this Query were “cell multiblock physical read” and “cell single block physical read”. From these waits, it was evident that the Query was not offloaded to the Exadata Storage and therefore, the performance of this query would not benefit from Exadata Storage. As mentioned in my previous paragraph, if the query is not offloaded to the Exadata Storage, the behavior and performance would be nearly similar to that of non-exadata storage. Please note, this is a Full Table Scan and therefore, flash cache will not be of any use here, unless, the table in the query is specifically cached using KEEP clause.

The Query that creates a table is mentioned below and you need to believe me that it does a Full Table Scan on ST Table.

create table D_ME1 as  
select 	ST.SRC_SYS_REF_ID SRC_SYS_REF_ID, ST.XCD_SRC_ENTY_REF_ID SRC_ENTY_REF_ID,  
	ST.XCD_SRC_ID SRC_SYS_ID,  
	(select REF.REF_ID from test.reference_v REF where REF.REF_TYP_NAME='OGRDS ENTITY TYPE' and REF.REF_VAL = 'CROSSCODE') OGRDS_ENTY_REF_ID,
	'XCD' OGRDS_ENTY_NAME,  
	XCD_ID_SEQ.NEXTVAL OGRDS_ID,  
	'PNB_MDL_LOAD' CRTD_BY,  
	sys_extract_utc(systimestamp) CRTD_DTTM,  
	'PNB_MDL_LOAD' UPD_BY,  
	sys_extract_utc(systimestamp) UPD_DTTM  
from 	P2_MDL_E2E.ST ST  
where  	(1=1)    
And 	(ST.SRC_SYS_REF_ID=1)    
AND 	('Y')=(SELECT INTF_A.RUN_IND FROM p2_mdl_ctrl_e2e.CT_LST INTF_A
		WHERE 	INTF_A.INTF_NAME='PNB_MDL_INT_XREF_XCD_B' 
		AND 	INTF_A.SRC_SYS_REF_ID=1
              	AND 	INTF_A.SCEN_NAME='PNB_MDL_PKG_XREF_XCD_B' 
              	AND	INTF_A.FLOW_NAME='EIMDB_INITIAL_LOAD');              

A simple change in the query improved the performance and the execution time was 3-4 minutes. This was primarily due to Query Offloading, which forces the processing of the Query at the Storage level and sent only the relevant data to the Database. This is the first approach mentioned above. What was the change that triggered the important Optimization of Offloading ? In other words, What prevented the Original Query to be Offloaded to the Exadata Storage ? The answer is the XCD_ID_SEQ.NEXTVAL. With Oracle Sequences, the database engine has to generate and maintain the values in the SGA. Oracle Sequences cannot be offloaded and therefore, if these are specified in the query, the optimization fallsback to the traditional method of sending each block to the SGA. For testing purpose, when the Sequence was replaced by rownum (and as a final workaround to to_char(sysdate,’DDMMYYYYHH24MISS’)||rownum) offloading came into effect and the execution time dropped to 3-4 minutes. The fresh 10046 traces also revealed that the wait events for the modified query changed to “cell smart table scan” which means a Smart Scan. The final modified query is as under :

create table D_ME1 as  
select 	ST.SRC_SYS_REF_ID SRC_SYS_REF_ID, ST.XCD_SRC_ENTY_REF_ID SRC_ENTY_REF_ID,  
	ST.XCD_SRC_ID SRC_SYS_ID,  
	(select REF.REF_ID from test.reference_v REF where REF.REF_TYP_NAME='OGRDS ENTITY TYPE' and REF.REF_VAL = 'CROSSCODE') OGRDS_ENTY_REF_ID,
	'XCD' OGRDS_ENTY_NAME,  
	to_char(sysdate,'DDMMYYYYHH24MISS')||rownum OGRDS_ID,  /* this creates a Unique value forcing Offloading */
	'PNB_MDL_LOAD' CRTD_BY,  
	sys_extract_utc(systimestamp) CRTD_DTTM,  
	'PNB_MDL_LOAD' UPD_BY,  
	sys_extract_utc(systimestamp) UPD_DTTM  
from 	P2_MDL_E2E.ST ST  
where  	(1=1)    
And 	(ST.SRC_SYS_REF_ID=1)    
AND 	('Y')=(SELECT INTF_A.RUN_IND FROM p2_mdl_ctrl_e2e.CT_LST INTF_A
		WHERE 	INTF_A.INTF_NAME='PNB_MDL_INT_XREF_XCD_B' 
		AND 	INTF_A.SRC_SYS_REF_ID=1
              	AND 	INTF_A.SCEN_NAME='PNB_MDL_PKG_XREF_XCD_B' 
              	AND	INTF_A.FLOW_NAME='EIMDB_INITIAL_LOAD');              

In a nutshell, while Exadata will give you tremendous performance, especially in case of a Full Table Scan, there are cases that require human intelligence :). In this case, while both the queries perform a Full Table Scan, the Original prevented Optimization by way of Offloading (Smart Scan), while the modified code smartly offloaded the query to the Storage bringing down the execution time drastically.

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on the issues that I feel is Interesting for my readers. Some of these are my real life examples, which I hope, you would find interesting. Comments are always a welcome. The Technical Observations & Views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understandings, learnings and resolutions of various customer issues.

18 Responses to Unacceptable Performance post Implementing Exadata….

  1. Bhavik Desai says:

    Good to know this Vivek. Thanks for sharing. Not yet encountered this in my Exadata system. Will take it for future reference !

    Like

  2. savvinov says:

    Hi Vivek,

    thanks for the post. I haven’t yet gotten a chance to work with an Exadata Machine, is there any virtualization or a simulation available to get some hands-on experience?

    Best regards,
    Nikolay

    Like

    • Vivek says:

      Thanks Nikolay. Unfortunately, the answer to your question is NO. You may not be able to see the actual performance difference that Exadata brings in unless you implement and run your application against Exadata. However, you can simulate the Exadata Performance on your traditional storage, which will only show you the statistics on how the queries would behave on Exadata. This means, the Interconnect I/O that would be saved post implementing Exadata. I have a blog on this, which would be helpful.

      Regards
      Vivek

      Like

  3. sandeep says:

    Hi Vivek,
    Another example about tremendous performance of exadata smart scan, thanx for sharing your experience 🙂

    Like

  4. Ranjit Nagi says:

    Hi Vivek,

    Thanks for sharing this, so for sequence Exadata need sga to retain values. Any other type of events that need buffer cache processing specifically?

    Regards
    Ranjit

    Like

    • Vivek says:

      Dear Ranjit,

      There are other functions that prevent offloading and can be checked in v$sqlfn_metadata view.

      Regards
      Vivek

      Like

      • Note that it’s not an “all or nothing” thing for the whole SQL or even a table query.

        If you have 2 predicates on a table, one of them is offloadable and the other isn’t, then the offloadable one still gets offloaded if smart scan kicks in and the other one will just remain as a regular filter predicate only.

        Note that while I don’t agree with the conclusion if this particular test case, I definitely agree with “there are cases that require human intelligence” part! Sometimes a little tweaking is needed to unleash the full potential of Exadata.

        Like

  5. Hi Vivek,

    Something else must have gone wrong in your tests, as the smart scan offloading applies to some data retrieval row sources / filtering only and nothing else. So, what you have in your projection list for example (the sequence use), this does not directly affect the smart scan decision. In other words, sequence.NEXTVAL can be used in a query and you can still get smart scans.

    Here’s a little narrowed-down example which proves this (I hope it formats correctly):

    
    SQL Monitoring Report
    
    SQL Text
    ------------------------------
    SELECT /*+ MONITOR */ t.object_name, s.NEXTVAL n FROM t WHERE owner LIKE 'XYZ%'
    
    Global Information
    ------------------------------
     Status              :  DONE (ALL ROWS)
     Instance ID         :  1
     Session             :  TANEL (204:2707)
     SQL ID              :  8tdb73z1hpcdq
     SQL Execution ID    :  16777216
     Execution Started   :  04/13/2012 08:45:58
     First Refresh Time  :  04/13/2012 08:45:58
     Last Refresh Time   :  04/13/2012 08:46:00
     Duration            :  2s
     Module/Action       :  SQL*Plus/-
     Service             :  dbm
     Program             :  sqlplus@mac02.local (TNS V1-V3)
     Fetch Calls         :  1
    
    Global Stats
    ==========================================================================
    | Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |  Cell   |
    | Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes | Offload |
    ==========================================================================
    |    1.46 |    0.09 |     1.37 |     1 |   317K | 2496 |   2GB |  99.99% |
    ==========================================================================
    
    SQL Plan Monitoring Details (Plan Hash Value=1589210007)
    ======================================================================================================================================================================
    | Id |          Operation           | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   | Activity |      Activity Detail      |
    |    |                              |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload |   (%)    |        (# samples)        |
    ======================================================================================================================================================================
    |  0 | SELECT STATEMENT             |      |         |       |           |        |     1 |          |      |       |         |          |                           |
    |  1 |   SEQUENCE                   | S    |         |       |           |        |     1 |          |      |       |         |          |                           |
    |  2 |    TABLE ACCESS STORAGE FULL | T    |    439K | 20738 |         1 |     +1 |     1 |        0 | 2496 |   2GB |  99.99% |   100.00 | cell smart table scan (1) |
    ======================================================================================================================================================================
    
    

    See, you have the “cell smart table scan” wait event for the data access (and also the offloading % is reported for this row source), so smart scan was used.

    There are plenty of other reasons why smart scans may not kick in for your full scans, one main cause is that if this segment happens to be cached well enough when the full scan starts, Oracle may decide to run it as a buffered scan instead.

    Like

    • Vivek says:

      Hi Tanel,

      Appreciate your comments and a test case to demonstrate the offloading of queries using sequence. I could simulate this as well. I will revisit the analysis and investigate, the cause of the huge difference of 2 hours to 3-4 minutes. Blogging really helps, as the knowledge that you share is read by others and, in case of any wrong conclusion, it is immediately rectified. I

      Regards
      Vivek

      Like

      • Tanel Poder says:

        Absolutely, blogging rocks, first because it requires quite a lot of thinking to put the findings into writing in a short and concise way (it’s much easier to just read stuff and forget) and then there are plenty of people who’d do instant “peer review” to the writing…

        Like

  6. Greg Rahn says:

    Tanel has demonstrated that using sequences works fine with smart scan. I believe you have come to the wrong conclusion.

    http://blog.tanelpoder.com/2012/04/13/exadata-smart-scan-predicate-offloading-and-sequence-nextval/

    Like

    • Vivek says:

      Thanks Greg for the comments. Yes, it is a wrong conclusion and therefore, needs investigation. I will update the blog with this wrong conclusion and further plan of action. This same post was sent to me by another reader Andrew Lo. I appreciate the comments from you and Tanel on this blog.

      Regards
      Vivek

      Like

  7. Yogi says:

    Vivek,

    Recently, I read a post by Tanel Podar, where he says sequences shouldnt be the reason why smart scan are not used in exadata.

    http://blog.tanelpoder.com/2012/04/13/exadata-smart-scan-predicate-offloading-and-sequence-nextval/?utm_source=rss&utm_medium=rss&utm_campaign=exadata-smart-scan-predicate-offloading-and-sequence-nextval

    Or is it just me, who’s trying to compare apples and oranges? Maybe both senarios are different.

    -Yogi

    Like

    • Vivek says:

      Hi Yogesh,

      Yes, I have gone through the blog and am really sorry for the wrong conclusion on the blog. However, I appreciate the comments from all my readers, as it helps clarifying any wrong doings 🙂

      Regards
      Vivek

      Like

  8. porushh says:

    Vivek, customer is recommended to use RAT (Real App Testing) to capture the production db workload and replay in the test Exadata database. Problem statement would have been highlighed as improved but not having a lot of improvement. SQL Monitoring (part of the Diagnostics pack of Enterprise Manager) of the long running statement would have shown that cell smart table scan was not being used. Your invaluable tip would have then helped to resolve the issue. Thanks for highlighting this.

    Like

  9. Andreea says:

    Hello! I ithink we have something helpful for you. Knowing there are a lot of question marks related to Exadata implementation and getting the performance expected out of it, Oracle University invites you to a free mini-LVC on Friday – December 7th, where Operational Impact of Deploying an Engineered System (Exadata) will be discussed. You can register here: http://www.oracle.com/webapps/events/ns/EventsDetail.jsp?p_eventId=161217&src=7547030&src=7547030&Act=58.

    Like

  10. Dear Vivek,

    Thanks for the nice post and great comments from all readers.
    Just to confirm my understanding:
    Offloading can occur while using sequences, as indicated by Tanel, but in the example
    you used, sequence was the main reason for preventing offloading, because after you
    replace it with rownum, offloading happened. Is this correct?

    Ahmed

    Like

    • Vivek Sharma says:

      Hi Ahmed,

      In this case, removing sequences resolved the slow performance. This was corrected by Tanel. Post this understanding, I worked on a similar optimization, wherein, a loading performance was slower using sequences and performed better without sequences. This is nothing to do with Exadata and is an expected behaviour. Sequences require dictionary access every time the sequence is exhausted, which is then re populated. This is slower. Rownum generates the number faster, thus better in terms of performance.

      Hope this helps.

      Regards
      Vivek

      Like

Leave a comment