Unacceptable Performance post Implementing Exadata…
April 12, 2012
UPDATED ON 17th APRIL 2012.
Tanel Poder has demonstrated, in one of his blog posts, that queries on sequences are offloadable. This clearly means my conclusion was wrong and needs further investigation. I had a 10046 trace from the time of the problem, which showed waits on “cell multiblock physical read”, meaning the query was not offloaded. After the sequence was commented out, the query response time came down to 3-4 minutes and I therefore concluded that the sequence was the issue. The customer also mentioned that this time he could see “cell smart table scan”, and this led to the wrong conclusion. The lesson from this is: unless you have physically seen the impact of the change and are 100% sure, never conclude.
One of the primary reasons I started blogging was to share my analysis. All of these are my own views and are not supported or assisted by anyone. Therefore, there are times when I am wrong. Blogging helps me connect with experts worldwide. Their comments on my writing (be they positive or negative) serve my purpose of self-learning. I appreciate the comments that show where I am wrong, as they help me rectify my mistakes. The latest blog is one such example, where Tanel helped me understand that my analysis was wrong. I appreciate his comments and the demonstration. Click on the link below to access the blog written by Tanel.
As mentioned in one of my previous blogs, Exadata has changed the perception of performance and has therefore generated widespread interest in the Oracle community. To be very honest, because of the cost involved in investing in Exadata, everyone expects it to deliver more than what is promised. Again, honestly, I have seen tremendous performance improvements as-is, that is, a significant improvement with the Exadata features in action and without actually making any changes. During one of my POCs for a banking customer, we simply restored the customer database onto an Exadata machine and observed a significant improvement without making any changes or tweaking it for Exadata. Back to the current topic: as mentioned, customers expect a significant improvement after implementing Exadata, and this blog is about an issue that bothered one of my customers a lot.
The background of the issue: a customer supporting a large application invested in an Exadata machine, with the business justification of providing an SLA for critical business processes. Some of these processes had been violating their SLAs and, with all other optimization options exhausted, the change was important. During the POC phase, all but one process showed significant improvement. That one process took longer than expected and was a threat to the business justification given before the investment.
About the Process
It is a data loading process that creates a new table from another table based on some conditions. The newly created table is then used by other processes. On a non-Exadata machine, the table creation takes around 3 hours. The customer expected this to come down to less than 10 minutes and was shocked when it ran for more than 2 hours. The actual completion time on Exadata is not known, as the customer (out of sheer frustration) killed the process. Fortunately (at least for me), when the process did not complete, the administrator enabled a 10046 trace for it and, in this blog, I will share a critical piece of information from the trace that helped in optimizing the process.
Before I dive into the issue and the resolution, let us understand a critical aspect of Exadata. When a query doing a Full Table Scan (FTS) is executed against an Exadata machine, the required blocks can be read from the storage in two different ways: (1) read only the required columns (column projection) and/or the required rows (predicate filtering) from the storage using iDB, or (2) read the entire block (and eventually all the blocks) into the SGA. The former approach is called Smart Scan (or Query Offloading), whereas the latter is similar to the strategy used by a non-Exadata Oracle implementation.
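One way to see which of the two approaches a statement actually used is to compare the offload-related columns in V$SQL. The query below is a minimal sketch, assuming an 11.2 database on Exadata storage; the bind variable :sql_id is a placeholder for the statement being investigated and is not taken from the trace discussed here.

```sql
-- Sketch: how many bytes were eligible for offload versus how many
-- bytes actually travelled over the interconnect for one statement.
-- :sql_id is a placeholder; supply the SQL_ID of the statement under test.
select sql_id,
       child_number,
       physical_read_bytes,
       io_cell_offload_eligible_bytes as eligible_bytes,
       io_interconnect_bytes          as interconnect_bytes
from   v$sql
where  sql_id = :sql_id;
```

If eligible_bytes stays at zero, the statement was most likely served by the second approach (whole blocks shipped to the database). A large eligible_bytes with a much smaller interconnect_bytes is what one would expect to see from a Smart Scan.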
Back to the issue: my involvement after performance issues typically requires analyzing traces, and in this case it was a 10046 trace file. From the trace file, I could see that the query was doing a Full Table Scan to load the data into another table. Further, the prominent waits associated with this query were “cell multiblock physical read” and “cell single block physical read”. From these waits, it was evident that the query was not offloaded to the Exadata storage and would therefore not benefit from it. As mentioned in the previous paragraph, if a query is not offloaded to the Exadata storage, its behavior and performance are nearly the same as on non-Exadata storage. Please note that this is a Full Table Scan and therefore flash cache will not be of any use here, unless the table in the query is specifically cached using the KEEP clause.
The query that creates the table is shown below, and you will have to take my word that it does a Full Table Scan on the ST table.
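For readers who want to collect the same kind of evidence, the following is a rough sketch of enabling a 10046 trace with wait events in your own session and then summarizing the cell-related waits. The trace file identifier 'ctas_10046' and the bind variable :sid are illustrative placeholders, not values from the customer's system.

```sql
-- Sketch: trace the current session with wait events (level 8).
alter session set tracefile_identifier = 'ctas_10046';  -- hypothetical label
alter session set events '10046 trace name context forever, level 8';

-- ... run the statement under investigation here ...

alter session set events '10046 trace name context off';

-- Summary of cell-related waits for a session; :sid is a placeholder.
select event,
       total_waits,
       time_waited_micro
from   v$session_event
where  sid = :sid
and    event like 'cell%'
order  by time_waited_micro desc;
```

Waits dominated by “cell multiblock physical read” point towards block shipping, whereas “cell smart table scan” points towards a Smart Scan. These waits describe what happened, not why, so they are a starting point rather than a conclusion.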
create table D_ME1 as
select ST.SRC_SYS_REF_ID SRC_SYS_REF_ID,
       ST.XCD_SRC_ENTY_REF_ID SRC_ENTY_REF_ID,
       ST.XCD_SRC_ID SRC_SYS_ID,
       (select REF.REF_ID
          from test.reference_v REF
         where REF.REF_TYP_NAME='OGRDS ENTITY TYPE'
           and REF.REF_VAL = 'CROSSCODE') OGRDS_ENTY_REF_ID,
       'XCD' OGRDS_ENTY_NAME,
       XCD_ID_SEQ.NEXTVAL OGRDS_ID,
       'PNB_MDL_LOAD' CRTD_BY,
       sys_extract_utc(systimestamp) CRTD_DTTM,
       'PNB_MDL_LOAD' UPD_BY,
       sys_extract_utc(systimestamp) UPD_DTTM
  from P2_MDL_E2E.ST ST
 where (1=1)
   And (ST.SRC_SYS_REF_ID=1)
   AND ('Y')=(SELECT INTF_A.RUN_IND
                FROM p2_mdl_ctrl_e2e.CT_LST INTF_A
               WHERE INTF_A.INTF_NAME='PNB_MDL_INT_XREF_XCD_B'
                 AND INTF_A.SRC_SYS_REF_ID=1
                 AND INTF_A.SCEN_NAME='PNB_MDL_PKG_XREF_XCD_B'
                 AND INTF_A.FLOW_NAME='EIMDB_INITIAL_LOAD');
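Rather than relying on belief alone, the access path can be checked with an execution plan. The statement below is only a sketch: it uses a simplified form of the SELECT (the driving table and its filter, without the scalar subqueries and the sequence), so the plan of the real CTAS may differ.

```sql
-- Sketch: plan for a simplified version of the SELECT against ST.
explain plan for
select ST.SRC_SYS_REF_ID,
       ST.XCD_SRC_ENTY_REF_ID,
       ST.XCD_SRC_ID
from   P2_MDL_E2E.ST ST
where  ST.SRC_SYS_REF_ID = 1;

-- Display the plan that was just explained.
select * from table(dbms_xplan.display);
```

On Exadata storage a full scan would normally appear as TABLE ACCESS STORAGE FULL, with a storage(...) entry in the predicate section. Note that this only shows the scan is a candidate for offloading; whether a Smart Scan actually took place has to be confirmed at run time.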
A simple change in the query improved the performance and brought the execution time down to 3-4 minutes. This was primarily due to Query Offloading, which processes the query at the storage level and sends only the relevant data to the database. This is the first approach mentioned above. What was the change that triggered the important optimization of offloading? In other words, what prevented the original query from being offloaded to the Exadata storage? The answer, I concluded, was XCD_ID_SEQ.NEXTVAL. With Oracle sequences, the database engine has to generate and maintain the values in the SGA. My reasoning was that sequences cannot be offloaded and that, if they are specified in the query, the optimization falls back to the traditional method of sending each block to the SGA. For testing purposes, when the sequence was replaced by rownum (and, as a final workaround, by to_char(sysdate,'DDMMYYYYHH24MISS')||rownum), offloading came into effect and the execution time dropped to 3-4 minutes. The fresh 10046 traces also revealed that the wait events for the modified query had changed to “cell smart table scan”, which means a Smart Scan. The final modified query is as under:
create table D_ME1 as
select ST.SRC_SYS_REF_ID SRC_SYS_REF_ID,
       ST.XCD_SRC_ENTY_REF_ID SRC_ENTY_REF_ID,
       ST.XCD_SRC_ID SRC_SYS_ID,
       (select REF.REF_ID
          from test.reference_v REF
         where REF.REF_TYP_NAME='OGRDS ENTITY TYPE'
           and REF.REF_VAL = 'CROSSCODE') OGRDS_ENTY_REF_ID,
       'XCD' OGRDS_ENTY_NAME,
       to_char(sysdate,'DDMMYYYYHH24MISS')||rownum OGRDS_ID, /* this creates a Unique value forcing Offloading */
       'PNB_MDL_LOAD' CRTD_BY,
       sys_extract_utc(systimestamp) CRTD_DTTM,
       'PNB_MDL_LOAD' UPD_BY,
       sys_extract_utc(systimestamp) UPD_DTTM
  from P2_MDL_E2E.ST ST
 where (1=1)
   And (ST.SRC_SYS_REF_ID=1)
   AND ('Y')=(SELECT INTF_A.RUN_IND
                FROM p2_mdl_ctrl_e2e.CT_LST INTF_A
               WHERE INTF_A.INTF_NAME='PNB_MDL_INT_XREF_XCD_B'
                 AND INTF_A.SRC_SYS_REF_ID=1
                 AND INTF_A.SCEN_NAME='PNB_MDL_PKG_XREF_XCD_B'
                 AND INTF_A.FLOW_NAME='EIMDB_INITIAL_LOAD');
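Given the update at the top of this post, a comparison like this is worth backing with numbers from the same session for both versions of the query. The following is a minimal sketch, assuming the standard 11.2 statistic names on Exadata; run it before and after each statement in the session that executes it and compare the deltas.

```sql
-- Sketch: session-level Smart Scan statistics for the current session.
-- Capture the values before and after each test and compare the deltas.
select sn.name,
       ms.value
from   v$mystat   ms
join   v$statname sn on sn.statistic# = ms.statistic#
where  sn.name in (
         'cell physical IO bytes eligible for predicate offload',
         'cell physical IO interconnect bytes returned by smart scan',
         'cell physical IO interconnect bytes'
       );

-- Optional control test: switch offloading off for the session and
-- rerun the same statement to measure the difference it makes.
-- alter session set cell_offload_processing = false;
```

If the "eligible for predicate offload" figure grows for the original query as well, then the sequence was not what prevented the Smart Scan, and the explanation has to be looked for elsewhere.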
In a nutshell, while Exadata will give you tremendous performance, especially in the case of a Full Table Scan, there are cases that require human intelligence. In this case, while both queries perform a Full Table Scan, the original did not benefit from the optimization of offloading (Smart Scan), while the modified code was offloaded to the storage, bringing down the execution time drastically.