Auto DOP ! 11g Feature…

Recently, during a Performance POC on an Oracle Engineered System, I got to implement the Auto DOP feature for one of the customer's critical batch jobs. This batch job is a set of multiple processes. Some of them run serially, some are single processes whose queries are hinted to use Oracle Parallelism, and some are parallel application threads.

For Auto DOP to work, there are two mandatory settings: IO calibration and parallel_degree_policy (which defaults to MANUAL, meaning no Auto DOP). I started with the AUTO setting, as LIMITED requires changing DEGREE for all (or the critical, huge) tables. With AUTO, the optimizer calculates and comes up with the degree of parallelism. This setting also enables two further features: Parallel Statement Queuing and In-Memory Parallel Execution. In-Memory Parallel Execution might disable Exadata Smart Scan, so I modified _parallel_cluster_cache_policy to ADAPTIVE, i.e. no In-Memory Parallel Execution. IO calibration was done so that the optimizer can take advantage of the calibrated values (as can be seen in DBA_RSRC_IO_CALIBRATE) to come up with the degree of parallelism.
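The two prerequisites can be sketched as below. This is only a sketch: the calibration inputs (disk count, acceptable latency) are illustrative values and should reflect your own storage.

```sql
-- Enable Auto DOP: the optimizer computes the degree of parallelism.
ALTER SYSTEM SET parallel_degree_policy = AUTO;

-- Run IO calibration (mandatory for Auto DOP). The inputs below are
-- assumptions for illustration; use your actual disk count and latency.
DECLARE
  l_max_iops  PLS_INTEGER;
  l_max_mbps  PLS_INTEGER;
  l_latency   PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 36,   -- assumed disk count
    max_latency        => 10,   -- acceptable latency in ms
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
END;
/

-- Review the calibration results the optimizer will use.
SELECT max_iops, max_mbps, max_pmbps, latency FROM dba_rsrc_io_calibrate;
```

The calibration run is IO-intensive, so it is best executed in a quiet window.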

My initial test run was not that successful. While investigating, I came across a very informative article from Gwen Shapira. As per this article, Auto DOP considers only the MAX_PMBPS value, and this can be set manually to 200. In my case, since the calibration process had been executed, the value of MAX_PMBPS was around 3000+. As I understand it, this value depicts per-process throughput, and therefore the larger the value, the lower the computed DOP. I followed the advice in the article and replaced the rows in resource_io_calibrate$.

-- Replace the calibration results with a manual MAX_PMBPS of 200.
-- Column order: start_time, end_time, max_iops, max_mbps, max_pmbps, latency, num_disks.
delete from resource_io_calibrate$;
insert into resource_io_calibrate$ values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0);
commit;

After setting this value manually, the database needs a restart. This change did the trick: the performance of many of the processes improved drastically. For some of the queries, we could also see statement queuing working effectively, i.e. waiting for resources to become available and then taking the required DOP, rather than downgrading the parallel degree. However, the customer was a bit apprehensive about the word AUTO and therefore decided to hard-code the optimizer-calculated DOP into each of these queries and disable the feature. This change worked as well.
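Statement queuing can be observed directly while a test is running. A minimal sketch (assuming 11.2 and the SQL monitoring views):

```sql
-- Statements currently held back by parallel statement queuing.
SELECT sql_id, status, username
FROM   v$sql_monitor
WHERE  status = 'QUEUED';

-- Sessions waiting on the queuing wait event.
SELECT sid, event, seconds_in_wait
FROM   v$session
WHERE  event = 'resmgr:pq queued';
```

A statement sitting in QUEUED status is waiting for parallel servers to free up, rather than starting with a downgraded degree.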

My observations on this nice feature (and I may be wrong, as I did not get much time to investigate it further), to name a few, are:

  • Any query whose optimizer-estimated serial runtime exceeds 10 seconds is subject to Auto DOP.
  • A query with a parallel hint also goes through the serial-execution runtime evaluation. This means that if the optimizer estimates the serial runtime to be less than 10 seconds, the parallel hint will be ignored.
  • In my opinion, Auto DOP is one of the best innovations in this area.
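The 10-second threshold above is exposed as the parameter parallel_min_time_threshold (it defaults to AUTO, which means 10 seconds), and the optimizer's decision shows up in the plan note. A sketch, with big_table as a placeholder name:

```sql
-- The serial-time threshold behind Auto DOP (default AUTO = 10 seconds).
ALTER SESSION SET parallel_min_time_threshold = 10;

-- The computed degree appears in the Note section of the plan.
EXPLAIN PLAN FOR SELECT /*+ parallel */ COUNT(*) FROM big_table;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- The Note section reads, for example:
--   automatic DOP: Computed Degree of Parallelism is 8
-- or, when the estimated serial time is under the threshold:
--   automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
```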

    Unacceptable Performance post Implementing Exadata….

    UPDATED ON 17th APRIL 2012.

    Tanel Poder, in one of his blog posts, has demonstrated that queries on sequences are offloadable. This clearly means my conclusion was wrong and needs investigation. I had a 10046 trace from the problem time, which shows waits on “cell multiblock physical read”, meaning the query was not offloaded. After commenting out the sequence, the query response time came down to 3-4 minutes, and I therefore concluded the sequence to be the issue. The customer also mentioned that this time he could see “cell smart table scan”, and this led to the wrong conclusion. The lesson from this is: until you physically see the impact of a change and are 100% assured, never conclude.

    One of the primary reasons I started blogging is to share my analysis. All of these are my own views and are not supported or assisted by anyone. Therefore, there are times when I am wrong. Blogging helps me connect with experts worldwide. Their comments on my writing (be they positive or negative) serve my purpose of self-learning. I appreciate the comments where I am wrong, as they help me rectify my mistakes. The latest post is one such example, where Tanel has helped me understand that my analysis was wrong. I appreciate his comments and the demonstration. Click the link below to access the post written by Tanel.

    http://blog.tanelpoder.com/2012/04/13/exadata-smart-scan-predicate-offloading-and-sequence-nextval/

    As mentioned in one of my previous blog posts, Exadata has changed the performance perception and has therefore generated widespread interest in the Oracle community. To be very honest, due to the cost involved in investing in Exadata, everyone expects it to deliver more than what is promised. Again, honestly, I have seen tremendous performance improvement AS-IS, meaning a significant improvement without actually making any changes or putting Exadata-specific features into action. During one POC for a banking customer, we just restored the customer database onto an Exadata machine and observed significant improvement without tweaking anything for Exadata. Back to my current topic: as mentioned, customers expect a significant improvement after implementing Exadata, and this post is about an issue that bothered one of my customers a lot.

    The background of the issue: a customer supporting a large application invested in an Exadata machine, with a business justification of providing an SLA for critical business processes. Some of these processes had been violating the SLAs, and with all the optimization options exhausted, the change was important. During the POC phase, all but one process showed significant improvement. One process took longer than expected and was a threat to the business justification provided before the investment.

    About the Process

    It’s a data-loading process that creates a new table from another table based on some conditions. This newly created table is then used by other processes. On a non-Exadata machine, the table creation takes around 3 hours. The customer expected this to come down to less than 10 minutes and was shocked when it took more than 2 hours. The completion time on Exadata is not known, as the customer (out of sheer frustration) killed the process. Fortunately (at least for me), when the process did not complete, the administrator enabled a 10046 trace for it, and in this post I will share a critical piece of information from the trace that helped optimize the process.

    Before I dive into the issue and the resolution, let us understand a critical aspect of Exadata. For a query doing a full table scan against an Exadata machine, the required blocks can be read from storage in two different ways: (1) read only the required columns (column projection) and/or the required rows (predicate filtering) from the storage cells over iDB, or (2) read the entire blocks into the SGA. The former approach is called Smart Scan (or query offloading), whereas the latter is similar to the strategy used on non-Exadata Oracle implementations.

    Back to the issue: involvement after a performance issue requires analyzing traces, and in our case it was a 10046 trace file. From the trace file, I could see that the query was doing a full table scan while loading the data into another table. Further, the prominent waits associated with this query were “cell multiblock physical read” and “cell single block physical read”. From these waits, it was evident that the query was not offloaded to the Exadata storage and therefore would not benefit from it. As mentioned in the previous paragraph, if the query is not offloaded, the behavior and performance are nearly the same as on non-Exadata storage. Please note, this is a full table scan, and therefore flash cache will not be of any use here unless the table in the query is specifically cached using the KEEP clause.
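Besides the wait events, offloading can be checked from V$SQL, which (from 11.2 onwards) carries two columns for this purpose. A sketch, with the SQL_ID as a placeholder:

```sql
-- Offload eligibility and actual interconnect traffic for a statement.
-- If io_cell_offload_eligible_bytes stays at 0, the query was not offloaded.
SELECT sql_id,
       io_cell_offload_eligible_bytes AS eligible_bytes,
       io_interconnect_bytes          AS interconnect_bytes
FROM   v$sql
WHERE  sql_id = '&sql_id';  -- substitute the SQL_ID under investigation
```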

    The query that creates the table is shown below, and you will have to take my word that it does a full table scan on the ST table.

    create table D_ME1 as  
    select 	ST.SRC_SYS_REF_ID SRC_SYS_REF_ID, ST.XCD_SRC_ENTY_REF_ID SRC_ENTY_REF_ID,  
    	ST.XCD_SRC_ID SRC_SYS_ID,  
    	(select REF.REF_ID from test.reference_v REF where REF.REF_TYP_NAME='OGRDS ENTITY TYPE' and REF.REF_VAL = 'CROSSCODE') OGRDS_ENTY_REF_ID,
    	'XCD' OGRDS_ENTY_NAME,  
    	XCD_ID_SEQ.NEXTVAL OGRDS_ID,  
    	'PNB_MDL_LOAD' CRTD_BY,  
    	sys_extract_utc(systimestamp) CRTD_DTTM,  
    	'PNB_MDL_LOAD' UPD_BY,  
    	sys_extract_utc(systimestamp) UPD_DTTM  
    from 	P2_MDL_E2E.ST ST  
    where  	(1=1)    
    And 	(ST.SRC_SYS_REF_ID=1)    
    AND 	('Y')=(SELECT INTF_A.RUN_IND FROM p2_mdl_ctrl_e2e.CT_LST INTF_A
    		WHERE 	INTF_A.INTF_NAME='PNB_MDL_INT_XREF_XCD_B' 
    		AND 	INTF_A.SRC_SYS_REF_ID=1
                  	AND 	INTF_A.SCEN_NAME='PNB_MDL_PKG_XREF_XCD_B' 
                  	AND	INTF_A.FLOW_NAME='EIMDB_INITIAL_LOAD');              
    

    A simple change in the query improved the performance, and the execution time came down to 3-4 minutes. This was primarily due to query offloading, which pushes the processing of the query down to the storage level so that only the relevant data is sent to the database. This is the first approach mentioned above. What was the change that triggered the important optimization of offloading? In other words, what prevented the original query from being offloaded to the Exadata storage? The answer is XCD_ID_SEQ.NEXTVAL. With Oracle sequences, the database engine has to generate and maintain the values in the SGA. Sequence generation cannot be offloaded, and therefore, when a sequence is specified in the query, the execution falls back to the traditional method of sending each block to the SGA. For testing purposes, when the sequence was replaced by rownum (and, as a final workaround, by to_char(sysdate,'DDMMYYYYHH24MISS')||rownum), offloading came into effect and the execution time dropped to 3-4 minutes. The fresh 10046 traces also revealed that the wait event for the modified query changed to “cell smart table scan”, which means a Smart Scan. The final modified query is as under:

    create table D_ME1 as  
    select 	ST.SRC_SYS_REF_ID SRC_SYS_REF_ID, ST.XCD_SRC_ENTY_REF_ID SRC_ENTY_REF_ID,  
    	ST.XCD_SRC_ID SRC_SYS_ID,  
    	(select REF.REF_ID from test.reference_v REF where REF.REF_TYP_NAME='OGRDS ENTITY TYPE' and REF.REF_VAL = 'CROSSCODE') OGRDS_ENTY_REF_ID,
    	'XCD' OGRDS_ENTY_NAME,  
    	to_char(sysdate,'DDMMYYYYHH24MISS')||rownum OGRDS_ID,  /* this creates a Unique value forcing Offloading */
    	'PNB_MDL_LOAD' CRTD_BY,  
    	sys_extract_utc(systimestamp) CRTD_DTTM,  
    	'PNB_MDL_LOAD' UPD_BY,  
    	sys_extract_utc(systimestamp) UPD_DTTM  
    from 	P2_MDL_E2E.ST ST  
    where  	(1=1)    
    And 	(ST.SRC_SYS_REF_ID=1)    
    AND 	('Y')=(SELECT INTF_A.RUN_IND FROM p2_mdl_ctrl_e2e.CT_LST INTF_A
    		WHERE 	INTF_A.INTF_NAME='PNB_MDL_INT_XREF_XCD_B' 
    		AND 	INTF_A.SRC_SYS_REF_ID=1
                  	AND 	INTF_A.SCEN_NAME='PNB_MDL_PKG_XREF_XCD_B' 
                  	AND	INTF_A.FLOW_NAME='EIMDB_INITIAL_LOAD');              
    
    

    In a nutshell, while Exadata will give you tremendous performance, especially in the case of a full table scan, there are cases that require human intelligence :). Here, while both queries perform a full table scan, the original prevented the offloading (Smart Scan) optimization, while the modified code smartly offloaded the query to the storage, bringing down the execution time drastically.
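To confirm a Smart Scan at the session level during such tests, a couple of cell statistics (names as of 11.2) can be sampled before and after the run:

```sql
-- Session-level cell statistics: a non-zero "cell scans" count and bytes
-- returned by smart scan over the interconnect indicate offloading.
SELECT n.name, m.value
FROM   v$mystat   m
JOIN   v$statname n ON n.statistic# = m.statistic#
WHERE  n.name IN ('cell scans',
                  'cell physical IO interconnect bytes returned by smart scan');
```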

    Exadata v/s Solid State Disk ! Performance & Scalability Comparison.

    During one of my Exadata presentations, one of the attendees asked about the performance and scalability difference between Exadata and SSD. He mentioned that a performance POC for one of their critical applications on SSD and on Exadata gave similar results. Before replying to his query, I asked him, “What was the average CPU utilization during both the POCs?”. Since the answer to my question was interesting, I thought of writing a short post on this debate; at the end is the attendee’s response to my question.

    A lot has been written on the comparison between the two; for example, visit asktom.oracle.com and Kevin Closson’s blog. I will just write up my reply with a brief explanation.

    Exadata is one of the biggest breakthroughs in achieving fast computing and therefore clearly overpowers other technologies around it. While other technologies might give you similar performance, investing only for performance is not the only organizational goal. The primary reason, I feel, that Exadata scores over the others is the scalability factor. That said, while Exadata gives extreme performance, it improves scalability as well. SSD and other such technologies are hardware components, whereas Exadata is a combination of hardware and intelligent software, and this combination makes it both faster and more scalable.

    In an Oracle database, or an application deployed on top of Oracle, latches are considered a performance and scalability inhibitor. Every block access (into the cache or from the cache) requires a latch, and if these latch gets are excessive, the repeated visits to the cache burn CPU power, impacting both performance and scalability. Exadata has changed this approach by introducing faster hardware that scans the required blocks, with intelligent software on top of it that filters only the required column data directly into the private memory area (PGA) of the user process. This intelligence bypasses the buffer cache, avoiding the costly step of latching. This has a huge impact on CPU cycles, and therefore, if the database server has enough CPU power available, this spare CPU power can be translated into scalability.
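The latching cost described above is easy to quantify on any Oracle database. As an illustration, the activity on the latch that protects the buffer cache chains can be sampled from V$LATCH:

```sql
-- Gets, misses and sleeps on the latch protecting buffer cache chains;
-- high miss/sleep counts translate directly into burned CPU cycles.
SELECT name, gets, misses, sleeps
FROM   v$latch
WHERE  name = 'cache buffers chains';
```

Sampling these counters before and after a workload run shows how much of the CPU is being spent simply serializing access to the cache.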

    Usually, in an Oracle database on traditional (non-Exadata) storage, the blocks are read from disk into the buffer cache, and then the required column data is filtered and fetched into the private memory of the user process. Reading a block from disk into the cache and filtering the required data out of the cache both consume CPU cycles on the database server. As mentioned, on Exadata storage, the combination of hardware and software works together to eliminate these costlier steps from the database server. From this explanation, it is quite evident that Exadata not only improves performance but also provides scalability. All of this is achieved without making any changes to the application; however, further optimization on Exadata is also possible.

    Now to the reply to my question, “What was the average CPU utilization during both the POCs?”. The reply was that, throughout the POC, the CPU utilization on the Exadata machine was always far below that of the POC on SSD. Solid state disks provide faster access to the blocks read from disk, but they do not eliminate the costlier latching steps on the database server. Therefore, savings in CPU cycles are not expected; in fact, performance may even worsen.

    In one of the paragraphs above, I mentioned the steps that an Oracle database on non-Exadata storage performs on behalf of a user as part of row fetching: a block is read from disk into the cache, and then the required column and row data is fetched into the private memory area, i.e. the PGA, of the user process. This contradicts the steps I described in one of my previous posts on consistent gets. This post rectifies the error in my assumption about the steps performed, as stated in that previous post.
