OTNYathra 2016

Presented at Chennai after a gap of almost 5 years. It was a good crowd. Now it is the turn of the Mumbai event on 30th April 2016. I plan to blog a series on Optimizer fundamentals, from the basics through to Histograms, for my readers. I will start this series before my Mumbai event, which should help the attendees understand the concepts well.

In-Memory Store – Push Down Optimization

Last week, I concluded my first ever event in Gurgaon. This was a full day event on Performance Tuning. During the session on Query Transformation, I mentioned “Testing rather than Believing”. The rationale behind this is that a test case or transformation discussed elsewhere is not always guaranteed to reproduce in your environment. It is not that the transformation or the test cases are wrong; it is just that environmental differences can change the behaviour. In that case, you can work on a solution to make the transformation happen, and testing will give you good knowledge of the various transformations and database features/options. My current post is on one such database feature: In-Memory Store – Push Down Optimization.

In-Memory Store was introduced in 12.1.0.2 and is intended to read the required data from a new memory area that is also part of the SGA. This memory is sized via the INMEMORY_SIZE database parameter. The In-Memory Store caches data in a columnar format, as against the row format used by the traditional Buffer Cache. There are many benefits to this. The Buffer Cache is still used for OLTP workloads, whereas queries processing large volumes of data across few columns can benefit from the In-Memory Store. The biggest advantage is that both can be implemented on a single database, requiring no complex data transformation.

While working on some of the In-Memory features, I came across Push-Down Optimization. This optimization pushes predicates, aggregations and group-bys down to the access layer, i.e. they are applied while scanning the column or group of columns, returning just a small subset of data to the query layer. The number of rows returned to the query layer depends on the number of In-Memory Compression Units (IMCUs). Thus, the amount of data to be processed by the query layer is reduced drastically, making the queries much more efficient and faster. At this point, I will direct you to the In-Memory blog maintained by the Oracle development folks. The explanation there stands on its own, so I will not duplicate a well-written post here :). The idea behind this post is to let readers know of the issue that I faced while testing this.

As mentioned earlier, I test whatever is published, and this gives me a better understanding of the feature. While testing this, however, I could not reproduce the results mentioned in the blog post.

For this, I created my own LINEORDER table and executed the queries mentioned in the blog. The results are as follows:

## STATS table to hold the Statistics from V$MYSTAT

create global temporary table stats on commit delete rows as
select s.sid, n.name, s.value
from   v$mystat s, v$statname n
where  s.statistic#=n.statistic#
and    1=2;

  CREATE TABLE LINEORDER
   (    LO_ORDERID NUMBER,
        LO_PRODUCTNAME VARCHAR2(128) NOT NULL ENABLE,
        LO_SHIPQTY NUMBER,
        LO_ORDERVALUE NUMBER,
        LO_SHIPMODE VARCHAR2(4)
   ) 
  TABLESPACE USERS
  INMEMORY PRIORITY CRITICAL MEMCOMPRESS FOR QUERY LOW
  DISTRIBUTE AUTO NO DUPLICATE
  NO INMEMORY (LO_ORDERID)
  NO INMEMORY (LO_PRODUCTNAME);

insert into lineorder
select rownum lo_orderid, a.object_name lo_productname,
       round(dbms_random.value(1,100),0) lo_shipqty,
       round(dbms_random.value(1000,10000),2) lo_ordervalue,
        case when mod(rownum,3)=0 then 'AIR'
            when mod(rownum,7)=0 then 'SHIP'
            when mod(rownum,10)=0 then 'RAIL'
        else 'ROAD' end lo_shipmode
from    all_objects a, all_objects b
where rownum<=1e+7;

commit;
exec dbms_stats.gather_table_stats(user,'LINEORDER');
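As a sanity check on the data, the LO_SHIPMODE distribution the CASE expression produces can be predicted with inclusion-exclusion arithmetic (a sketch in Python, assuming ROWNUM runs 1 to 10 million and the CASE branches are evaluated in order):

```python
# Predict the LO_SHIPMODE distribution produced by the CASE expression.
# The CASE is evaluated in order: mod 3 first, then mod 7, then mod 10.
N = 10_000_000

def multiples(n, d):
    """Count of 1..n divisible by d."""
    return n // d

air = multiples(N, 3)
# SHIP: multiples of 7 that are not multiples of 3
ship = multiples(N, 7) - multiples(N, 21)
# RAIL: multiples of 10 divisible by neither 3 nor 7 (inclusion-exclusion)
rail = multiples(N, 10) - multiples(N, 30) - multiples(N, 70) + multiples(N, 210)
road = N - air - ship - rail

print(air, ship, rail, road)  # 3333333 952381 571429 5142857
```

These figures match the GROUP BY output shown later in the post exactly.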

SQL> select num_rows, blocks, inmemory from dba_tables where table_name='LINEORDER';

  NUM_ROWS     BLOCKS INMEMORY
---------- ---------- --------
  10000000      41717 ENABLED

SQL> select segment_name, bytes, inmemory_size, bytes_not_populated, populate_status
from v$im_segments;
  2  
SEGMENT_NAME                        BYTES INMEMORY_SIZE BYTES_NOT_POPULATED POPULATE_
------------------------------ ---------- ------------- ------------------- ---------
LINEORDER                       343932928     133562368                   0 COMPLETED

On my laptop, I have 4 GB of memory and therefore had to restrict my SGA size. The table is around 343 MB. I sized my In-Memory Store at 200 MB, and therefore created the table with PCTFREE 1, so as to keep the table as small as possible. With the default PCTFREE, the entire table would not be populated into the store.

Now, let's start our test: execute the queries as per the blog and validate the results.

SQL> insert into stats
select s.sid, n.name, s.value
from   v$mystat s, v$statname n
where  s.statistic#=n.statistic#
and    n.name like 'IM%';
211 rows created.

SQL> select /*+ VIVEK_IMCU */ lo_shipmode, count(*) from lineorder
group by lo_shipmode;

LO_S   COUNT(*)
---- ----------
RAIL     571429
SHIP     952381
ROAD    5142857
AIR     3333333

SQL> select a.name, m.value - a.value value
from    v$mystat m, stats a, v$statname b
where a.name = b.name
and   m.statistic#=b.statistic#
and   m.value - a.value >0
order by 1;
  
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
IM scan CUs columns accessed                                             21
IM scan CUs columns theoretical max                                     105
IM scan CUs memcompress for query low                                    21
IM scan CUs no cleanout                                                  21
IM scan CUs split pieces                                                 23
IM scan bytes in-memory                                           116692649
IM scan bytes uncompressed                                        126565381
IM scan rows                                                       10000000
IM scan rows projected                                             10000000
IM scan rows valid                                                 10000000

As per the blog, the value of the “IM scan rows projected” statistic should have been 84 in my case. The statistics show that the query accessed 21 Compression Units (IM scan CUs memcompress for query low), and I have 4 values for lo_shipmode, i.e. AIR, SHIP, ROAD & RAIL. So, this makes 21 x 4 = 84. However, from my testing, the “IM scan rows projected” statistic equals the number of rows in the table (10 million). Why this discrepancy? Is my test case wrong?
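The expected value of 84 is simple arithmetic (a sketch; the CU count is taken from the statistics above):

```python
# With push-down working, the access layer returns one aggregated row
# per (Compression Unit, group-by value) combination.
cus_scanned = 21          # IM scan CUs memcompress for query low
distinct_shipmodes = 4    # AIR, SHIP, ROAD, RAIL

rows_projected = cus_scanned * distinct_shipmodes
print(rows_projected)  # 84 -- the value the blog predicts
```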

I executed other queries as well and the results were the same, i.e. I could not reproduce the statistics mentioned in the blog. I was sure that there was some mismatch in the configuration. Usually, for any such issue, customers are advised to be on the latest Bundle Patch. Therefore, I applied Bundle Patch 10 as well, but the results were the same. It took a while to investigate the cause.

I reported this to the author of the blog, who immediately started investigating. For the analysis, I sent the table creation script along with the run-time execution plan. Interestingly, the issue was traced to the STATISTICS_LEVEL database parameter, which was set to ALL on my database. With the default (TYPICAL), I was able to reproduce the results as per the blog.

SQL> show parameter statistics_level

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
statistics_level                     string      ALL

SQL> alter session set statistics_level=typical;

Session altered.

SQL> insert into stats
select s.sid, n.name, s.value
from   v$mystat s, v$statname n
where  s.statistic#=n.statistic#
and    n.name like 'IM%';

211 rows created.

SQL> select /*+ VIVEK_IMCU */ lo_shipmode, count(*) from lineorder
group by lo_shipmode;

LO_S   COUNT(*)
---- ----------
RAIL     571429
SHIP     952381
ROAD    5142857
AIR     3333333

SQL> select a.name, m.value - a.value value
from    v$mystat m, stats a, v$statname b
where a.name = b.name
and   m.statistic#=b.statistic#
and   m.value - a.value >0
order by 1;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
IM scan CUs columns accessed                                             21
IM scan CUs columns theoretical max                                     105
IM scan CUs memcompress for query low                                    21
IM scan CUs no cleanout                                                  21
IM scan CUs split pieces                                                 23
IM scan bytes in-memory                                           116692649
IM scan bytes uncompressed                                        126565381
IM scan rows                                                       10000000
IM scan rows projected                                                   84
IM scan rows valid                                                 10000000

10 rows selected.

To summarize, STATISTICS_LEVEL=ALL interferes with Push Down Optimization. A bug has been filed for this issue and a fix is expected soon. I will keep you posted.

The idea behind this post was simply to show readers the importance of testing before concluding. Each database setup is different; a minor change can alter the behaviour, and it is in readers' interest to investigate such changes.

Performance Tuning Day – Gurgaon ! My First ever event in Gurgaon

I have presented 2 full day events in Pune, both on Performance Optimization. AIOUG is now replicating this event in Gurgaon: I will be presenting a full day “Performance Tuning Day” on 17th October 2015. I have been speaking for the User Group for almost 8 years now; however, this will be my first ever presentation in the Delhi/Gurgaon region. Looking forward to a great crowd.

Registration Link

Performance Tuning Day Part II – Pune Chapter of All India Oracle User Group

Mark your calendar for Part II of my Performance Tuning Day event at Pune, scheduled for 12th September 2015. The registration link is open. No worries for those who missed my previous session (Part I), as I will start with a recap of it. See you all on 12th September.

Click for Registration

Performance Tuning Day…All India Oracle User Group

Concluded a 5 hour session on Performance Optimization for the Pune Chapter of the All India Oracle User Group. Due to an unexpected massive traffic jam, I reached the venue late and the sessions ran behind schedule, so I had to cancel a session on the Query Optimizer. A big disappointment. As a speaker, it really hurts when you have to rush through slides and/or cancel a session. Sorry, folks. I will check with the organizers about a Part II of this event, where we can cover this interesting topic.

The crowd, as expected, was engaged. They made the whole event interactive with interesting questions. As a speaker, you enjoy it when your participants are deeply involved and raise questions to clarify their doubts; it gives you a sense that the crowd is listening to what you are saying. Thanks to all for lighting up the event.

Last but not least, hats off to the organizers. They worked hard to make this event a grand success; their meticulous planning is appreciated.

AIOUG Performance Tuning Day – Pune

I am presenting a full day event on 8th August 2015 at Pune. This Performance Tuning Day will cover some interesting performance issues and solutions. Looking forward to seeing you all in Pune. Click on the following link to view the schedule and register.

Performance Tuning Day Link

AIOUG Performance Tuning Day ! Hyderabad…

AIOUG is hosting a full day Performance Tuning day on 30th May 2015 in Hyderabad. I will be presenting on Oracle Query Optimizer and Performance. These sessions will be a mix of technical demos and real life examples. Hope to see a large gathering. Registrations are open at Performance Tuning Day.

Auto DOP ! 11g Feature…

Recently, during a performance POC on an Oracle Engineered System, I got to implement the Auto DOP feature for one of the customer's critical batch jobs. This batch job is a set of multiple processes: some run serially, some are single processes whose queries are hinted to use Oracle parallelism, and some are parallel application threads.

For Auto DOP to work, there are two mandatory settings: I/O calibration and PARALLEL_DEGREE_POLICY (which defaults to MANUAL, meaning no Auto DOP). I started with the AUTO setting, as LIMITED requires changing DEGREE for all (or at least the critical huge) tables. With AUTO, the optimizer calculates and comes up with the degree of parallelism. Further, this setting also enables two features: Parallel Statement Queuing and In-Memory Parallel Execution. In-Memory Parallel Execution might disable Exadata Smart Scan, so I set _parallel_cluster_cache_policy to ADAPTIVE, i.e. no In-Memory Parallel Execution. I/O calibration was done so that the optimizer could use the resulting values (as can be seen in DBA_RSRC_IO_CALIBRATE) to come up with the degree of parallelism.

My initial test run was not that successful. While investigating, I came across a very informative article by Gwen Shapira. As per that article, Auto DOP considers only the MAX_PMBPS value, and this can be set manually to 200. In my case, since the calibration process had been executed, the value of MAX_PMBPS was around 3000+. As I understand it, this value depicts throughput, so the larger the value, the lower the computed DOP. I followed the advice in the article and deleted the rows from resource_io_calibrate$.

delete from resource_io_calibrate$;
insert into resource_io_calibrate$ values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0);
commit;

After setting this value manually, the database needs a restart. This change did the trick: the performance of many of the processes improved drastically. For some of the queries, we could also see Statement Queuing working effectively, i.e. waiting for resources to become available and then taking the required DOP, rather than downgrading the parallel degree. However, the customer was a bit apprehensive about the AUTO setting and therefore decided to hard-code the optimizer-calculated DOP into each of these queries and disable the feature. This change worked as well.

My observations on this nice feature (and I may be wrong, as I did not get much time to investigate further), to name a few:

  • Any query that the optimizer estimates will take more than 10 seconds to run serially is subject to Auto DOP.
  • A query with a parallel hint also goes through the serial run-time evaluation. This means that if the optimizer estimates the serial run time to be less than 10 seconds, the parallel hint is ignored.
  • In my opinion, Auto DOP is one of the best innovations.
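The first two observations can be sketched as a simple decision rule. This is a hypothetical illustration, not Oracle's actual algorithm; the 10-second threshold loosely mirrors the default behaviour of PARALLEL_MIN_TIME_THRESHOLD:

```python
# Hypothetical sketch of the Auto DOP decision described above.
AUTO_TIME_THRESHOLD = 10  # seconds of estimated serial run time

def choose_dop(estimated_serial_seconds, computed_dop, parallel_hint=False):
    """Return the degree of parallelism the optimizer might settle on."""
    # Even a hinted query is evaluated against the serial-time threshold.
    if estimated_serial_seconds < AUTO_TIME_THRESHOLD:
        return 1  # runs serially; a parallel hint is effectively ignored
    return max(computed_dop, 1)

# A hinted query estimated at 4 seconds still runs serially:
print(choose_dop(4, computed_dop=8, parallel_hint=True))   # 1
# A 30-second query picks up the computed DOP:
print(choose_dop(30, computed_dop=8))                      # 8
```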

CPU Cycles for Column Skipping…

I wanted to write this long back; however, I got busy with a critical performance POC on an Oracle Engineered System, which took much of my time.

Database design requires careful planning. This post is about table structure and the strategy for the placement of columns. Whether to place NULL columns at the end to save storage space, or to place frequently accessed columns at the top, are the two dominant discussions. This post focuses on the motivation for placing frequently queried columns at the top of the table design.

For demonstration purposes, I will create two tables, T1 & T2. Both tables are exactly the same except for one column, N2, which is the 3rd column of T1 and the 13th column of T2. We will run the same query against both tables, and the resulting output will be exactly the same as well. However, the amount of work required for the query on T2 will be more than for T1.

    create table T1 as
    select 	level ID, mod(level,2) N1, mod(level,10) N2,  mod(level,100) N3, mod(level,1000) N4, 
    mod(level,1000) N5, mod(level,10000) N6, mod(level,5) N7, mod(level,50) N8, mod(level,500) N9, 
    case when mod(level,10000)=0 then 'AIOUG' else dbms_random.string('A',10) end V6,
    mod(level,5000) N10, mod(level,50000) N11
    from dual connect by level <= 100000;
    
    create table T2 as
    select 	level ID, mod(level,2) N1, mod(level,100) N3, mod(level,1000) N4, mod(level,1000) N5,
    mod(level,10000) N6, mod(level,5) N7, mod(level,50) N8, mod(level,500) N9, 
    case when mod(level,10000)=0 then 'AIOUG' else dbms_random.string('A',10) end V6,
    mod(level,5000) N10, mod(level,50000) N11,
    mod(level,10) N2	
    from dual connect by level <= 100000;
    
    exec dbms_stats.gather_table_stats(user,'T1');
    exec dbms_stats.gather_table_stats(user,'T2');
    
    select owner, num_rows, blocks, degree, last_analyzed from dba_tables where table_name ='T1';
    
    OWNER		       NUM_ROWS     BLOCKS DEGREE     LAST_ANALYZED
    -------------------- ---------- ---------- ---------- --------------------
    SCOTT			 100000        846	    1 30-APR-2015 13:42:24
    
    select owner, num_rows, blocks, degree, last_analyzed from dba_tables where table_name ='T2';
    
    OWNER		       NUM_ROWS     BLOCKS DEGREE     LAST_ANALYZED
    -------------------- ---------- ---------- ---------- --------------------
    SCOTT			 100000        846	    1 30-APR-2015 13:42:25
    

The two tables are identical in terms of number of rows and blocks. The only difference is the column ordering.

    select column_id, column_name, num_distinct, num_nulls from user_tab_columns where table_name='T1' order by 1;
    
     COLUMN_ID COLUMN_NAME			  NUM_DISTINCT	NUM_NULLS
    ---------- ------------------------------ ------------ ----------
    	 1 ID					100000		0
    	 2 N1					     2		0
    	 3 N2					    10		0 <-- Column # 3
    	 4 N3					   100		0
    	 5 N4					  1000		0
    	 6 N5					  1000		0
    	 7 N6					 10000		0
    	 8 N7					     5		0
    	 9 N8					    50		0
    	10 N9					   500		0
    	11 V6					100000		0
    	12 N10					  5000		0
    	13 N11					 50536		0
    
    13 rows selected.
    
    select column_id, column_name, num_distinct, num_nulls from user_tab_columns where table_name='T2' order by 1;
    
     COLUMN_ID COLUMN_NAME			  NUM_DISTINCT	NUM_NULLS
    ---------- ------------------------------ ------------ ----------
    	 1 ID					100000		0
    	 2 N1					     2		0
    	 3 N3					   100		0
    	 4 N4					  1000		0
    	 5 N5					  1000		0
    	 6 N6					 10000		0
    	 7 N7					     5		0
    	 8 N8					    50		0
    	 9 N9					   500		0
    	10 V6					 99864		0
    	11 N10					  5000		0
    	12 N11					 50536		0
    	13 N2					    10		0 <-- Column # 13
    
    13 rows selected.
    
    

To work out the CPU cost of a column skip, we first need the CPU_COST for a full table scan of these tables.

    delete from plan_table;
    explain plan for
    select id,n1 from t1;
    
    /* Blocks = 846 as per dba_tables */
    select round(7121.44*&blocks+(150*100000)+(20*100000*(2-1))) from dual; 
    
    ROUND(7121.44*846+(150*100000)+(20*100000*1))
    ---------------------------------------------
    				     23024738
    
    
    SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;
    
    OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
    ------------------------------ -------------------- ---------- ---------- ---------- ----------
    SELECT STATEMENT					   434	 23024738	 425	      1
    TABLE ACCESS		       FULL			   434	 23024738	 425	      1
    
    

The CPU_COST for a full table scan of T1 is 23024738. I assume the other numbers are Oracle's default calculations: 7121.44 CPU cycles per block multiplied by the number of blocks, 150 CPU cycles for each row, and 20 CPU cycles for each column skip. So, 20*100000*(2-1) is 20 CPU cycles multiplied by the number of rows and by the difference between the highest and lowest COLUMN_ID (2-1) of the columns referenced in the query. Since the two tables are the same in terms of number of rows and blocks, the calculation for a full table scan should remain the same. Let's see.

    delete from plan_table;
    
    explain plan for
    select id,n1 from t2;
    
    /* Blocks = 846 for this table as well */
    select round(7121.44*&blocks+(150*100000)+(20*100000*(2-1))) from dual;
    
    ROUND(7121.44*846+(150*100000)+(20*100000*1))
    ---------------------------------------------
    				     23024738
    
    SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;
    
    OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
    ------------------------------ -------------------- ---------- ---------- ---------- ----------
    SELECT STATEMENT					   434	 23024738	 425	      1
    TABLE ACCESS		       FULL			   434	 23024738	 425	      1
    
    

The CPU_COST is the same for both, as the two tables have the same number of blocks and rows, and their structure is identical up to column N1. This CPU cost of 23024738 will be used in the calculations that follow, so you may want to note down this figure.
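The full-scan figure can be reproduced with a small helper (a sketch; the per-unit constants are empirical observations inferred from the plans above, not documented Oracle internals):

```python
# Reproduce the full table scan CPU cost using the constants inferred above.
# These per-unit figures (7121.44 per block, 150 per row, 20 per column skip)
# are empirical observations, not documented Oracle internals.
CPU_PER_BLOCK = 7121.44
CPU_PER_ROW = 150
CPU_PER_COLUMN_SKIP = 20

def fts_cpu_cost(blocks, rows, highest_col_id, lowest_col_id=1):
    return round(CPU_PER_BLOCK * blocks
                 + CPU_PER_ROW * rows
                 + CPU_PER_COLUMN_SKIP * rows * (highest_col_id - lowest_col_id))

# SELECT id, n1 ... : highest referenced COLUMN_ID is 2 for both tables.
print(fts_cpu_cost(846, 100_000, 2))  # 23024738, matching CPU_COST in the plan
```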

Next, we shall run the same query against the two tables, but with a minor change: we will add the predicate WHERE N2=:b1. Remember, this column differentiates the two tables; everything else remains identical.

    
    delete from plan_table;
    
    explain plan for
    select id, n1 from t1 where n2=:b1;
    
    /* The full table scan CPU_COST (23024738) now stands in as the base value */
    select 23024738+150*100000+(20*100000*(3-2)) from dual;
    
    /* The calculation says */
    23024738+150*100000+(20*100000*(3-2))
    -------------------------------------
    			     40024738
    
    SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;
    
    OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
    ------------------------------ -------------------- ---------- ---------- ---------- ----------
    SELECT STATEMENT					   441	 40024738	 425	      1
    TABLE ACCESS		       FULL			   441	 40024738	 425	      1
    
    

The CPU cost has gone up to 40024738, which is expected, as predicate filtering requires CPU cycles as well. Let us now check T2.

    delete from plan_table;
    
    explain plan for
    select id, n1 from t2 where n2=:b1;
    
    /* in this case 13 is the ID for n2 */
    select 23024738+150*100000+(20*100000*(13-2)) from dual;
    
    23024738+150*100000+(20*100000*(13-2))
    --------------------------------------
    			      60024738
    
    SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;
    
    OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
    ------------------------------ -------------------- ---------- ---------- ---------- ----------
    SELECT STATEMENT					   449	 60024738	 425	      1
    TABLE ACCESS		       FULL			   449	 60024738	 425	      1
    
    

This is interesting. The two tables are exactly the same, with only a difference in column ordering. That difference alone accounts for the increased CPU_COST (60024738 against 40024738), and the cost of the query has also gone up from 441 to 449. Don't be surprised to see an increased cost (and a change in plan) after adding a new column to a table and referencing that newly added column in your frequently executed queries.
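The two predicate costs, and the 20,000,000-cycle gap between them, fall straight out of the same arithmetic (again a sketch built on the empirically observed constants):

```python
# Extend the base full-scan cost with the predicate's column-skip component,
# exactly as the worked figures above do (constants are empirical observations).
BASE_FTS_CPU = 23024738   # full table scan CPU_COST common to T1 and T2
ROWS = 100_000

def predicate_cpu_cost(pred_col_id, last_selected_col_id=2):
    """CPU cost once a predicate column beyond the select list is touched."""
    return BASE_FTS_CPU + 150 * ROWS + 20 * ROWS * (pred_col_id - last_selected_col_id)

t1_cost = predicate_cpu_cost(3)    # N2 is column 3 in T1
t2_cost = predicate_cpu_cost(13)   # N2 is column 13 in T2
print(t1_cost, t2_cost, t2_cost - t1_cost)  # 40024738 60024738 20000000
```

Every additional column the scan has to skip past adds 20 cycles per row, which for a 100,000-row table is 2,000,000 CPU cycles per position.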

This holds true for queries using index scans as well. I have a demonstration with index scans too, which can be shared on request.

For my readers: what about a column that is created as INVISIBLE and later made VISIBLE (after adding many more columns to the table)? What will be the impact of querying this now-visible column? Interesting to test this out…

Table with 255 columns or more…

I wrote a post on intra-row chaining and the issues around it for a table that has more than 255 columns. Jonathan Lewis has written a fantastic post on this subject with more explanation and a real life example. Worth reading.

    Jonathan’s Link

During my recent AIOUG OTN Yathra Tech Day in Chandigarh, I also demonstrated the CPU cycles required for a column skip. This means that, apart from the extra I/O required to read a column of a larger table, the CPU consumption will also be higher. Since this topic has come up, and Jonathan has touched on it nicely, I will post the CPU calculation blog in a day or so.

While the issue due to intra-block chaining is already published, the next post will focus on the CPU cycles for column skipping in a small table (say, with 14 columns), which can then be extrapolated to tables with more columns. This will help DBAs predict or evaluate the impact of a change (adding a new column) before it is implemented in production. It is also a reason to keep frequently queried columns at the top (storage space v/s CPU consumption).
