CPU Cycles for Column Skipping…

I wanted to write this long back, however, got busy with some critical Performance POC on an Oracle Engineered System and this took much of my time.

Database Design requires careful planning. This blog of mine is on Table structure and the strategy for placement of columns. Whether to place NULL columns at the end to save on Storage space or to place frequently accessed columns at the top, are the two dominant discussions. This blog will focus on the motivational reasons for placing frequently queried columns at the top of the table design.

For the demonstration purpose, I will create two tables T1 & T2. Both tables are exactly the same except for one column N2, which is a 3rd column of T1 and 13th Column of table T2. We will run a query against both the tables and the resulting output will be exactly same as well. However, the amount of work required for a query on T2 will more than T1.

create table T1 as
select 	level ID, mod(level,2) N1, mod(level,10) N2,  mod(level,100) N3, mod(level,1000) N4, 
mod(level,1000) N5, mod(level,10000) N6, mod(level,5) N7, mod(level,50) N8, mod(level,500) N9, 
case when mod(level,10000)=0 then 'AIOUG' else dbms_random.string('A',10) end V6,
mod(level,5000) N10, mod(level,50000) N11
from dual connect by level <= 100000;

create table T2 as
select 	level ID, mod(level,2) N1, mod(level,100) N3, mod(level,1000) N4, mod(level,1000) N5,
mod(level,10000) N6, mod(level,5) N7, mod(level,50) N8, mod(level,500) N9, 
case when mod(level,10000)=0 then 'AIOUG' else dbms_random.string('A',10) end V6,
mod(level,5000) N10, mod(level,50000) N11,
mod(level,10) N2	
from dual connect by level <= 100000;

exec dbms_stats.gather_table_stats(user,'T1');
exec dbms_stats.gather_table_stats(user,'T2');

select owner, num_rows, blocks, last_analyzed from dba_tables where table_name ='T1';

OWNER		       NUM_ROWS     BLOCKS DEGREE     LAST_ANALYZED
-------------------- ---------- ---------- ---------- --------------------
SCOTT			 100000        846	    1 30-APR-2015 13:42:24

select owner, num_rows, blocks, last_analyzed from dba_tables where table_name ='T2';

OWNER		       NUM_ROWS     BLOCKS DEGREE     LAST_ANALYZED
-------------------- ---------- ---------- ---------- --------------------
SCOTT			 100000        846	    1 30-APR-2015 13:42:25

The two tables are identical in terms of number of rows and blocks. The only difference is the column ordering.

select column_id, column_name, num_distinct, num_nulls from user_tab_columns where table_name='T1' order by 1;

 COLUMN_ID COLUMN_NAME			  NUM_DISTINCT	NUM_NULLS
---------- ------------------------------ ------------ ----------
	 1 ID					100000		0
	 2 N1					     2		0
	 3 N2					    10		0 <-- Column # 3
	 4 N3					   100		0
	 5 N4					  1000		0
	 6 N5					  1000		0
	 7 N6					 10000		0
	 8 N7					     5		0
	 9 N8					    50		0
	10 N9					   500		0
	11 V6					100000		0
	12 N10					  5000		0
	13 N11					 50536		0

13 rows selected.

select column_id, column_name, num_distinct, num_nulls from user_tab_columns where table_name='T2' order by 1;

 COLUMN_ID COLUMN_NAME			  NUM_DISTINCT	NUM_NULLS
---------- ------------------------------ ------------ ----------
	 1 ID					100000		0
	 2 N1					     2		0
	 3 N3					   100		0
	 4 N4					  1000		0
	 5 N5					  1000		0
	 6 N6					 10000		0
	 7 N7					     5		0
	 8 N8					    50		0
	 9 N9					   500		0
	10 V6					 99864		0
	11 N10					  5000		0
	12 N11					 50536		0
	13 N2					    10		0 <-- Column # 13

13 rows selected.

To come out with the CPU Calculation for column skip, we will have to first get the CPU_Cost for a Full Table Scan of these tables.

delete from plan_table;
explain plan for
select id,n1 from t1;

/* Blocks = 846 as per dba_tables
select round(7121.44*&blocks+(150*100000)+(20*100000*(2-1))) from dual; 

ROUND(7121.44*846+(150*100000)+(20*100000*1))
---------------------------------------------
				     23024738


SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;

OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
------------------------------ -------------------- ---------- ---------- ---------- ----------
SELECT STATEMENT					   434	 23024738	 425	      1
TABLE ACCESS		       FULL			   434	 23024738	 425	      1

The CPU_Cost for a Full Table Scan of T1 is 23024738. I assume, the other numbers are Oracle calculations and are the defaults. 7121.44 Cpu Cycles per block multiplied by number of blocks. 150 CPU Cycles for each Row and 20 CPU Cycles for Column Skip. So, 20*100000*(2-1) is for 20 CPU Cycles multiplied by Highest ID – Lowest ID (2-1) of the columns referred in the query. Since the two tables are same in terms of number of rows and blocks, the calculation for a FTS should remain same. Lets see..

delete from plan_table;

explain plan for
select id,n1 from t2;

/* Blocks = 846 for this table as well */
select round(7121.44*&blocks+(150*100000)+(20*100000*(2-1))) from dual;

ROUND(7121.44*846+(150*100000)+(20*100000*1))
---------------------------------------------
				     23024738

SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;

OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
------------------------------ -------------------- ---------- ---------- ---------- ----------
SELECT STATEMENT					   434	 23024738	 425	      1
TABLE ACCESS		       FULL			   434	 23024738	 425	      1

CPU_Cost is same for both as the two have same number of blocks, rows and the table is similar in structure until column N1. The CPU Cost of 23024738 will be used in our calculations further and therefore you may want to note down this figure.

Next, we shall run the same query for the two tables, but with a minor change. We will add a predicate WHERE N2=:b2. Remember, this column differentiate the two tables else everything remains identical.


delete from plan_table;

explain plan for
select id, n1 from t1 where n2=:b1;

/* 7121.44 is now replaced with the CPU_Cost Value */
select 23024738+150*100000+(20*100000*(3-2)) from dual;

/* The calculation says */
23024738+150*100000+(20*100000*(3-2))
-------------------------------------
			     40024738

SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;

OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
------------------------------ -------------------- ---------- ---------- ---------- ----------
SELECT STATEMENT					   441	 40024738	 425	      1
TABLE ACCESS		       FULL			   441	 40024738	 425	      1

The CPU Cost has gone up to 40024738 and this is evident as predicate filtering requires CPU Cycles as well. Let us now check for T2.

delete from plan_table;

explain plan for
select id, n1 from t2 where n2=:b1;

/* in this case 13 is the ID for n2 */
select 23024738+150*100000+(20*100000*(13-2)) from dual;

23024738+150*100000+(20*100000*(13-2))
--------------------------------------
			      60024738

SQL> select OPERATION, OPTIONS, COST, CPU_COST, IO_COST, TIME from plan_table;

OPERATION		       OPTIONS			  COST	 CPU_COST    IO_COST	   TIME
------------------------------ -------------------- ---------- ---------- ---------- ----------
SELECT STATEMENT					   449	 60024738	 425	      1
TABLE ACCESS		       FULL			   449	 60024738	 425	      1

This is interesting. The two tables are exactly same with only a difference of column ordering. This difference is good enough for the increased CPU_Cost (60024738 against 40024738) and the Cost of the Query has also gone up from 441 to 449. Don’t get surprised to see the increased cost (and the change in the plan) post adding a new column to a table and including the newly added column in your frequently executed queries.

This holds true for queries with Index Scan as well. I have a demonstration with Index Scans as well and can be shared on request.

For my readers – What about a column, which is created as INVISIBLE and later made VISIBLE (post adding many more columns to the table) ? What will be the impact of querying this now VISIBLE column ? Interesting to test this out…

Advertisements

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on the issues that I feel is Interesting for my readers. Some of these are my real life examples, which I hope, you would find interesting. Comments are always a welcome. The Technical Observations & Views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understandings, learnings and resolutions of various customer issues.

2 Responses to CPU Cycles for Column Skipping…

  1. tiwariyogesh84 says:

    How do we know exact cpu cycles for each operation?

    -Yogi

    Like

    • Vivek Sharma says:

      For this, you will have to do a little experiment. This will include a Blank Table and then with 1 row, 10 rows and so on…I did the same to arrive at a conclusion. I demonstrated a similar calculation in one of my AIOUG session.

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s