Database Vault ! Autonomous Database

Availability of Database Vault on Autonomous Database (Shared Infrastructure) was announced in February 2020. Database Vault does not require any introduction as this feature has been available since Oracle 10g. As a brief, it provides powerful security controls to help protect sensitive application data from Unauthorised access and comply with privacy and regulatory requirements. This means, a superuser (or a user with DBA privilege) can access the sensitive application data. Oracle Database Vault implements this security at the database layer.

Database Vault provides additional security to the Autonomous Database Family and is just few steps. This blog demonstrates the steps needed to implement Database Vault on Autonomous Data Warehouse. Before I go through the steps, it is important to know a concept of REALMS. Realm is a grouping of database schemas, database objects and roles that must be secured for a given application. It is a kind of a Zone of Protection for your database objects. So, one of the steps required will be to create a Realm.

Every Autonomous Database has an ADMIN user, which is a super user. ADMIN user can query any table within the Database (by default). All the steps required for implementing Database Vault is to be executed using ADMIN user. Once done, ADMIN user will continue to be a super user, but will not be able to query the application data. In my case, assume VIVEK is an application schema containing all the tables accessed by my application and one such table is CREDIT_CARD.


- Connect to ADMIN User and Query the CREDIT_CARD Table. ADMIN is able to query the data.

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Check whether Data Vault is Configured and Enabled. It is not configured.

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS FALSE
DV_ENABLE_STATUS    FALSE

- Create Database Vault Owner and Database Vault Account Manager User.

CREATE USER ADB_DBV_OWNER IDENTIFIED BY "xxxxxxxxxxxxx";
CREATE USER ADB_DBV_ACCTMGR IDENTIFIED BY "xxxxxxxxxxxxx";
GRANT CREATE SESSION, SELECT ANY TABLE TO ADB_DBV_OWNER;

- Check whether we can select the data from CREDIT_CARD table, before implementing Database Vault
- Database Audit Vault User has SELECT ANY TABLE privilege.

CONNECT ADB_DBV_OWNER/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00


- Configure and Enable Database Vault

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low


EXEC DBMS_CLOUD_MACADM.CONFIGURE_DATABASE_VAULT('ADB_DBV_OWNER', 'ADB_DBV_ACCTMGR');

- Database Vault Configured, but not Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    FALSE

- Enable Database Vault. For this to come into effect, we need to restart the database

EXEC DBMS_CLOUD_MACADM.ENABLE_DATABASE_VAULT;

- Restart the Database. Check after Database Restart
- Database Vault Configured and Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    TRUE

- Admin can still Query the Data

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Create REALM, Add Object to the Realm 

BEGIN
 DBMS_MACADM.CREATE_REALM(
  realm_name    => 'Vivek Application Schema', 
  description   => 'Realm created for Vivek Schema', 
  enabled       => DBMS_MACUTL.G_YES, 
  audit_options => DBMS_MACUTL.G_REALM_AUDIT_FAIL + DBMS_MACUTL.G_REALM_AUDIT_SUCCESS,
  realm_type    => 1,
  realm_scope   => DBMS_MACUTL.G_SCOPE_LOCAL);
END; 
/

BEGIN
 DBMS_MACADM.ADD_OBJECT_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  object_owner => 'VIVEK', 
  object_name  => '%', 
  object_type  => '%'); 
END;
/

- Add Authorised User, which is the User that can Query the Application Table
- In this case, VIVEK is the application owner

BEGIN
 DBMS_MACADM.ADD_AUTH_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  grantee      => 'VIVEK', 
  auth_options => DBMS_MACUTL.G_REALM_AUTH_OWNER);
END;
/

- Check whether ADMIN can Query the sensitive Data from Vivek Schema

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;
                    *
ERROR at line 1:
ORA-01031: insufficient privileges

- Vivek being an Application Owner, can Query the data. No other Users allowed to Query

CONNECT VIVEK/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

As can be seen, the steps are pretty simple. This is a simplified version of Database Vault configuration. Refer to Oracle Documentation on the various configuration options available with Database Vault. Hope this helps.

Oracle Groundbreaker Yatra ! July 2019..

After Sangem 2018, now it’s time for another OTN Yatra, now renamed to Oracle Groundbreaker Yatra. Sangam is hosted every year and had been either in Bangalore or Hyderabad. It is a One City Event. Oracle Groundbreaker Yatra on the other hand is a Multi-City Tour. I assume, last year it was a 6 City Tour. This year, it is scheduled to be hosted in Mumbai, Chennai, Delhi, Bengaluru, Hyderabad, Kolkata, Pune, Ahmedabad, Visakhapatnam and Thiruvananthapuram.

Registration for the Groundbreaker would open soon. Many prominent speakers would be travelling and it should be a never-miss event for the Oracle Database Community (DBA’s, Developers, Architects, Data Scientists etc). Sandesh Rao, Connor Mcdonald, Roy Swonger and Gurmeet Goindi would be speaking on some interesting topic (as always). Roy Swonger from Oracle US would be visiting India for the first time. Every time you attend the sessions from these experts, you tend to learn something new and I am sure, this time as well, you will come out with bags full of Knowledge.

I am also one of the speakers for the Groundbreaker Yatra and would be travelling to 4 cities, apart from Mumbai (as it is my base location). I have opted for the locations, where I am either travelling for the first time for an User Group event or have visited only once or twice. Now, you can guess it :). I will be in Ahmedabad, Kolkata, Visakhapatnam, Thiruvananthapuram and Mumbai. I assume the agenda for some of the locations is already published and for the other locations, is in the final stage.

Keep a tab on this link Groundbreaker Yatra 2019, so that you do not miss on the registration.

Now, on my session. I am presenting a Session on “Database and Application Performance ! Then and Now.” It is a 1 hour session. I would be covering some of the Performance Challenges the DBA’s or Developers use to face and still face and how these are automatically taken care in Autonomous Databases. I will cover Optimizer, Parallelism, Oracle 19c and walk through some of the topics around Reinforcement Learning and Data Science. Usually, my sessions are supported by live demo’s. But for a 1 hour session, I am not sure whether I will be able to, but shall try to make it more interesting. See you all in July 2019.

Python – Error Installing cx_Oracle on Mac

Just wanted to share a solution that worked for me while resolving an issue installing cx_Oracle on my MacBook. I wanted to connect Python to an Oracle Database. I installed Oracle Instant Client 18c from here. I clicked on “Instant Client for Mac OS X (Intel x86)(32 bit and 64 bit). Then I downloaded following :

- Basic Package - All files required to run OCI, OCCI and JDBC-OCI applications
- The other 2 Downloads under Tools - optional packages.

On Mac, by default these downloads are unzipped. I copied all the files into a Single Folder and updated the .bash_profile with ORACLE_HOME. I made sure that I am able to execute sqlplus. Once done, I started the installation of cx_Oracle. Received the following error :

python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... error
  Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-wheel-5rm6ofqh --python-tag cp36:
  running bdist_wheel
  running build
  running build_ext
  building 'cx_Oracle' extension
  creating build
  creating build/temp.macosx-10.7-x86_64-3.6
  creating build/temp.macosx-10.7-x86_64-3.6/src
  creating build/temp.macosx-10.7-x86_64-3.6/odpi
  creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
  xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for cx-oracle
  Running setup.py clean for cx-oracle
Failed to build cx-oracle
Installing collected packages: cx-oracle
  Running setup.py install for cx-oracle ... error
    Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'cx_Oracle' extension
    creating build
    creating build/temp.macosx-10.7-x86_64-3.6
    creating build/temp.macosx-10.7-x86_64-3.6/src
    creating build/temp.macosx-10.7-x86_64-3.6/odpi
    creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
    xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/

The relevant error here was “xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools)”, which is related to Xcode. To be honest, searched few solutions on google and finally got a solution which was to run download and install Xcode.

xcode-select --install

Just Download, Install Xcode and the installation went smooth. I can now connect python to an Oracle Database. Task accomplished 🙂

Viveks-MacBook-Air:~ vivek$ xcode-select --install
xcode-select: note: install requested for command line developer tools
Viveks-MacBook-Air:~ vivek$ python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... done
  Stored in directory: /Users/vivek/Library/Caches/pip/wheels/5f/b1/97/adf672ad4ab215e227ba939604abccc086e16c91b04cccdcd7
Successfully built cx-oracle
Installing collected packages: cx-oracle
Successfully installed cx-oracle-7.1.2

One important point to remember is that while it says cx_oracle installed, while importing the module in Python Prompt or on Jupyter, it is cx_Oracle.

Autonomous Database ! ADW and ATP. Why do I need both ?

Autonomous Data Warehouse (ADW) was announced in March 2018 and then Autonomous Transaction Processing (ATP) is the latest addition to the Autonomous Database family. ATP is for Mixed Workload and therefore provides 4 Service Names. These are :

HIGH :Highest level of CPU and IO Resources. Concurrency with this Service Name depends upon the number of OCPUs and will scale accordingly.

MEDIUM :Lowest level of CPU and IO Resources. Concurrency depends on the number of CPU’s and will scale accordingly.

LOW :Least amount of CPU and IO Resources. Again, Concurrency depends on the number of CPU’s and the Scaling is directly proportional to the number of OCPU’s.

ADW provides these three Service Names as well. However, unlike ADW (see my previous blog), the definition of these Services are different. In ADW, HIGH provides highest level of Resources but less concurrency, whereas, in ATP, HIGH do provide highest level of Resources but also provides Highest level of Concurrency. Since ATP is designed for Mixed Workload, it provides an additional Service Name PARALLEL. It provides high level of resources but lowest concurrency and is useful for Batch Processes or Heavy Reporting.

Now, many of the technical folks have raised this question on “Why do we need Both? Can’t we use ATP to take care of Data Warehouse kind of a load?”. ATP very well does that by providing an option to run Mixed Workload. However, there are many implementations that are purely Data Warehouse. The problem is that the amount of data that we generate, Warehouse Databases have grown massive in size supporting large amount of complex queries involving many nested joins and expecting sub-second response time. Further, these Databases house data from many different sources. Optimization Strategy for an OLTP and Data Warehouses are completely different. Therefore, an OLTP kind of a setup cannot (in most of the cases) provide the response time needed for DWH Queries.

Row Store v/s Columnar Store

Now, coming back to ATP v/s ADW argument. In an ADW, the data loaded is by default compressed to Oracle Hybrid Columnar Compression, whereas ATP uses a Row Store Format. What this means is that in an ADW, the data stored at disk is in a Columnar format, which is a proven format for speeding up Analytics kind of a Workload where we query few columns. All Data Warehouse Implementations, like Redshift, Google Big Query etc. store data in Columnar format. On the other hand, ATP stores the data in Row format which is ideal for OLTP that demands Single or few row(s) updates or deletes or queries small number of rows.

Since the data in ADW is stored Compressed in Columnar format, it allows scanning of Billions of Rows per CPU cycle. The data for a particular column is stored contiguously making it possible to scan through large number of rows in a single CPU cycles. This technique is called SIMD (Single Instruction Multiple Dataset) processing. To meet a sub-second response time, this technique is plays a significant role.

Such large scans (scanning billion of rows per second) is not possible with Row Store and one of the reasons for this can be found in one of my blog published in 2015 which is still very much relevant. Skipping of Columns consume CPU Cycles which is demonstrated in the blog post that can be found here.

Through this short blog, I wanted to clear the confusion around ADW and ATP. Both have their own use cases and thus are optimized to manage different workloads. Both of these Autonomous Databases have some exciting features. Stay tuned for more on these.

Resource Manager – Autonomous Data Warehouse

I recently presented a Full Day Tech Event on Autonomous databases (ADB) for the North India Chapter of All India Oracle User Group. Since most of the attendees were Database Administrators, it was important to cover technical aspects of ADB. While Oracle doesn’t publicize the Internals of ADB, I spoke about some of my experiences based on many of POC’s.

Many DBA’s still assume that 18c Database is Autonomous? Therefore it is Important to know What is Autonomous Database and What it is not? First thing to note is that 18c is not Autonomous. 18c is just like any other Oracle Database. Autonomous uses the features of Oracle Database 18c, plus 12c, plus 11g, plus 10g and so on. Autonomous Database ensures that all these features work together. So, 18c is one of the Building Blocks of 18c. Autonomous uses all the features that Oracle Developed for almost 2 decades. The other important underlying technology is Oracle Exadata, which is the fastest Oracle Database Machine with redundancy built-in at each layer. Again, Oracle had been working on many features that makes this machine a powerful database machine, in terms of Performance, Reliability and Scalability. The Journey had been long from Smart Scans to Direct-to-wire protocol to Smart Fusion Block Transfer and the enhancements continue. One important thing to note is – Customers do have an option to move to 18c on Exadata within their own Data Centre. Then does it mean, this combination is Autonomous? The answer is NO. The two components are critical, but what makes it Autonomous is the Cloud Automation with Machine Learning.

During the session, I covered the Performance features and the way Oracle controls resource utilization. The topic that came up was Oracle Database Resource Managers. Database Administrators were more inquisitive to know more about this and how it is been implemented in ADB.

Resource Manager-

Resource Managers or Consumer Groups or Service names are the integral part of Autonomous Databases as they manage the workload based on the performance characteristics of the Application. For example, in an Autonomous Data Warehouse there are 3 pre-created Service Names. These are –

HIGH – Highest Resources and Lowest Concurrency. Maximum Queries that can run are 3.

MEDIUM – Less Resources and High Concurrency. Maximum Queries would depend upon the number of OCPU’s.

LOW – Least resources and Highest Concurrency. Maximum Queries – twice the number of OCPU’s and therefore, would depend upon the number of OCPU’s.

Autonomous Transaction Processing (ATP) introduced a new Service – PARALLEL. In the context of ATP, the definition for HIGH, MEDIUM and LOW is different that in ADW. I will cover these in other blog post.

Now, the game becomes more interesting with HIGH and MEDIUM Consumer Groups. While the maximum number of Queries that can run from HIGH is 3 but with concurrency i.e. with concurrent Queries running from MEDIUM and LOW, this number can be less than 3. Same is the case with MEDIUM. With MEDIUM, the maximum numbers would depend on the number of OCPU’s, but actual concurrent queries that can run below the maximum allowed would depend upon the queries running from other consumer groups. For LOW, it is twice the number of OCPU’s and is irrespective of number of queries running from other consumer groups. All these are managed efficiently by pre-configured Resource Managers. Resource Manager, as the name implies, manages the resources. When the load on the system is light, there is very little need for managing the resources. However, as the load increases and system becomes more busy, managing the resources become more important. One of the Primary Goal of Autonomous Database is to ensure consistent performance (always). Therefore, these restrictions. However, as mentioned earlier, other than HIGH, the maximum for other consumer groups depends upon the number of OCPU’s. So, if you have a highly concurrent environment or during peak load with high concurrency, you can scale up more OCPU’s and scale down as well. By the way, Scaling OCPU’s UP and DOWN is Online and is irrespective of the Storage. Let me digress for a moment here. In case of Autonomous Database, if you need additional Storage, you can add it Online and this increase does not mandate adding OCPU’s. Same with the OCPU’s, you don’t need to add Storage. Both are independent. This is not the case with the competition. For example, with AWS Redshift or Microsoft SQL Data Warehouse, since these technologies work on Shared Nothing Architecture, additional Storage means adding more OCPU’s as well. What this means is – Even if you have good amount of CPU resources available and need is to add only the Storage, you will have to add a compute node as well. A storage without a compute is like a DEAD duck. It does nothing on disk unless connected to a Compute Mode.

Coming back to the Resource Manager. As mentioned, resource manager is automatically created and the Application owner has to only make sure that the application queries or the reports are connected to one of the consumer groups based on workload patterns or application requirements.

Why Resource Manager (RM)?

As mentioned earlier, the primary goal of a RM is to Manage System Resources. If resources were unlimited, there would be no need to manage them. Without these, all database connections are considered equal and under heavy load all sessions are impacted equally. There are various components of RM, like Consumer Groups – RM allocate resources to Consumer Groups and not to Sessions, Plan Directives – Resource allocations are actually assigned to Plan Directives and a Consumer group is then assigned a Directive, then finally Resource Plan – a collection of Directives that determines how the resources are to be allocated and managed. Resource Plan is set at the Database or System Level using RESOURCE_MANAGER_PLAN.

In terms of mapping rules, all user sessions (except SYS and SYSTEM) are mapped to a Consumer Group. By default, these are mapped to OTHER_GROUPS, which is a mandatory consumer group. There are 2 types of Session Attributes – Login Attributes and Runtime Attributes. Runtime Attributes is assigned at Runtime and can ne changed anytime during the life cycle of the session. This is basically decided by the application. Login Attributes is tied to the connection and cannot be changed during the lifecycle.

Resource Manager, in ADB uses Login Attribute. Therefore, the application connection is to a particular Service Name has to be pre-decided based on the application requirement.

The maximum number of Queries that can be executed using MEDIUM and LOW Service Names scales linearly. As per Oracle Documentation, for a 16 OCPU Machine, the maximum with MEDIUM is 20 and LOW is 32 (twice the number of OCPU’s). As you scale-up, say to 32 OCPU’s, the maximum for these will Scale as well to 40 for MEDIUM and 64 for LOW.

At present, will restrict this blog post to Resource Manager. Will write on something interesting soon.

Autonomous Database Tech Day ! Gurgaon

 

#Autonomous #AIOUG Presenting a Full Day Event on 8th September 2018 for North India Oracle User Group in Gurgaon. This is on Oracle Autonomous Database. Would be covering some interesting technical capabilities of Autonomous Databases. I am covering the 2 offerings i.e. Autonomous Data Warehouse and Autonomous Transaction Processing. For Registration, click on the following link :

Meraevents Link

This being an Oracle User Group Session, focus would be on the Technical Capabilities of ADW / ATP, like Parallel Processing, Concurrency, Optimizer Enhancements and Behaviour and most importantly, Competition.

So, North India Folks : See you all on 8th September 2018.

Consistent Gets for an Index Scan

This question was raised by a participant during my User Group Session on “Indexing : Fact & Myth Session”. Therefore, I thought of writing about it with the same example that I demonstrated..

I was demonstrating on a Myth that “High Cardinality Column should be a Leading Column of an Index” and for this, I created following table with 2 Indexes.

create table t1 as
select * from all_objects;

exec dbms_stats.gather_table_stats(user,'T1');

SQL> select owner, num_rows, blocks from dba_tables where table_name='T1' and owner='SCOTT';

OWNER                  NUM_ROWS     BLOCKS
-------------------- ---------- ----------
SCOTT                     68605       1377

SQL> select column_name, num_distinct, num_nulls from dba_tab_columns
where   owner='SCOTT'
and     table_name='T1'
and     column_name in ('OBJECT_ID','TEMPORARY')
order by 1;

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
OBJECT_ID                             68605          0
TEMPORARY                                 2          0

SQL> create index t1_ot on t1(object_id, temporary);

Index created.

SQL> create index t1_to on t1(temporary,object_id);

Index created.

SQL> select index_name, blevel, leaf_blocks, clustering_factor from dba_indexes
where   table_name='T1'
order by 1;

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
T1_OT                                   1         171              1458
T1_TO                                   1         171              1494

So, I have a table with 68605 rows. Object_ID is 100% Distinct and Temporary has 2 Distinct Values. I have 2 indexes on it, which are on object_id & temporary. T1_OT is on (Object_ID, Temporary) and T1_TO is on (Temporary, Object_ID). The naming convention stands for the first letter of the columns in the order they are in the Index. So, for T1_OT O->Object_id and T->Temporary.

The Index Statistics shows that the two indexes are almost same, in terms of Height (BLEVEL), number of Leaf Blocks. A minor different in the Clustering_factor though.

I than executed the queries to demonstrate that the I/O’s done by queries using any of the 2 indexes is exactly same. For this, I executed the query and it used Index T1_OT and and then hinted the query to make use of T1_TO Index. The Cost and IO’s for both the queries are exactly same, which leads to a conclusion that cardinality doesn’t matter when designing a Index Strategy. Index Columns should be based on Application Queries and the Columns. The queries were executed twice to ensure that the consistent gets that we are post the hard parsing.


## IO's for the Query using an Index T1_OT

select owner, object_name from t1
where      object_id = 58
and        temporary='N';

OWNER                OBJECT_NAME
-------------------- ------------------------------
SYS                  I_CCOL2

set autot trace
select owner, object_name from t1
where      object_id = 58
and        temporary='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 3109227855

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    48 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    48 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_OT |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=58 AND "TEMPORARY"='N')

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets 
          0  physical reads
          0  redo size
...
...
          1  rows processed

## IO's with Index T1_TO

SQL> select /*+ index(t1,t1_to) */ owner, object_name from t1
     where      object_id = 58
     and        temporary='N';

OWNER                OBJECT_NAME
-------------------- ------------------------------
SYS                  I_CCOL2

Elapsed: 00:00:00.00
SQL> pause;

set autot trace
select /*+ index(t1,t1_to) */ owner, object_name from t1
where      object_id = 58
and        temporary='N';


Execution Plan
----------------------------------------------------------
Plan hash value: 1129381402

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |    48 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    48 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_TO |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"='N' AND "OBJECT_ID"=58)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
...
...
          1  rows processed

While I have 2 Indexes on the same columns only the ordered changed, optimizer chose an Index on Object_ID and Temporary. As my sessions are interactive (and this sometimes mean that my sessions take more time than alloted :)), I asked the participants the reason behind this optimizer decision and there were lot many assumptions. Anju Garg came out with the correct guess. However, will disclose this later.

At this point, the question raised was, why there are 4 Consistent I/O’s? The assumption here was that it should be 3 (BLEVEL + LEAF BLOCK Access + Table Block).

Next, I dropped any one of the Index and re-created it as a Unique Index. Remember, Object_ID is 100% Distinct. So, I will drop and re-create T1_OT.

SQL> drop index t1_ot;

Index dropped.

Elapsed: 00:00:00.04
SQL> create unique index t1_ot_uq on t1(object_id, temporary);

Index created.

SQL> select index_name, blevel, leaf_blocks, clustering_factor, uniqueness from dba_indexes
where   table_name='T1'
order by 1;
  2    3
INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR UNIQUENES
------------------------------ ---------- ----------- ----------------- ---------
T1_OT_UQ                                1         162              1457 UNIQUE
T1_TO                                   1         171              1493 NONUNIQUE

I will then execute the queries to check for the consistent gets. This time, the consistent gets for the execution plan with Unique Index is 3 (as against 4 for the same non-unique index).

select owner, object_name from t1
where      object_id = 58
and        temporary='N';

Execution Plan
----------------------------------------------------------
Plan hash value: 1959391432

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    48 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    48 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_OT_UQ |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=58 AND "TEMPORARY"='N')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

There is a difference in the Consistent Gets with Unique and Non-Unique Index. The height of both these Indexes are exactly same. This difference was important to get to original question. So, I generated a 10046 trace for the 2 Queries (with Unique and Without Unique Scan) and the relevant portion from the trace is as under, which will explain the reason behind 4 Consistent Gets.

## 10046 portion for Non-Unique index. Please see the BOLD and UNDERLINED portion. The extra cr=1 for FETCH#18446604434610142176.
## In this case, once the Blocks are fetched from an Index (cr=2) and Table (cr=1) Total cr=3, there is an extra cr immediately after 
## SQL*Net message to client. So, the total cr = 2 + 1 + 1 (extra) = 4.

PARSE #18446604434610142176:c=118,e=119,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1259244381,tim=7093069466
EXEC #18446604434610142176:c=88,e=93,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1259244381,tim=7093070117
WAIT #18446604434610142176: nam='SQL*Net message to client' ela= 8 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=7093070421
WAIT #18446604434610142176: nam='db file scattered read' ela= 2292 file#=12 block#=275312 blocks=8 obj#=77545 tim=7093073241
WAIT #18446604434610142176: nam='db file sequential read' ela= 49 file#=12 block#=255875 blocks=1 obj#=77543 tim=7093073719
FETCH #18446604434610142176:c=2088,e=3043,p=9,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=1259244381,tim=7093073891
WAIT #18446604434610142176: nam='SQL*Net message from client' ela= 624 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7093074878
FETCH #18446604434610142176:c=104,e=104,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,plh=1259244381,tim=7093075164
STAT #18446604434610142176 id=1 cnt=1 pid=0 pos=1 obj=77543 op='TABLE ACCESS BY INDEX ROWID T1 (cr=4 pr=9 pw=0 str=1 time=3109 us cost=2 size=48 card=1)'
STAT #18446604434610142176 id=2 cnt=1 pid=1 pos=1 obj=77545 op='INDEX RANGE SCAN T1_TO (cr=3 pr=8 pw=0 str=1 time=2835 us cost=1 size=0 card=1)'
WAIT #18446604434610142176: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7093076331

## 10046 portion for Unique index. In this case, once the Blocks are fetched from an Index (cr=2) and Table (cr=1), there is no extra cr.

PARSE #18446604434610123376:c=134,e=135,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1959391432,tim=7120639467
EXEC #18446604434610123376:c=89,e=90,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1959391432,tim=7120640009
WAIT #18446604434610123376: nam='SQL*Net message to client' ela= 8 driver id=1413697536 #bytes=1 p3=0 obj#=0 tim=7120640248
WAIT #18446604434610123376: nam='db file scattered read' ela= 5875 file#=12 block#=260496 blocks=8 obj#=77546 tim=7120646397
WAIT #18446604434610123376: nam='db file sequential read' ela= 64 file#=12 block#=255875 blocks=1 obj#=77543 tim=7120646786
FETCH #18446604434610123376:c=1407,e=6569,p=9,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=1959391432,tim=7120646947
STAT #18446604434610123376 id=1 cnt=1 pid=0 pos=1 obj=77543 op='TABLE ACCESS BY INDEX ROWID T1 (cr=3 pr=9 pw=0 str=1 time=6549 us cost=2 size=48 card=1)'
STAT #18446604434610123376 id=2 cnt=1 pid=1 pos=1 obj=77546 op='INDEX UNIQUE SCAN T1_OT_UQ (cr=2 pr=8 pw=0 str=1 time=6240 us cost=1 size=0 card=1)'
WAIT #18446604434610123376: nam='SQL*Net message from client' ela= 583 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7120652705
FETCH #18446604434610123376:c=15,e=15,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=1959391432,tim=7120652925
WAIT #18446604434610123376: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=77543 tim=7120653074

From this, I can assume that the steps carried out for a Non-Unique Index is as under (as it an Index-Range Scan).

-	Read the Root Block to get the address of the Leaf Block (IO = 1). 
- 	Read the Leaf Block to check for the matching Object_ID and Temporary Column. Get the ROWID for the table block. (IO = 2).
-	Go to the Table Block to read the other required columns listed in the SELECT list. (IO=3).
-	Go back to the Index Block to check for any other Object_ID and Temporary Values. (IO=4).
-	It is here that it gets to know that there are no more rows.

Bullet Point 4 is not required for a Unique Scan as Oracle is aware that it is a Unique Index and therefore, there will be no additional read required. Further, to confirm this, I executed a query on OBJECT_ID using an Unique Index. Remember, while Object_Id is 100% Unique, but the Unique Index is on the 2 columns and I am not referring the 2nd column in the query, which will change the plan from UNIQUE SCAN to RANGE SCAN.

select owner, object_name from t1
where      object_id = 58;

Execution Plan
----------------------------------------------------------
Plan hash value: 1834913555

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |    46 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |    46 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_OT_UQ |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=58)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets 

So, this answers the question raised during the session on the rationale behind 4 Consistent Reads. Unique and Non-Unique Indexes are internally same with only a difference in the way these are accessed. With Index Range Scan, the behaviour of both the indexes are exactly same.

Now, for the another question on why the optimizer used T1_OT (by default) and not T1_TO ? The reason was a TIE between the 2 indexes, which is usually very rare with Cost Based Optimization and due to the TIE, optimizer preferred an Index in an alphabetical order. TI_OT comes before TI_TO. To demonstrate this, I dropped and recreated T1_TO as T1_1TO and optimized started using T1_1TO by default.

Cloud Day : North India User Group Chapter

Presenting multiple sessions during Cloud Day scheduled in Gurgaon on 11th November 2017. Will be speaking on “Oracle IaaS and PaaS”, “Oracle In-Memory Database” and “Indexes : Myths and Misconceptions”. For registration, click on :

https://www.meraevents.com/event/aioug-nic-cloud-day

Performance Tuning Day – Mumbai Chapter

Presenting a Full Day session on Performance Optimisation for Mumbai Chapter of All India Oracle User Group. This is on 5th August 2017. For registration, click on this Performance Tuning Day. I have revised the content of this session, to incorporate the latest Optimiser behaviour and changes.

See you all there…

Sangam 2016

Sangam 2016, an Annual Event organized by All India Oracle User Group is just a 35 days from now. It is scheduled for 11th and 12th November in Bangalore. A chance to meet and listen to World renowned speakers from all over the world. Visit Sangam 2016 for more details about the event, the speakers, their sessions etc.

I am presenting 2 back to back sessions on Cost Based Optimizer. My session is on 12th November 2016. Hope to see you all in large numbers :). I will also be hosting an In-Memory Demo Booth, along with the Product Management Team from US.

i_am_speaking

%d bloggers like this: