Database Vault ! Autonomous Database

Availability of Database Vault on Autonomous Database (Shared Infrastructure) was announced in February 2020. Database Vault does not require any introduction as this feature has been available since Oracle 10g. As a brief, it provides powerful security controls to help protect sensitive application data from Unauthorised access and comply with privacy and regulatory requirements. This means, a superuser (or a user with DBA privilege) can access the sensitive application data. Oracle Database Vault implements this security at the database layer.

Database Vault provides additional security to the Autonomous Database Family and is just few steps. This blog demonstrates the steps needed to implement Database Vault on Autonomous Data Warehouse. Before I go through the steps, it is important to know a concept of REALMS. Realm is a grouping of database schemas, database objects and roles that must be secured for a given application. It is a kind of a Zone of Protection for your database objects. So, one of the steps required will be to create a Realm.

Every Autonomous Database has an ADMIN user, which is a super user. ADMIN user can query any table within the Database (by default). All the steps required for implementing Database Vault is to be executed using ADMIN user. Once done, ADMIN user will continue to be a super user, but will not be able to query the application data. In my case, assume VIVEK is an application schema containing all the tables accessed by my application and one such table is CREDIT_CARD.


- Connect to ADMIN User and Query the CREDIT_CARD Table. ADMIN is able to query the data.

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Check whether Data Vault is Configured and Enabled. It is not configured.

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS FALSE
DV_ENABLE_STATUS    FALSE

- Create Database Vault Owner and Database Vault Account Manager User.

CREATE USER ADB_DBV_OWNER IDENTIFIED BY "xxxxxxxxxxxxx";
CREATE USER ADB_DBV_ACCTMGR IDENTIFIED BY "xxxxxxxxxxxxx";
GRANT CREATE SESSION, SELECT ANY TABLE TO ADB_DBV_OWNER;

- Check whether we can select the data from CREDIT_CARD table, before implementing Database Vault
- Database Audit Vault User has SELECT ANY TABLE privilege.

CONNECT ADB_DBV_OWNER/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00


- Configure and Enable Database Vault

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low


EXEC DBMS_CLOUD_MACADM.CONFIGURE_DATABASE_VAULT('ADB_DBV_OWNER', 'ADB_DBV_ACCTMGR');

- Database Vault Configured, but not Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    FALSE

- Enable Database Vault. For this to come into effect, we need to restart the database

EXEC DBMS_CLOUD_MACADM.ENABLE_DATABASE_VAULT;

- Restart the Database. Check after Database Restart
- Database Vault Configured and Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    TRUE

- Admin can still Query the Data

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Create REALM, Add Object to the Realm 

BEGIN
 DBMS_MACADM.CREATE_REALM(
  realm_name    => 'Vivek Application Schema', 
  description   => 'Realm created for Vivek Schema', 
  enabled       => DBMS_MACUTL.G_YES, 
  audit_options => DBMS_MACUTL.G_REALM_AUDIT_FAIL + DBMS_MACUTL.G_REALM_AUDIT_SUCCESS,
  realm_type    => 1,
  realm_scope   => DBMS_MACUTL.G_SCOPE_LOCAL);
END; 
/

BEGIN
 DBMS_MACADM.ADD_OBJECT_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  object_owner => 'VIVEK', 
  object_name  => '%', 
  object_type  => '%'); 
END;
/

- Add Authorised User, which is the User that can Query the Application Table
- In this case, VIVEK is the application owner

BEGIN
 DBMS_MACADM.ADD_AUTH_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  grantee      => 'VIVEK', 
  auth_options => DBMS_MACUTL.G_REALM_AUTH_OWNER);
END;
/

- Check whether ADMIN can Query the sensitive Data from Vivek Schema

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;
                    *
ERROR at line 1:
ORA-01031: insufficient privileges

- Vivek being an Application Owner, can Query the data. No other Users allowed to Query

CONNECT VIVEK/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

As can be seen, the steps are pretty simple. This is a simplified version of Database Vault configuration. Refer to Oracle Documentation on the various configuration options available with Database Vault. Hope this helps.

DBMS_CLOUD : Loading a TEXT file without any DELIMITER

Recently I was working on an issue related to loading data from a flat file on to Autonomous Data Warehouse using dbms_cloud api. While lot has been written on this subject, I thought of writing on this as the challenge here was that the file in this case was without any delimiter. The Original data to be loaded was around 90 Million, but for testing purpose customer was loading around 200k rows.

What is a Delimiter? Delimiter is one or more character that separates two strings. These characters are used to identify different columns.

One of the best blog on this topic is this blog from Nilay Panchal.

Coming back to my topic of this blog. In this case, customer was trying to load 200k rows into a table and the way they were using it, the time it took to load these 200k rows was around 25 minutes which is too huge considering that actual production number of rows would be 90 Million.

As mentioned, the challenge here was that the content in this flat file had no delimiter or any character to identify column break. The original flat file had around 500 columns, but for testing the file was truncated to 5 columns. The idea here is to demonstrate an efficient way of loading the data into a table when the data in the flat file is without any delimiter. The sample data is as under:

CCODEDEBUTRUNSHUNDREDSPLAYER_NAME
IND151119891592151SACHIN TENDULKAR
AUS081219951337841   RICKY PONTING
 SA141219951328945  JACQUES KALLIS
 SL200720001240038 KUMAR SANGAKARA
 WI061219901195334      BRIAN LARA
IND20062011 720227     VIRAT KOHLI

Our very first step will be to create cloud object storage credentials using dbms_cloud.create_credentials. If you have already created a credential, then you can skip this. Check dba_credentials for the list of Credentials.

begin
  dbms_cloud.create_credential(
    credential_name=>'VIVEK_CREDENTIAL',
    username=>'VIVEKS',
    password=>'paste_auth_token_here'
  );
end;
/

SQL> column owner for a20
SQL> column credential_name for a30
SQL> column username for a20
SQL> select owner, credential_name, username from dba_credentials;

OWNER		     CREDENTIAL_NAME		    USERNAME
-------------------- ------------------------------ --------------------
ADMIN		     VIVEK_CREDENTIAL		    VIVEKS

I have uploaded by data.txt file that contains the required data to be loaded on the Object Storage. Next Step will be create a table in which the data is to be loaded. For COPY_DATA, you need to ensure the Target Table is already created as the procedure would fail.

CREATE TABLE CRICKETER_DATA
   (CCODE VARCHAR2(3) ,
   DEBUT DATE,
   RUNS  NUMBER(5),
   HUNDREDS NUMBER(2),
   PLAYER_NAME VARCHAR2(16));

Now, I will create by COPY_DATA procedure with the required inputs. It is important to note the format and field_list option. In my case, the date was in DDMMYYYY format and hence I had to specify the dateformat explicitly to allow copy_data to parse the date accurately. Without the dateformat value, my copy_data failed with an error. The other parameter is field_list, which ensures that in the absence of delimiter, the copy_data parses the column values based on the start and end position. You need to ensure that these position are accurate.

begin
dbms_cloud.copy_data(
    table_name =>'CRICKETER_DATA',
    credential_name =>'VIVEK_CREDENTIAL',
    file_uri_list =>'&MY_OBJECT_STORAGE_URL/data.txt',
    format => json_object('skipheaders' value '1', 'dateformat' value 'ddmmyyyy'),
    field_list =>'CCODE POSITION(1:3), DEBUT POSITION(4:11), RUNS POSITION(12:16), HUNDREDS POSITION(17:18), PLAYER_NAME POSITION(19:34)');
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.50
SQL> SQL> select * from cricketer_data;

CCO DEBUT		       RUNS   HUNDREDS PLAYER_NAME
--- -------------------- ---------- ---------- ----------------
IND 15-NOV-1989 00:00:00      15921	    51 SACHIN TENDULKAR
AUS 08-DEC-1995 00:00:00      13378	    41	  RICKY PONTING
 SA 14-DEC-1995 00:00:00      13289	    45	 JACQUES KALLIS
 SL 20-JUL-2000 00:00:00      12400	    38	KUMAR SANGAKARA
 WI 06-DEC-1990 00:00:00      11953	    34	     BRIAN LARA
IND 20-JUN-2011 00:00:00       7202	    27	    VIRAT KOHLI

6 rows selected.

If for some reason the copy_data procedure fails, it creates a error log table, which is reported immediately along with the error number. To drill down to the actual error, you need to query the log table. For example, when I omitted FORMAT option from my COPY_DATA procedure, it failed with following error. I queried the COPY$22_LOG to extract the actual error.


select * from COPY$21_LOG;

RECORD
------------------------------------------------------------------------------------------------------------------------------------
 LOG file opened at 01/28/20 05:15:18

Total Number of Files=1

Data File: &MY_OBJECT_STORAGE_URL/data.txt

Log File: COPY$21_331103.log

 LOG file opened at 01/28/20 05:15:18

Bad File: COPY$21_331103.bad

Field Definitions for table COPY$Q18IZ07MUUMYRU3IG6MU
  Record format DELIMITED BY
  Data in file has same endianness as the platform
  Rows with all null fields are accepted

  Fields in Data Source:

    CCODE			    CHAR (3)
      Record position (1, 3)
      Terminated by "|"
    DEBUT			    CHAR (8)
      Record position (4, 11)
      Terminated by "|"
    RUNS			    CHAR (5)
      Record position (12, 16)
      Terminated by "|"
    HUNDREDS			    CHAR (2)
      Record position (17, 18)
      Terminated by "|"
    PLAYER_NAME 		    CHAR (16)
      Record position (19, 34)
      Terminated by "|"
error processing column DEBUT in row 1 for datafile &MY_OBJECT_STORAGE_URL/data.txt

ORA-01858: a non-numeric character was found where a numeric was expected

40 rows selected.

Oracle Database ! Multi-Model Capabilities.

Everything revolves around DATA. One of the challenges for an Organization is to quickly adapt to the ever changing Business User Needs. These needs are driven by Competition and Business Users Responsible for the Company’s Growth at the same time remain Competitive.

Data Arrives in different format – Structured and Unstructured. It is important to understand the 3V’s of Big Data Technologies. Data arrives in different format (Variety) as these are generated from different sources. Data generation is not restricted to a Single Device, like Laptops or Desktops. These are generated from different devices 24×7. This means Volume has gone up tremendously. And since, it is generated 24×7, the Speed (Velocity) with which the data arrives, gives less time for the Business User to analyse and derive some meaningful insights from this data. Less time, because Data that arrives in different format is either Transformed into Structured format and is loaded into a relational database that enforces data modelled by a Schema (Schema on Write) or is kept in an Unstructured Format and queried using some other database models (NoSQL, NewSQL or Big Data). This means, IT has to provide different means for connecting to these disparate systems which introduces Complexity. Integration poses another challenge. Combining these data for faster and nearly accurate analysis is another Big Challenge. Therefore, it has now become more critical for a CIO or an IT Department to come up with an Innovative Solutions to complex processes and problems revolving around disparate systems storing different formats of DATA.

POLYGLOT Persistance

It is a concept of using different data storage technologies to handle different data storage needs within a given Software application. Every Application Vendor or an ISV or System Implementers provide best database to address their application demand/requirements but no one company has a Single Application. Each Application caters to different use cases with different data models thus different data storage. Integrating these to provide a Unified View is a biggest challenge.

Multi-Model Databases provide an Organization the required Agility. The flexibility to use different data models, to address different use cases, to store Variety of Data for the Business Users within a Single Integrated Backend Database. This addresses the complexity challenge introduced by Polyglot Persistance.

SIGNIFICANT Recognition

One of the Significant Recognition for Oracle Database is that it scored highest in both the current offerings and strategy categories by Forester. This is the ability to support converged OLTP and Analytical (Translytical) Workloads using Database In-Memory, Exadata and extensive Multi-Model Capabilities.

It’s a privilige to be a part of such an Innovative Technology and, Yes – to be a part of an Innovative Organization.

Link to Forester Report

Oracle Groundbreaker Yatra ! July 2019..

After Sangem 2018, now it’s time for another OTN Yatra, now renamed to Oracle Groundbreaker Yatra. Sangam is hosted every year and had been either in Bangalore or Hyderabad. It is a One City Event. Oracle Groundbreaker Yatra on the other hand is a Multi-City Tour. I assume, last year it was a 6 City Tour. This year, it is scheduled to be hosted in Mumbai, Chennai, Delhi, Bengaluru, Hyderabad, Kolkata, Pune, Ahmedabad, Visakhapatnam and Thiruvananthapuram.

Registration for the Groundbreaker would open soon. Many prominent speakers would be travelling and it should be a never-miss event for the Oracle Database Community (DBA’s, Developers, Architects, Data Scientists etc). Sandesh Rao, Connor Mcdonald, Roy Swonger and Gurmeet Goindi would be speaking on some interesting topic (as always). Roy Swonger from Oracle US would be visiting India for the first time. Every time you attend the sessions from these experts, you tend to learn something new and I am sure, this time as well, you will come out with bags full of Knowledge.

I am also one of the speakers for the Groundbreaker Yatra and would be travelling to 4 cities, apart from Mumbai (as it is my base location). I have opted for the locations, where I am either travelling for the first time for an User Group event or have visited only once or twice. Now, you can guess it :). I will be in Ahmedabad, Kolkata, Visakhapatnam, Thiruvananthapuram and Mumbai. I assume the agenda for some of the locations is already published and for the other locations, is in the final stage.

Keep a tab on this link Groundbreaker Yatra 2019, so that you do not miss on the registration.

Now, on my session. I am presenting a Session on “Database and Application Performance ! Then and Now.” It is a 1 hour session. I would be covering some of the Performance Challenges the DBA’s or Developers use to face and still face and how these are automatically taken care in Autonomous Databases. I will cover Optimizer, Parallelism, Oracle 19c and walk through some of the topics around Reinforcement Learning and Data Science. Usually, my sessions are supported by live demo’s. But for a 1 hour session, I am not sure whether I will be able to, but shall try to make it more interesting. See you all in July 2019.

Python – Error Installing cx_Oracle on Mac

Just wanted to share a solution that worked for me while resolving an issue installing cx_Oracle on my MacBook. I wanted to connect Python to an Oracle Database. I installed Oracle Instant Client 18c from here. I clicked on “Instant Client for Mac OS X (Intel x86)(32 bit and 64 bit). Then I downloaded following :

- Basic Package - All files required to run OCI, OCCI and JDBC-OCI applications
- The other 2 Downloads under Tools - optional packages.

On Mac, by default these downloads are unzipped. I copied all the files into a Single Folder and updated the .bash_profile with ORACLE_HOME. I made sure that I am able to execute sqlplus. Once done, I started the installation of cx_Oracle. Received the following error :

python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... error
  Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-wheel-5rm6ofqh --python-tag cp36:
  running bdist_wheel
  running build
  running build_ext
  building 'cx_Oracle' extension
  creating build
  creating build/temp.macosx-10.7-x86_64-3.6
  creating build/temp.macosx-10.7-x86_64-3.6/src
  creating build/temp.macosx-10.7-x86_64-3.6/odpi
  creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
  xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for cx-oracle
  Running setup.py clean for cx-oracle
Failed to build cx-oracle
Installing collected packages: cx-oracle
  Running setup.py install for cx-oracle ... error
    Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'cx_Oracle' extension
    creating build
    creating build/temp.macosx-10.7-x86_64-3.6
    creating build/temp.macosx-10.7-x86_64-3.6/src
    creating build/temp.macosx-10.7-x86_64-3.6/odpi
    creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
    xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/

The relevant error here was “xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools)”, which is related to Xcode. To be honest, searched few solutions on google and finally got a solution which was to run download and install Xcode.

xcode-select --install

Just Download, Install Xcode and the installation went smooth. I can now connect python to an Oracle Database. Task accomplished 🙂

Viveks-MacBook-Air:~ vivek$ xcode-select --install
xcode-select: note: install requested for command line developer tools
Viveks-MacBook-Air:~ vivek$ python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... done
  Stored in directory: /Users/vivek/Library/Caches/pip/wheels/5f/b1/97/adf672ad4ab215e227ba939604abccc086e16c91b04cccdcd7
Successfully built cx-oracle
Installing collected packages: cx-oracle
Successfully installed cx-oracle-7.1.2

One important point to remember is that while it says cx_oracle installed, while importing the module in Python Prompt or on Jupyter, it is cx_Oracle.

All India Oracle User Group Events for 2018

Sangam 2018 is scheduled for 7th and 8th December 2018. I am presenting a session on “SQL Tuning ! Tips, Tools and Techniques” which is on 7th December 2018 and is immediately after Lunch. A 45 minute session doesn’t justify the topic, but will try my level best to cover few interesting real life examples. On 8th December, I will be co-hosting a Fire side chat on Performance along side Tirthankar Lahiri (VP of Product Management, Oracle US), Arup Nanda, GP Gongloor and Karan Dodwal. This is scheduled for 2.55 pm. The only issue is that I may have to leave early to catch my flight back to Mumbai.

North India Chapter – Chandigarh. Mark your calendar for 24th November 2018 as I will be speaking for the entire day on Performance Optimization covering some interesting topics around SQL Optimization, Optimizer, Indexes, Autonomous Databases and many more. This is a Full day event which gives me good amount of time to demonstrate some of the interesting facts. Registration for this event is open. Use the following link to register. See you soon.

https://www.meraevents.com/us/previewevent?view=preview&eventId=183767

For 2019, will publish the calendar, once it is freezed 🙂

Importance of Constraints ! Know your Reporting Tools

Recently, I was working on a customer issue. The issue was a performance comparison between a Competition and an Oracle Database. The performance of Competition was reported to be better than Oracle. Now, this is a classic case of Bad Schema Design and a Badly Written Query. Working on this issue reminded me of a great learning that I had after reading “Expert Oracle Database Architecture” by Thomas Kyte. He wrote about a classic issue with a Pl/SQL Block running in SQL Server and generating wrong results when ported to Oracle Database due to the way these 2 databases compare NULL values. Each of these databases are different. It also reminded me of one of my response to a query from a customer on “A count(*) from a Table is doing a Full Table Scan, even though it has a Unique Index on it”.

As Tom mentioned in his book, every database is different and implement the features differently. Now, the third party tools that connect to each of these different data sources generate queries that syntactically work on all but may not run optimally. Therefore, it is important to understand our Data and design the Schema with all the required constraints and indexes in place.

In this case, customer was running few analytical reports from a Third Party Reporting Tool (Tableau). Customer selected few columns with couple of predicates with Date Range and some product name. The Queries were around 40-50 Lines. Surprisingly, for each of the Predicates, Tableau added few additional predicates and due to these additional predicates, the run time plan was around 2700+ lines long. As an example, I have a table T1 and have to generate a report selecting object id’s and names for any of the 2 TEMPORARY values (Y or N). My query will look like :

select object_id, object_name from t1 where temporary=:b1;

If I have an Index on TEMPORARY Column, Optimizer will come out with an optimal plan based on it’s cost calculation. However, when run from Tableau, it came out with the following query:

select object_id, object_name from t1
where ((temporary=:b1) or (temporary is NULL and :b1 is NULL));

What will be the implication of this added OR conditions ? Let’s see.

SQL> create table t1 as
select * from all_objects;  2

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt=>'for all columns size auto for columns temporary size 100');

PL/SQL procedure successfully completed.

SQL> create index t1_idx on t1(temporary);

Index created.

SQL> select num_rows, blocks from dba_tables where table_name='T1';

  NUM_ROWS     BLOCKS
---------- ----------
     68417       1375

SQL> variable l_temporary varchar2(1);
SQL> exec :l_temporary:='Y';

PL/SQL procedure successfully completed.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL)); 

161 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7xsahyu4q594y, child number 0
-------------------------------------
select object_id, object_name from t1 where ((temporary=:l_temporary)
or (temporary is NULL and :l_temporary is NULL))

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   375 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   161 |  6923 |   375   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEMPORARY"=:L_TEMPORARY OR ("TEMPORARY" IS NULL AND
              :L_TEMPORARY IS NULL)))

161 Rows out of 68000, which is less than 1%. Index Access would have been a better option. At customer end, the table was huge with Billions of Rows and with multiple such OR Predicates, the very first observation was the amount of time the Optimizer took to Parse the Query. The Query took around 184 Seconds to run and the observation was that out of 184 Seconds, around 175 Seconds were spent on Parsing. This was identified as a BUG in 18c (BUG#28725660) and the primary cause identified as the change in OR Expansion behaviour in 18c. This BUG is fixed in 19c. Backporting it to 18c would have taken some time, so the other fix that we applied was to add NOT NULL Constraints to some of the columns. From the Data, we could see that none of the columns had NULL values. We checked with the developers and they mentioned that NULL values are not stored in this column. Therefore, it was safe to add these constraints. Continuing with our example above, let’s add a NOT NULL constraint to our Table T1.

SQL> alter table t1 modify temporary not null;

Table altered.

select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));

select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |       |     5 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |   161 |  6923 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |   161 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:L_TEMPORARY)

The run time plan of the same query is now an Index Scan and is much better than previous. This additional input to the Optimizer was enough to transform the query. You can generate 10053 trace to see the transformation.

NOT NULL constraint combined with Unique Index is required to answer a COUNT(*) query from an Index, which one of my customer asked long time back. This is another case, where NOT NULL constraint helped optimizer come up with an optimal plan.

It is important to optimize the Schema and provide all the critical inputs to the Optimizer. Generic reporting tools would come out with queries that work on all the databases. These generated queries may work on some database and may not work on other. Therefore, the title of this blog “Know your Reporting Tool” :).

On the real customer issue, post adding the constraints to few of the columns, the query response time reduced drastically from 184 Seconds to less than 10 seconds far better than the competition.

Autonomous Database ! ADW and ATP. Why do I need both ?

Autonomous Data Warehouse (ADW) was announced in March 2018 and then Autonomous Transaction Processing (ATP) is the latest addition to the Autonomous Database family. ATP is for Mixed Workload and therefore provides 4 Service Names. These are :

HIGH :Highest level of CPU and IO Resources. Concurrency with this Service Name depends upon the number of OCPUs and will scale accordingly.

MEDIUM :Lowest level of CPU and IO Resources. Concurrency depends on the number of CPU’s and will scale accordingly.

LOW :Least amount of CPU and IO Resources. Again, Concurrency depends on the number of CPU’s and the Scaling is directly proportional to the number of OCPU’s.

ADW provides these three Service Names as well. However, unlike ADW (see my previous blog), the definition of these Services are different. In ADW, HIGH provides highest level of Resources but less concurrency, whereas, in ATP, HIGH do provide highest level of Resources but also provides Highest level of Concurrency. Since ATP is designed for Mixed Workload, it provides an additional Service Name PARALLEL. It provides high level of resources but lowest concurrency and is useful for Batch Processes or Heavy Reporting.

Now, many of the technical folks have raised this question on “Why do we need Both? Can’t we use ATP to take care of Data Warehouse kind of a load?”. ATP very well does that by providing an option to run Mixed Workload. However, there are many implementations that are purely Data Warehouse. The problem is that the amount of data that we generate, Warehouse Databases have grown massive in size supporting large amount of complex queries involving many nested joins and expecting sub-second response time. Further, these Databases house data from many different sources. Optimization Strategy for an OLTP and Data Warehouses are completely different. Therefore, an OLTP kind of a setup cannot (in most of the cases) provide the response time needed for DWH Queries.

Row Store v/s Columnar Store

Now, coming back to ATP v/s ADW argument. In an ADW, the data loaded is by default compressed to Oracle Hybrid Columnar Compression, whereas ATP uses a Row Store Format. What this means is that in an ADW, the data stored at disk is in a Columnar format, which is a proven format for speeding up Analytics kind of a Workload where we query few columns. All Data Warehouse Implementations, like Redshift, Google Big Query etc. store data in Columnar format. On the other hand, ATP stores the data in Row format which is ideal for OLTP that demands Single or few row(s) updates or deletes or queries small number of rows.

Since the data in ADW is stored Compressed in Columnar format, it allows scanning of Billions of Rows per CPU cycle. The data for a particular column is stored contiguously making it possible to scan through large number of rows in a single CPU cycles. This technique is called SIMD (Single Instruction Multiple Dataset) processing. To meet a sub-second response time, this technique is plays a significant role.

Such large scans (scanning billion of rows per second) is not possible with Row Store and one of the reasons for this can be found in one of my blog published in 2015 which is still very much relevant. Skipping of Columns consume CPU Cycles which is demonstrated in the blog post that can be found here.

Through this short blog, I wanted to clear the confusion around ADW and ATP. Both have their own use cases and thus are optimized to manage different workloads. Both of these Autonomous Databases have some exciting features. Stay tuned for more on these.

Resource Manager – Autonomous Data Warehouse

I recently presented a Full Day Tech Event on Autonomous databases (ADB) for the North India Chapter of All India Oracle User Group. Since most of the attendees were Database Administrators, it was important to cover technical aspects of ADB. While Oracle doesn’t publicize the Internals of ADB, I spoke about some of my experiences based on many of POC’s.

Many DBA’s still assume that 18c Database is Autonomous? Therefore it is Important to know What is Autonomous Database and What it is not? First thing to note is that 18c is not Autonomous. 18c is just like any other Oracle Database. Autonomous uses the features of Oracle Database 18c, plus 12c, plus 11g, plus 10g and so on. Autonomous Database ensures that all these features work together. So, 18c is one of the Building Blocks of 18c. Autonomous uses all the features that Oracle Developed for almost 2 decades. The other important underlying technology is Oracle Exadata, which is the fastest Oracle Database Machine with redundancy built-in at each layer. Again, Oracle had been working on many features that makes this machine a powerful database machine, in terms of Performance, Reliability and Scalability. The Journey had been long from Smart Scans to Direct-to-wire protocol to Smart Fusion Block Transfer and the enhancements continue. One important thing to note is – Customers do have an option to move to 18c on Exadata within their own Data Centre. Then does it mean, this combination is Autonomous? The answer is NO. The two components are critical, but what makes it Autonomous is the Cloud Automation with Machine Learning.

During the session, I covered the Performance features and the way Oracle controls resource utilization. The topic that came up was Oracle Database Resource Managers. Database Administrators were more inquisitive to know more about this and how it is been implemented in ADB.

Resource Manager-

Resource Managers or Consumer Groups or Service names are the integral part of Autonomous Databases as they manage the workload based on the performance characteristics of the Application. For example, in an Autonomous Data Warehouse there are 3 pre-created Service Names. These are –

HIGH – Highest Resources and Lowest Concurrency. Maximum Queries that can run are 3.

MEDIUM – Less Resources and High Concurrency. Maximum Queries would depend upon the number of OCPU’s.

LOW – Least resources and Highest Concurrency. Maximum Queries – twice the number of OCPU’s and therefore, would depend upon the number of OCPU’s.

Autonomous Transaction Processing (ATP) introduced a new Service – PARALLEL. In the context of ATP, the definition for HIGH, MEDIUM and LOW is different that in ADW. I will cover these in other blog post.

Now, the game becomes more interesting with HIGH and MEDIUM Consumer Groups. While the maximum number of Queries that can run from HIGH is 3 but with concurrency i.e. with concurrent Queries running from MEDIUM and LOW, this number can be less than 3. Same is the case with MEDIUM. With MEDIUM, the maximum numbers would depend on the number of OCPU’s, but actual concurrent queries that can run below the maximum allowed would depend upon the queries running from other consumer groups. For LOW, it is twice the number of OCPU’s and is irrespective of number of queries running from other consumer groups. All these are managed efficiently by pre-configured Resource Managers. Resource Manager, as the name implies, manages the resources. When the load on the system is light, there is very little need for managing the resources. However, as the load increases and system becomes more busy, managing the resources become more important. One of the Primary Goal of Autonomous Database is to ensure consistent performance (always). Therefore, these restrictions. However, as mentioned earlier, other than HIGH, the maximum for other consumer groups depends upon the number of OCPU’s. So, if you have a highly concurrent environment or during peak load with high concurrency, you can scale up more OCPU’s and scale down as well. By the way, Scaling OCPU’s UP and DOWN is Online and is irrespective of the Storage. Let me digress for a moment here. In case of Autonomous Database, if you need additional Storage, you can add it Online and this increase does not mandate adding OCPU’s. Same with the OCPU’s, you don’t need to add Storage. Both are independent. This is not the case with the competition. For example, with AWS Redshift or Microsoft SQL Data Warehouse, since these technologies work on Shared Nothing Architecture, additional Storage means adding more OCPU’s as well. What this means is – Even if you have good amount of CPU resources available and need is to add only the Storage, you will have to add a compute node as well. A storage without a compute is like a DEAD duck. It does nothing on disk unless connected to a Compute Mode.

Coming back to the Resource Manager. As mentioned, resource manager is automatically created and the Application owner has to only make sure that the application queries or the reports are connected to one of the consumer groups based on workload patterns or application requirements.

Why Resource Manager (RM)?

As mentioned earlier, the primary goal of a RM is to Manage System Resources. If resources were unlimited, there would be no need to manage them. Without these, all database connections are considered equal and under heavy load all sessions are impacted equally. There are various components of RM, like Consumer Groups – RM allocate resources to Consumer Groups and not to Sessions, Plan Directives – Resource allocations are actually assigned to Plan Directives and a Consumer group is then assigned a Directive, then finally Resource Plan – a collection of Directives that determines how the resources are to be allocated and managed. Resource Plan is set at the Database or System Level using RESOURCE_MANAGER_PLAN.

In terms of mapping rules, all user sessions (except SYS and SYSTEM) are mapped to a Consumer Group. By default, these are mapped to OTHER_GROUPS, which is a mandatory consumer group. There are 2 types of Session Attributes – Login Attributes and Runtime Attributes. Runtime Attributes is assigned at Runtime and can ne changed anytime during the life cycle of the session. This is basically decided by the application. Login Attributes is tied to the connection and cannot be changed during the lifecycle.

Resource Manager, in ADB uses Login Attribute. Therefore, the application connection is to a particular Service Name has to be pre-decided based on the application requirement.

The maximum number of Queries that can be executed using MEDIUM and LOW Service Names scales linearly. As per Oracle Documentation, for a 16 OCPU Machine, the maximum with MEDIUM is 20 and LOW is 32 (twice the number of OCPU’s). As you scale-up, say to 32 OCPU’s, the maximum for these will Scale as well to 40 for MEDIUM and 64 for LOW.

At present, will restrict this blog post to Resource Manager. Will write on something interesting soon.

Autonomous Database Tech Day ! Gurgaon

 

#Autonomous #AIOUG Presenting a Full Day Event on 8th September 2018 for North India Oracle User Group in Gurgaon. This is on Oracle Autonomous Database. Would be covering some interesting technical capabilities of Autonomous Databases. I am covering the 2 offerings i.e. Autonomous Data Warehouse and Autonomous Transaction Processing. For Registration, click on the following link :

Meraevents Link

This being an Oracle User Group Session, focus would be on the Technical Capabilities of ADW / ATP, like Parallel Processing, Concurrency, Optimizer Enhancements and Behaviour and most importantly, Competition.

So, North India Folks : See you all on 8th September 2018.

%d bloggers like this: