AIOUG ! Oracle Community Yatra 2022

My session on “Database Performance Analytics using Python”

For the last two years, we all connected over the virtual world, and it was a great experience. However, when it comes to the speaker and the participants connecting with each other, the virtual world can’t replace physical, in-person sessions, which give us an opportunity to collaborate and expand our social circle. The good news is that the All India Oracle User Group is back with physical, in-person events: Oracle Community Yatra 2022, scheduled across 6 cities in India, starting 23rd July 2022. I am speaking as well, and to know about my session, please continue reading…

For registration, click here

DBAs – don’t worry about your future. With databases on the Cloud, and especially with Autonomous Database gaining importance, many DBAs fear that their value will diminish as most of their tasks get automated. This is true only if you are still stuck performing regular, mundane jobs. Why not use your existing technical skillset and move to the next level? You have been responsible for maintaining the health of your application and database. Let’s reuse the same skills and knowledge, add some spice, and elevate from Performance Analyst to the Performance Analytics field. In this session, I will walk you through some important concepts of database performance and then take you into the field of analytics. I will also talk about some real-life scenarios and how analytics helps us get a global view of overall database performance. It will be a collaboration between my performance skills and Python. See you soon.

External Partition ! Oracle Autonomous Database

Autonomous Database was released around March 2018. The first offering was a data warehouse implementation supporting business intelligence and analytical workloads: Autonomous Data Warehouse. Next, the transaction workload was added to the family with Autonomous Transaction Processing, followed by Autonomous JSON and Autonomous APEX.

One of the coolest features of Autonomous Data Warehouse (ADW) is easy integration with a Data Lake: while we can store our data natively within the Autonomous Database for faster processing, this feature also allows us to store and access data from external storage. On the Cloud, external storage usually means an Object Storage, and the integration works by creating an External Table. The table metadata is stored in the data dictionary while the actual data is stored externally, which saves on storage cost. Refer to the Oracle documentation for more details on this feature.

An External Table can be non-partitioned, partitioned, or hybrid partitioned. Just as Oracle Partitioning improves query performance by pruning the partitions that are not needed for a query, the same improvement applies to External Tables: with a partitioned External Table you get the benefit of partition pruning, scanning only the required external data. Remember, queries on external data will not be as fast as queries on database tables. However, if I have an archived table that is infrequently accessed and not performance sensitive, I can always move it to external storage, and to improve query performance on external storage, external partitioning is highly recommended. As mentioned earlier, the other benefit of this feature is cost. Hybrid partitioning is a mix of both, i.e. I can keep a few historical partitions on external storage and the latest, performance-sensitive data within the database.

In this blog, I will talk about External Partition Tables, and subsequent blogs would take this topic further.

Assume you have a large table with around 5 years of data. Users frequently query the latest data. The table is currently non-partitioned. As the data grows, I want to ensure consistent performance. How do I do that?

There are multiple strategies, like :

  • Archive the Data to a New Table and Purge it from the Original Table. Reports requiring latest data would run on the Original Table and Reports on the Historical Data would access the Archived Table.
  • Partition the Table and leverage the benefits of Partition Pruning. This is the most common strategy.

My topic for today is Strategy #1, i.e. archiving to a new table. This solution also applies when a table holds only historical data. Many customers implement Strategy #1 by moving the table to another database, called an Archival Database. With Autonomous Database, we don’t need to do that, and we will see this in action. My subsequent blogs will cover how both strategies can be optimized further on Autonomous Databases.

For this demonstration, I will use the LINEORDER table under the SSB (Star Schema Benchmark) schema. This schema is created by default when you provision an Autonomous Database. It’s a huge table, so I will create another table as a 2% sample and use it as my source table. The very first step is to identify the partitioning strategy. In this case, a monthly partition on the LO_ORDERDATE column is the first choice. However, the other requirement is to sub-partition on the LO_TAX column, based on tax codes. So, what we need here is an External Table with monthly partitions and sub-partitions on LO_TAX.

OWNER		 NAME				  NUM_ROWS     BLOCKS
-------------------- ------------------------------ ---------- ----------
ADMIN		 LINEORDER			 119993317     644744

The next step is to move the data to external storage (an Object Storage), keeping our partitioning strategy in mind. Before I move on, it is important to know that Autonomous Database supports many different file formats and structures. On Autonomous Database you can create partitioned external tables from Hive-style partitioned data or from simple folder-partitioned data stored on your Cloud Object Store. For example, a Hive-style partitioned file path looks like:

table/partition1=partition1_value/partition2=partition2_value/data_file.csv

The advantage of this approach is that the partition information is available in the data file path itself, so the partitioned columns need not be part of the actual data; partition pruning is done directly on the path name. So, in our case the path name would be:

LINEORDER/MONTH_YEAR=JUN1993/TAX=2/filename.gz

In this case, the files will be compressed so that they occupy less space. MONTH_YEAR is a derived column, but TAX is the LO_TAX column of the LINEORDER table. Since these columns and their values are part of the file path, my CSV files need not contain them. With simple folder-partitioned data, the partitioned columns and their values have to be in the data files. So, since I am using Hive-style partitioned data, I will dynamically create folders with the column names and column values.
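The path construction described above is simple enough to sketch in a few lines of Python. This is just an illustration of the Hive-style naming convention, not a DBMS_CLOUD API; the bucket URL mirrors the one used later in this post.

```python
# Build a Hive-style object name for one (month, tax) sub-partition.
# The partition column names and values live in the path itself, so the
# CSV payload does not need to carry these columns.
BASE = ("https://objectstorage.us-ashburn-1.oraclecloud.com"
        "/n/my_tenancy/b/External_Partitions/o/LINEORDER")

def hive_path(month_year: str, tax: int, filename: str) -> str:
    return f"{BASE}/MONTH_YEAR={month_year}/TAX={tax}/{filename}"

print(hive_path("JUN1993", 2, "JUN1993_2.csv.gz"))
```

Any file landing under a `MONTH_YEAR=…/TAX=…` prefix built this way is automatically attributed to that partition and sub-partition.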

In order to access my Object Storage, the first mandatory requirement is to create a credential. Alternatively, you can use OCI Resource Principals. I am assuming you have all the required credentials; if not, please follow the documentation.

Next, I will use the dbms_cloud.export_data API to export the data into compressed CSV files on my Object Storage. To speed up the process, I created a CONTROL_TABLE with distinct MONYYYY values from LINEORDER and then scheduled a PL/SQL block from three different sessions. This PL/SQL block exports the data for a specified month and tax code. The data inserted into the control table has the condition to_char(lo_orderdate,'MONYYYY')!='AUG1998'. I will explain this later.

create table control_table (
    partition_name varchar2(100),
    partition_number  number,
    status char(1)
);

insert into control_table(partition_name, status)
SELECT DISTINCT TO_CHAR(LO_ORDERDATE,'MONYYYY') MONTH_YEAR, 'N' FROM LINEORDER WHERE to_char(lo_orderdate,'MONYYYY')!='AUG1998';
COMMIT;
UPDATE control_table set partition_number=rownum;
commit;
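Running the export from three sessions simply means splitting the PARTITION_NUMBER values into contiguous ranges, one range per session (the PL/SQL block below uses 1–25 for one of them). A small Python sketch of that split; the worker count and partition count here are illustrative, not taken from the post:

```python
# Split N partition numbers into contiguous chunks, one per worker session.
# Each (start, end) pair becomes the PARTITION_NUMBER range used by one
# copy of the export PL/SQL block.
def chunk_ranges(n_partitions: int, n_workers: int):
    size, extra = divmod(n_partitions, n_workers)
    ranges, start = [], 1
    for w in range(n_workers):
        end = start + size - 1 + (1 if w < extra else 0)  # spread remainder
        ranges.append((start, end))
        start = end + 1
    return ranges

print(chunk_ranges(75, 3))  # -> [(1, 25), (26, 50), (51, 75)]
```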


DECLARE
  FILE_URI_O	VARCHAR2(1000);
  FILE_URI_F	VARCHAR2(128);
  FILE_URI	VARCHAR2(1000);
  L_STATEMENT	VARCHAR2(1000);
BEGIN
   -- Outer loop: the range of monthly partitions assigned to this session
   FOR OUTER_LOOP IN (SELECT PARTITION_NAME MONTH_YEAR, PARTITION_NUMBER FROM CONTROL_TABLE WHERE PARTITION_NUMBER>=1 AND PARTITION_NUMBER<=25 AND STATUS='N' ORDER BY PARTITION_NUMBER)
   LOOP
      -- Inner loop: one export per distinct tax code (sub-partition) in that month
      FOR INNER_LOOP IN (SELECT DISTINCT LO_TAX TAX FROM LINEORDER WHERE TO_CHAR(LO_ORDERDATE,'MONYYYY')=OUTER_LOOP.MONTH_YEAR)
      LOOP
            L_STATEMENT:='select lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate, lo_orderpriority, lo_shippriority, lo_quantity, lo_extendedprice, lo_ordtotalprice, lo_discount, lo_revenue, lo_supplycost, lo_commitdate, lo_shipmode from lineorder where to_char(lo_orderdate,''MONYYYY'')='''||OUTER_LOOP.MONTH_YEAR||''' and lo_tax='||INNER_LOOP.TAX;
            -- Hive-style folder: partition column names and values embedded in the path
            FILE_URI_F:='MONTH_YEAR='||OUTER_LOOP.MONTH_YEAR||'/TAX='||INNER_LOOP.TAX||'/';
    	    FILE_URI_O:='https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/'||FILE_URI_F;
    	    FILE_URI:=FILE_URI_O||OUTER_LOOP.MONTH_YEAR||'_'||INNER_LOOP.TAX;
    	    DBMS_CLOUD.EXPORT_DATA(CREDENTIAL_NAME => 'MY_CREDENTIAL', FILE_URI_LIST => FILE_URI, QUERY=>L_STATEMENT, FORMAT=> JSON_OBJECT('TYPE' VALUE 'CSV', 'quote' value '"', 'COMPRESSION' VALUE 'GZIP', 'MAXFILESIZE' VALUE '20485760'));
      END LOOP;
      -- Mark the month as processed so a restart skips it
      UPDATE CONTROL_TABLE SET STATUS='Y' WHERE PARTITION_NUMBER=OUTER_LOOP.PARTITION_NUMBER;
      COMMIT;
   END LOOP;
END;
/

The L_STATEMENT variable holds the query used to export the data, with the required columns in the SELECT clause along with the relevant WHERE predicate. Each run of the statement creates a CSV file for the required partition and sub-partition. Further, I dynamically create the folders: if a folder doesn’t exist, the API creates it for us; otherwise it creates the relevant file within its folder. The folder name carries the partition column name and its value. For example, I will have folder names as:

HIVE Format Folders and Files

Now I have all the data on my Object Storage, in Hive format: compressed CSV files for each month and tax code. I will now create an external partitioned table using the dbms_cloud.create_external_part_table API.

BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE (
        table_name => 'LINEORDER_EXT_PART',
        credential_name => 'MY_CREDENTIAL',
        file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/*.gz',
        column_list =>
            'LO_ORDERKEY NUMBER,
             LO_LINENUMBER number,
             LO_CUSTKEY NUMBER,
             LO_PARTKEY NUMBER,
             LO_SUPPKEY NUMBER,
             LO_ORDERDATE DATE,
             LO_ORDERPRIORITY CHAR(15),
             LO_SHIPPRIORITY CHAR(1),
             LO_QUANTITY NUMBER,
             LO_EXTENDEDPRICE NUMBER,
             LO_ORDTOTALPRICE NUMBER,
             LO_DISCOUNT NUMBER,
             LO_REVENUE NUMBER,
             LO_SUPPLYCOST NUMBER,
             LO_COMMITDATE NUMBER,
             LO_SHIPMODE CHAR(10),
             MONTH_YEAR VARCHAR2(12),
             TAX NUMBER',
        format => '{"type":"csv", "skipheaders":0, "dateformat":"DD-MON-YYYY HH24:MI:SS", "compression":"gzip", "partition_columns":["MONTH_YEAR","TAX"]}');
END;
/

This creates our external partitioned table. The data resides on our Object Storage and the metadata is stored in the data dictionary. Now you can run your queries against this table, and the data will be retrieved from the Object Storage.

Time to run some queries and check the run-time plan, which will help us validate whether the queries are able to leverage the benefits of partition pruning. We will also join this external partitioned table to another table whose data is stored natively, and from the run-time plan we will see that Adaptive Optimization kicks in as well.
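Before looking at the plans, the pruning we hope to see can be pictured as a filter on the Hive-style paths themselves: only files whose MONTH_YEAR=/TAX= path components match the predicates ever need to be read. A toy Python illustration (the file names are made up):

```python
# Toy illustration of Hive-style partition pruning: keep only the object
# paths whose MONTH_YEAR=/TAX= components match the query predicates.
def prune(paths, month_year=None, tax=None):
    keep = []
    for p in paths:
        # parse "COL=value" path segments into a dict
        parts = dict(seg.split("=", 1) for seg in p.split("/") if "=" in seg)
        if month_year is not None and parts.get("MONTH_YEAR") != month_year:
            continue
        if tax is not None and parts.get("TAX") != str(tax):
            continue
        keep.append(p)
    return keep

files = [
    "LINEORDER/MONTH_YEAR=AUG1993/TAX=2/a.gz",
    "LINEORDER/MONTH_YEAR=AUG1993/TAX=3/b.gz",
    "LINEORDER/MONTH_YEAR=DEC1997/TAX=2/c.gz",
]
print(prune(files, month_year="AUG1993", tax=2))  # only a.gz survives
```

This is why the plans below show PARTITION LIST SINGLE (both predicates given) or PARTITION LIST ITERATOR (only MONTH_YEAR given) instead of scanning every file.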

# Query 1
select lo_custkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1993' and tax=2 and lo_orderkey=3517992324;

LO_CUSTKEY LO_ORDERDATE 	LO_SHIPMOD LO_QUANTITY
---------- -------------------- ---------- -----------
  29663183 10-AUG-1993 00:00:00 MAIL		     9

Elapsed: 00:00:01.75

SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	dmak3yzpw71a2, child number 0
-------------------------------------
select lo_custkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1993' and tax=2 and
lo_orderkey=3517992324

Plan hash value: 1982497926

--------------------------------------------------------------------------------
| Id  | Operation		    | Name		 | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |			 |	  |	3 (100)|
|   1 |  PARTITION LIST SINGLE	    |			 |	1 |	3   (0)|
|*  2 |   EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART |	1 |	3   (0)|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("MONTH_YEAR"='AUG1993' AND "TAX"=2 AND
	      "LO_ORDERKEY"=3517992324))
	 
     
# Query 2
select lo_custkey, c_name, lo_orderdate, lo_shipmode, lo_quantity 
from 	lineorder_ext_part,
	vivek.customer 
where 	month_year='DEC1997' 
and 	tax=0
and 	lo_orderkey=3187415872
and	c_custkey = lo_custkey;

LO_CUSTKEY C_NAME		     LO_ORDERDATE	  LO_SHIPMOD LO_QUANTITY
---------- ------------------------- -------------------- ---------- -----------
  11640322 Customer#011640322	     04-DEC-1997 00:00:00 FOB		       6

Elapsed: 00:00:02.27
SQL> @apc
SQL> set echo on
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	6pn53c37c2dfb, child number 0
-------------------------------------
select lo_custkey, c_name, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part,  vivek.customer where  month_year='DEC1997' and
tax=0 and  lo_orderkey=3187415872 and c_custkey = lo_custkey

Plan hash value: 1739419959

--------------------------------------------------------------------------------------
|   Id	| Operation			  | Name	       | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT		  |		       |	|     5 (100)|
|- *  1 |  HASH JOIN			  |		       |      1 |     5   (0)|
|     2 |   NESTED LOOPS		  |		       |      1 |     5   (0)|
|     3 |    NESTED LOOPS		  |		       |      1 |     5   (0)|
|-    4 |     STATISTICS COLLECTOR	  |		       |	|	     |
|     5 |      PARTITION LIST SINGLE	  |		       |      1 |     3   (0)|
|  *  6 |	EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART |      1 |     3   (0)|
|  *  7 |     INDEX UNIQUE SCAN 	  | CUSTOMER_PK        |      1 |     1   (0)|
|     8 |    TABLE ACCESS BY INDEX ROWID  | CUSTOMER	       |      1 |     2   (0)|
|-    9 |   TABLE ACCESS FULL		  | CUSTOMER	       |      1 |     2   (0)|
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C_CUSTKEY"="LO_CUSTKEY")
   6 - filter(("MONTH_YEAR"='DEC1997' AND "TAX"=0 AND
	      "LO_ORDERKEY"=3187415872))
   7 - access("C_CUSTKEY"="LO_CUSTKEY")

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
   - this is an adaptive plan (rows marked '-' are inactive)
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level

A query against all the sub-partitions or partitions works as well.

SQL> select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1993' and lo_custkey=29663183;

LO_ORDERKEY LO_ORDERDATE	 LO_SHIPMOD LO_QUANTITY
----------- -------------------- ---------- -----------
 3517992324 10-AUG-1993 00:00:00 MAIL		      9
 2251158885 21-AUG-1993 00:00:00 REG AIR	     29

Elapsed: 00:00:12.28
SQL> @apc
SQL> set echo on
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	5m709tc22ua7u, child number 0
-------------------------------------
select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1993' and lo_custkey=29663183

Plan hash value: 146054153

--------------------------------------------------------------------------------
| Id  | Operation		    | Name		 | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |			 |	  |    14 (100)|
|   1 |  PARTITION LIST ITERATOR    |			 |	7 |    14  (15)|
|*  2 |   EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART |	7 |    14  (15)|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("MONTH_YEAR"='AUG1993' AND "LO_CUSTKEY"=29663183))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level


27 rows selected.

From the queries we executed against this external partitioned table, we can clearly see that it doesn’t matter to the application whether the data is fetched from database storage or from external storage; it is completely transparent. One important point to note here is that these External Tables are READ ONLY and the data is not managed by the database. Now, let’s add some more data to this external partitioned table. While generating the CSV files earlier, I excluded the data for August 1998. I will now generate the CSV files for AUG1998 and we will see whether I am able to query the data.

DECLARE
  FILE_URI_O	VARCHAR2(1000);
  FILE_URI_F	VARCHAR2(128);
  FILE_URI	VARCHAR2(1000);
  L_STATEMENT	VARCHAR2(1000);
BEGIN
      FOR INNER_LOOP IN (SELECT DISTINCT LO_TAX TAX FROM LINEORDER WHERE TO_CHAR(LO_ORDERDATE,'MONYYYY')='AUG1998')
      LOOP
            L_STATEMENT:='select  lo_orderkey, lo_linenumber, lo_custkey, lo_partkey, lo_suppkey, lo_orderdate, lo_orderpriority, lo_shippriority, lo_quantity, lo_extendedprice, lo_ordtotalprice, lo_discount, lo_revenue, lo_supplycost, lo_commitdate, lo_shipmode from lineorder where to_char(lo_orderdate,''MONYYYY'')='||''''||'AUG1998'||''''||' and lo_tax='||INNER_LOOP.TAX||'';
            FILE_URI_F:='MONTH_YEAR=AUG1998/TAX='||INNER_LOOP.TAX||'/';
    	    FILE_URI_O:='https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/'||FILE_URI_F;
    	    FILE_URI:=FILE_URI_O||'AUG1998_'||INNER_LOOP.TAX;
    	    DBMS_CLOUD.EXPORT_DATA(CREDENTIAL_NAME => 'MY_CREDENTIAL', FILE_URI_LIST => FILE_URI, QUERY=>L_STATEMENT, FORMAT=> JSON_OBJECT('TYPE' VALUE 'CSV', 'quote' value '"', 'COMPRESSION' VALUE 'GZIP', 'MAXFILESIZE' VALUE '20485760'));
      END LOOP;
END;
/

Now, let me run a query against these newly added CSV files. While the files are uploaded and available in the correct Hive-format folders, you can see that Oracle could not access the data; the run-time plan says PARTITION LIST EMPTY.

SQL> select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1998' and tax=6 and lo_custkey=16287940;

no rows selected

Elapsed: 00:00:00.11
SQL> @apc
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	5s7wagh8za9bx, child number 1
-------------------------------------
select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1998' and tax=6 and
lo_custkey=16287940

Plan hash value: 3670407871

--------------------------------------------------------------------------------
| Id  | Operation		    | Name		 | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |			 |	  |	2 (100)|
|   1 |  PARTITION LIST EMPTY	    |			 |	1 |	2   (0)|
|*  2 |   EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART |	1 |	2   (0)|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("MONTH_YEAR"='AUG1998' AND "TAX"=6 AND
	      "LO_CUSTKEY"=16287940))

SQL> select count(*) from lineorder_ext_part where month_year='AUG1998';

  COUNT(*)
----------
	 0

As I said earlier, External Tables are READ ONLY and the data is not managed by the Oracle Database. Therefore, we need to refresh the External Table using the SYNC_EXTERNAL_PART_TABLE procedure of dbms_cloud. Once done, the newly added CSV files are picked up as a new partition and the data can be queried.

BEGIN
   DBMS_CLOUD.SYNC_EXTERNAL_PART_TABLE(table_name => 'LINEORDER_EXT_PART');
END;
/
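Conceptually, the sync re-lists the bucket and registers any partition key combinations the dictionary hasn’t seen yet. A hypothetical sketch of that diff in Python (this is not what DBMS_CLOUD actually runs, just the idea):

```python
# Conceptual sketch of a partition sync: derive (MONTH_YEAR, TAX) keys
# from the object paths and report any combination the table does not
# already know about.
def new_partitions(object_paths, known):
    found = set()
    for p in object_paths:
        parts = dict(seg.split("=", 1) for seg in p.split("/") if "=" in seg)
        if "MONTH_YEAR" in parts and "TAX" in parts:
            found.add((parts["MONTH_YEAR"], parts["TAX"]))
    return sorted(found - set(known))

paths = [
    "LINEORDER/MONTH_YEAR=AUG1993/TAX=2/a.gz",
    "LINEORDER/MONTH_YEAR=AUG1998/TAX=6/b.gz",
]
print(new_partitions(paths, known={("AUG1993", "2")}))
# -> [('AUG1998', '6')]
```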

SQL> select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from lineorder_ext_part where month_year='AUG1998' and tax=6 and lo_custkey=16287940;

LO_ORDERKEY LO_ORDERDATE	 LO_SHIPMOD LO_QUANTITY
----------- -------------------- ---------- -----------
 3004306725 02-AUG-1998 00:00:00 REG AIR	     14

Elapsed: 00:00:42.34
SQL> @apc
SQL> set echo on
SQL> set lines 200
SQL> set pages 1000
SQL> select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST -bytes +predicate +note +cost +adaptive'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	5s7wagh8za9bx, child number 1
-------------------------------------
select lo_orderkey, lo_orderdate, lo_shipmode, lo_quantity from
lineorder_ext_part where month_year='AUG1998' and tax=6 and
lo_custkey=16287940

Plan hash value: 917046205

----------------------------------------------------------------------------------
| Id  | Operation		      | Name		   | E-Rows | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	      | 		   |	    |	  2 (100)|
|   1 |  PX COORDINATOR 	      | 		   |	    |		 |
|   2 |   PX SEND QC (RANDOM)	      | :TQ10000	   |	  1 |	  2   (0)|
|   3 |    PX BLOCK ITERATOR	      | 		   |	  1 |	  2   (0)|
|*  4 |     EXTERNAL TABLE ACCESS FULL| LINEORDER_EXT_PART |	  1 |	  2   (0)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("MONTH_YEAR"='AUG1998' AND "TAX"=6 AND
	      "LO_CUSTKEY"=16287940))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 16 because of degree limit
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level


31 rows selected.

Elapsed: 00:00:00.08
SQL> select count(*) from lineorder_ext_part where month_year='AUG1998';

  COUNT(*)
----------
     99688

Elapsed: 00:00:01.44
SQL> 

Wow! A cool feature and easy to use, isn’t it? Lastly, with the latest enhancement to this feature, two new hidden columns are added to the External Table: file$path and file$name. These columns can be queried to check the source file path and name.

SQL> select lo_orderkey, file$path, file$name from lineorder_ext_part where month_year='AUG1993' and tax=2 and lo_custkey=29663183;

LO_ORDERKEY FILE$PATH															     FILE$NAME
----------- -------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------
 3517992324 https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_tenancy/b/External_Partitions/o/LINEORDER/MONTH_YEAR=AUG1993/TAX=2	     AUG1993_2_10_20220704T111509Z.csv.gz

This is very easy to implement on Autonomous Database and is a cost-saving option. Performance-insensitive and historical data can be easily moved to external storage, and accessing it is as simple as accessing natively stored data. Do try it out and let me know how it works.

Database Vault ! Autonomous Database

Availability of Database Vault on Autonomous Database (Shared Infrastructure) was announced in February 2020. Database Vault needs no introduction, as the feature has been available since Oracle 10g. In brief, it provides powerful security controls to help protect sensitive application data from unauthorised access and comply with privacy and regulatory requirements. This means even a superuser (or a user with the DBA privilege) cannot access the sensitive application data. Oracle Database Vault implements this security at the database layer.

Database Vault adds security to the Autonomous Database family in just a few steps. This blog demonstrates the steps needed to implement Database Vault on Autonomous Data Warehouse. Before I walk through them, it is important to know the concept of REALMS. A realm is a grouping of database schemas, database objects, and roles that must be secured for a given application – a kind of zone of protection for your database objects. So, one of the required steps will be to create a realm.

Every Autonomous Database has an ADMIN user, which is a superuser that (by default) can query any table within the database. All the steps required for implementing Database Vault are to be executed as the ADMIN user. Once done, ADMIN continues to be a superuser but can no longer query the application data. In my case, assume VIVEK is an application schema containing all the tables accessed by my application, one of which is CREDIT_CARD.


- Connect to ADMIN User and Query the CREDIT_CARD Table. ADMIN is able to query the data.

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Check whether Data Vault is Configured and Enabled. It is not configured.

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS FALSE
DV_ENABLE_STATUS    FALSE

- Create Database Vault Owner and Database Vault Account Manager User.

CREATE USER ADB_DBV_OWNER IDENTIFIED BY "xxxxxxxxxxxxx";
CREATE USER ADB_DBV_ACCTMGR IDENTIFIED BY "xxxxxxxxxxxxx";
GRANT CREATE SESSION, SELECT ANY TABLE TO ADB_DBV_OWNER;

- Check whether we can select the data from the CREDIT_CARD table before implementing Database Vault.
- The Database Vault Owner user has the SELECT ANY TABLE privilege.

CONNECT ADB_DBV_OWNER/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00


- Configure and Enable Database Vault

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low


EXEC DBMS_CLOUD_MACADM.CONFIGURE_DATABASE_VAULT('ADB_DBV_OWNER', 'ADB_DBV_ACCTMGR');

- Database Vault Configured, but not Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    FALSE

- Enable Database Vault. For this to come into effect, we need to restart the database

EXEC DBMS_CLOUD_MACADM.ENABLE_DATABASE_VAULT;

- Restart the Database. Check after Database Restart
- Database Vault Configured and Enabled

SELECT * FROM DBA_DV_STATUS;

NAME		    STATUS
------------------- ----------------------------------------------------------------
DV_CONFIGURE_STATUS TRUE
DV_ENABLE_STATUS    TRUE

- Admin can still Query the Data

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

- Create REALM, Add Object to the Realm 

BEGIN
 DBMS_MACADM.CREATE_REALM(
  realm_name    => 'Vivek Application Schema', 
  description   => 'Realm created for Vivek Schema', 
  enabled       => DBMS_MACUTL.G_YES, 
  audit_options => DBMS_MACUTL.G_REALM_AUDIT_FAIL + DBMS_MACUTL.G_REALM_AUDIT_SUCCESS,
  realm_type    => 1,
  realm_scope   => DBMS_MACUTL.G_SCOPE_LOCAL);
END; 
/

BEGIN
 DBMS_MACADM.ADD_OBJECT_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  object_owner => 'VIVEK', 
  object_name  => '%', 
  object_type  => '%'); 
END;
/

- Add Authorised User, which is the User that can Query the Application Table
- In this case, VIVEK is the application owner

BEGIN
 DBMS_MACADM.ADD_AUTH_TO_REALM(
  realm_name   => 'Vivek Application Schema', 
  grantee      => 'VIVEK', 
  auth_options => DBMS_MACUTL.G_REALM_AUTH_OWNER);
END;
/

- Check whether ADMIN can Query the sensitive Data from Vivek Schema

CONNECT ADMIN/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;
                    *
ERROR at line 1:
ORA-01031: insufficient privileges

- Vivek, being the application owner, can query the data. No other users are allowed to query.

CONNECT VIVEK/xxxxxxxxxxxxx@ADWxxxxxx_low

SELECT * FROM VIVEK.CREDIT_CARD;

CC_NO				 CC_NAME			  EXPIRY_DT
-------------------------------- -------------------------------- --------------------
1234 5678 8765 4321		 ADB_1				  29-FEB-2024 00:00:00
5678 8765 1234 4321		 ADB_2				  31-DEC-2021 00:00:00

As can be seen, the steps are pretty simple. This is a simplified version of a Database Vault configuration; refer to the Oracle documentation for the various configuration options available with Database Vault. Hope this helps.

DBMS_CLOUD : Loading a TEXT file without any DELIMITER

Recently I was working on an issue related to loading data from a flat file into Autonomous Data Warehouse using the dbms_cloud API. While a lot has been written on this subject, I thought of writing about it because the challenge here was that the file had no delimiter. The original data to be loaded was around 90 million rows, but for testing purposes the customer was loading around 200k rows.

What is a delimiter? A delimiter is one or more characters that separate two strings. These characters are used to identify the different columns.
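As a quick illustration (not from the customer's file), here is a comma-delimited version of one of the sample records used later in this post, split back into its columns with Python:

```python
# Illustrative only: the same record as in the sample data below, but
# with a comma delimiter marking each column boundary.
record = "IND,15111989,15921,51,SACHIN TENDULKAR"
fields = record.split(",")
print(fields)
```

Without such a separator character, the only way to identify a column break is by position, which is exactly the situation described next.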

One of the best blogs on this topic is this one from Nilay Panchal.

Coming back to the topic of this blog. In this case, the customer was trying to load 200k rows into a table, and the way they were doing it, the load took around 25 minutes, which is far too long considering that the actual production volume would be 90 million rows.

As mentioned, the challenge here was that the content of this flat file had no delimiter or any character to identify a column break. The original flat file had around 500 columns, but for testing the file was truncated to 5 columns. The idea here is to demonstrate an efficient way of loading data into a table when the data in the flat file has no delimiter. The sample data is as under:

CCODEDEBUTRUNSHUNDREDSPLAYER_NAME
IND151119891592151SACHIN TENDULKAR
AUS081219951337841   RICKY PONTING
 SA141219951328945  JACQUES KALLIS
 SL200720001240038 KUMAR SANGAKARA
 WI061219901195334      BRIAN LARA
IND20062011 720227     VIRAT KOHLI

Our very first step is to create cloud object storage credentials using DBMS_CLOUD.CREATE_CREDENTIAL. If you have already created a credential, you can skip this step. Check DBA_CREDENTIALS for the list of credentials.

begin
  dbms_cloud.create_credential(
    credential_name=>'VIVEK_CREDENTIAL',
    username=>'VIVEKS',
    password=>'paste_auth_token_here'
  );
end;
/

SQL> column owner for a20
SQL> column credential_name for a30
SQL> column username for a20
SQL> select owner, credential_name, username from dba_credentials;

OWNER		     CREDENTIAL_NAME		    USERNAME
-------------------- ------------------------------ --------------------
ADMIN		     VIVEK_CREDENTIAL		    VIVEKS

I have uploaded my data.txt file, which contains the data to be loaded, to Object Storage. The next step is to create the table into which the data will be loaded. For COPY_DATA, you need to ensure the target table already exists, otherwise the procedure will fail.

CREATE TABLE CRICKETER_DATA
   (CCODE VARCHAR2(3) ,
   DEBUT DATE,
   RUNS  NUMBER(5),
   HUNDREDS NUMBER(2),
   PLAYER_NAME VARCHAR2(16));

Now, I will invoke the COPY_DATA procedure with the required inputs. It is important to note the format and field_list options. In my case, the date was in DDMMYYYY format, so I had to specify dateformat explicitly to allow COPY_DATA to parse the dates accurately. Without the dateformat value, my COPY_DATA call failed with an error. The other parameter is field_list, which ensures that, in the absence of a delimiter, COPY_DATA parses the column values based on start and end positions. You need to ensure that these positions are accurate.
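Before running COPY_DATA, the position ranges can be sanity-checked with a quick Python sketch against one sample line. POSITION(m:n) is 1-based and inclusive, so it maps to the Python slice line[m-1:n]:

```python
# Sanity-check the fixed-width POSITION ranges against one sample line.
line = "IND151119891592151SACHIN TENDULKAR"

positions = {                      # same ranges as in the field_list
    "CCODE": (1, 3),
    "DEBUT": (4, 11),
    "RUNS": (12, 16),
    "HUNDREDS": (17, 18),
    "PLAYER_NAME": (19, 34),
}

# POSITION(m:n) is 1-based inclusive -> Python slice [m-1:n]
fields = {col: line[start - 1:end].strip()
          for col, (start, end) in positions.items()}
print(fields)
```

If any slice comes back shifted (for example a digit bleeding into CCODE), the offsets in field_list need correcting before the load.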

begin
dbms_cloud.copy_data(
    table_name =>'CRICKETER_DATA',
    credential_name =>'VIVEK_CREDENTIAL',
    file_uri_list =>'&MY_OBJECT_STORAGE_URL/data.txt',
    format => json_object('skipheaders' value '1', 'dateformat' value 'ddmmyyyy'),
    field_list =>'CCODE POSITION(1:3), DEBUT POSITION(4:11), RUNS POSITION(12:16), HUNDREDS POSITION(17:18), PLAYER_NAME POSITION(19:34)');
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.50
SQL> select * from cricketer_data;

CCO DEBUT		       RUNS   HUNDREDS PLAYER_NAME
--- -------------------- ---------- ---------- ----------------
IND 15-NOV-1989 00:00:00      15921	    51 SACHIN TENDULKAR
AUS 08-DEC-1995 00:00:00      13378	    41	  RICKY PONTING
 SA 14-DEC-1995 00:00:00      13289	    45	 JACQUES KALLIS
 SL 20-JUL-2000 00:00:00      12400	    38	KUMAR SANGAKARA
 WI 06-DEC-1990 00:00:00      11953	    34	     BRIAN LARA
IND 20-JUN-2011 00:00:00       7202	    27	    VIRAT KOHLI

6 rows selected.

If for some reason the COPY_DATA procedure fails, it creates an error log table, which is reported immediately along with the error number. To drill down to the actual error, you need to query that log table. For example, when I omitted the FORMAT option from my COPY_DATA call, it failed with the following error, and I queried COPY$21_LOG to extract the actual error.


select * from COPY$21_LOG;

RECORD
------------------------------------------------------------------------------------------------------------------------------------
 LOG file opened at 01/28/20 05:15:18

Total Number of Files=1

Data File: &MY_OBJECT_STORAGE_URL/data.txt

Log File: COPY$21_331103.log

 LOG file opened at 01/28/20 05:15:18

Bad File: COPY$21_331103.bad

Field Definitions for table COPY$Q18IZ07MUUMYRU3IG6MU
  Record format DELIMITED BY
  Data in file has same endianness as the platform
  Rows with all null fields are accepted

  Fields in Data Source:

    CCODE			    CHAR (3)
      Record position (1, 3)
      Terminated by "|"
    DEBUT			    CHAR (8)
      Record position (4, 11)
      Terminated by "|"
    RUNS			    CHAR (5)
      Record position (12, 16)
      Terminated by "|"
    HUNDREDS			    CHAR (2)
      Record position (17, 18)
      Terminated by "|"
    PLAYER_NAME 		    CHAR (16)
      Record position (19, 34)
      Terminated by "|"
error processing column DEBUT in row 1 for datafile &MY_OBJECT_STORAGE_URL/data.txt

ORA-01858: a non-numeric character was found where a numeric was expected

40 rows selected.

Oracle Database ! Multi-Model Capabilities.

Everything revolves around DATA. One of the challenges for an organization is to quickly adapt to ever-changing business user needs. These needs are driven by competition, and the business users responsible for the company's growth must remain competitive at the same time.

Data arrives in different formats: structured and unstructured. It is important to understand the 3 Vs of Big Data. Data arrives in different formats (Variety) because it is generated from different sources. Data generation is not restricted to a single device, like a laptop or desktop; it is generated from many devices, 24×7, which means Volume has gone up tremendously. And since it is generated 24×7, the speed (Velocity) with which the data arrives leaves the business user less time to analyse it and derive meaningful insights. Less time, because data that arrives in different formats is either transformed into a structured format and loaded into a relational database that enforces a schema (schema on write), or is kept in an unstructured format and queried using other database models (NoSQL, NewSQL or Big Data). This means IT has to provide different means of connecting to these disparate systems, which introduces complexity. Integration poses another challenge: combining this data for fast and reasonably accurate analysis. It has therefore become critical for a CIO or IT department to come up with innovative solutions to the complex processes and problems revolving around disparate systems storing different formats of DATA.

POLYGLOT Persistence

Polyglot persistence is the concept of using different data storage technologies to handle different data storage needs within a given software application. Every application vendor, ISV or system implementer picks the database best suited to their application's requirements, but no company runs a single application. Each application caters to different use cases with different data models, and thus different data stores. Integrating these to provide a unified view is a huge challenge.

Multi-model databases give an organization the required agility: the flexibility to use different data models, addressing different use cases and storing a variety of data for business users, within a single integrated backend database. This addresses the complexity introduced by polyglot persistence.

SIGNIFICANT Recognition

One significant recognition for Oracle Database is that it scored highest in both the current offering and strategy categories in the Forrester report. This reflects its ability to support converged OLTP and analytical (translytical) workloads using Database In-Memory, Exadata and extensive multi-model capabilities.

It’s a privilege to be part of such an innovative technology and, yes, to be part of an innovative organization.

Link to the Forrester Report

Oracle Groundbreaker Yatra ! July 2019..

After Sangam 2018, now it’s time for another OTN Yatra, now renamed Oracle Groundbreaker Yatra. Sangam is hosted every year, either in Bangalore or Hyderabad; it is a one-city event. Oracle Groundbreaker Yatra, on the other hand, is a multi-city tour. I believe last year it was a 6-city tour. This year, it is scheduled to be hosted in Mumbai, Chennai, Delhi, Bengaluru, Hyderabad, Kolkata, Pune, Ahmedabad, Visakhapatnam and Thiruvananthapuram.

Registration for the Groundbreaker Yatra will open soon. Many prominent speakers will be travelling, and it should be a not-to-miss event for the Oracle Database community (DBAs, developers, architects, data scientists etc.). Sandesh Rao, Connor McDonald, Roy Swonger and Gurmeet Goindi will be speaking on some interesting topics (as always). Roy Swonger from Oracle US will be visiting India for the first time. Every time you attend sessions from these experts you learn something new, and I am sure this time, too, you will come away with bags full of knowledge.

I am also one of the speakers for the Groundbreaker Yatra and will be travelling to 4 cities apart from Mumbai (my base location). I have opted for locations where I am either travelling for the first time for a user group event or have visited only once or twice. Now, you can guess it :). I will be in Ahmedabad, Kolkata, Visakhapatnam, Thiruvananthapuram and Mumbai. I believe the agenda for some of the locations is already published, and for the others it is in the final stage.

Keep a tab on this link, Groundbreaker Yatra 2019, so that you do not miss the registration.

Now, on to my session. I am presenting a session on “Database and Application Performance ! Then and Now”. It is a 1-hour session. I will cover some of the performance challenges that DBAs and developers used to face (and still face), and how these are automatically taken care of in Autonomous Database. I will cover the Optimizer, parallelism and Oracle 19c, and walk through some topics around reinforcement learning and data science. Usually my sessions are supported by live demos; for a 1-hour session I am not sure whether I will be able to include them, but I shall try to make it interesting nonetheless. See you all in July 2019.

Python – Error Installing cx_Oracle on Mac

Just wanted to share a solution that worked for me while resolving an issue installing cx_Oracle on my MacBook. I wanted to connect Python to an Oracle Database, so I installed Oracle Instant Client 18c from here. I clicked on “Instant Client for Mac OS X (Intel x86) (32-bit and 64-bit)”. Then I downloaded the following:

- Basic Package - All files required to run OCI, OCCI and JDBC-OCI applications
- The other 2 Downloads under Tools - optional packages.

On a Mac, these downloads are unzipped by default. I copied all the files into a single folder and updated .bash_profile with ORACLE_HOME, then made sure I was able to execute sqlplus. Once done, I started the installation of cx_Oracle and received the following error:

python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... error
  Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-wheel-5rm6ofqh --python-tag cp36:
  running bdist_wheel
  running build
  running build_ext
  building 'cx_Oracle' extension
  creating build
  creating build/temp.macosx-10.7-x86_64-3.6
  creating build/temp.macosx-10.7-x86_64-3.6/src
  creating build/temp.macosx-10.7-x86_64-3.6/odpi
  creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
  xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for cx-oracle
  Running setup.py clean for cx-oracle
Failed to build cx-oracle
Installing collected packages: cx-oracle
  Running setup.py install for cx-oracle ... error
    Complete output from command /Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'cx_Oracle' extension
    creating build
    creating build/temp.macosx-10.7-x86_64-3.6
    creating build/temp.macosx-10.7-x86_64-3.6/src
    creating build/temp.macosx-10.7-x86_64-3.6/odpi
    creating build/temp.macosx-10.7-x86_64-3.6/odpi/src
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/vivek/anaconda3/include -arch x86_64 -I/Users/vivek/anaconda3/include -arch x86_64 -DCXO_BUILD_VERSION=7.1.2 -Iodpi/include -Iodpi/src -I/Users/vivek/anaconda3/include/python3.6m -c src/cxoBuffer.c -o build/temp.macosx-10.7-x86_64-3.6/src/cxoBuffer.o
    xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/Users/vivek/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-record-08000fay/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/63/_hp23hwx0kg3f469c_jqwzq00000gn/T/pip-install-kt4hhttb/cx-oracle/

The relevant error here was “xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools)”, which is related to Xcode. To be honest, I searched for solutions on Google and finally found one, which was to install the Xcode Command Line Tools:

xcode-select --install

Just run the command, let the Command Line Tools install, and the cx_Oracle installation goes through smoothly. I can now connect Python to an Oracle Database. Task accomplished 🙂

Viveks-MacBook-Air:~ vivek$ xcode-select --install
xcode-select: note: install requested for command line developer tools
Viveks-MacBook-Air:~ vivek$ python -m pip install -U cx_oracle
Collecting cx_oracle
  Using cached https://files.pythonhosted.org/packages/4b/aa/99e49d10e56ff0263a8927f4ddb7e8cdd4671019041773f61b3259416043/cx_Oracle-7.1.2.tar.gz
Building wheels for collected packages: cx-oracle
  Building wheel for cx-oracle (setup.py) ... done
  Stored in directory: /Users/vivek/Library/Caches/pip/wheels/5f/b1/97/adf672ad4ab215e227ba939604abccc086e16c91b04cccdcd7
Successfully built cx-oracle
Installing collected packages: cx-oracle
Successfully installed cx-oracle-7.1.2

One important point to remember: although pip reports the installed package as cx-oracle, when importing the module at the Python prompt or in Jupyter, the name is cx_Oracle, with a capital O (module names are case-sensitive).
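The same case-sensitivity rule applies to any Python module, so it can be illustrated without cx_Oracle installed, using a standard-library module as a stand-in:

```python
import importlib

def can_import(name):
    """Return True when a module with exactly this name can be imported."""
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False

# "json" resolves, but the wrongly-cased "Json" does not -- just as
# "import cx_oracle" fails where "import cx_Oracle" succeeds.
print(can_import("json"), can_import("Json"))
```

(On case-insensitive filesystems the exact behaviour can vary, but by default Python matches module names case-sensitively.)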

All India Oracle User Group Events for 2018

Sangam 2018 is scheduled for 7th and 8th December 2018. I am presenting a session on “SQL Tuning ! Tips, Tools and Techniques” on 7th December 2018, immediately after lunch. A 45-minute session doesn’t do justice to the topic, but I will try my level best to cover a few interesting real-life examples. On 8th December, I will be co-hosting a fireside chat on performance alongside Tirthankar Lahiri (VP of Product Management, Oracle US), Arup Nanda, GP Gongloor and Karan Dodwal. This is scheduled for 2.55 pm. The only issue is that I may have to leave early to catch my flight back to Mumbai.

North India Chapter – Chandigarh. Mark your calendar for 24th November 2018, as I will be speaking for the entire day on performance optimization, covering some interesting topics around SQL optimization, the Optimizer, indexes, Autonomous Database and many more. A full-day event gives me a good amount of time to demonstrate some interesting facts. Registration for this event is open; use the following link to register. See you soon.

https://www.meraevents.com/us/previewevent?view=preview&eventId=183767

For 2019, I will publish the calendar once it is frozen 🙂

Importance of Constraints ! Know your Reporting Tools

Recently, I was working on a customer issue. The issue was a performance comparison between a competing database and Oracle Database, and the competition was reported to be performing better than Oracle. This is a classic case of bad schema design and a badly written query. Working on it reminded me of a great lesson I learned from reading “Expert Oracle Database Architecture” by Thomas Kyte. He wrote about a classic issue with a block of procedural code that ran correctly on SQL Server but generated wrong results when ported to Oracle Database, due to the way the two databases compare NULL values. Each of these databases is different. It also reminded me of one of my responses to a customer query: “A count(*) on a table is doing a full table scan, even though the table has a unique index on it”.

As Tom mentioned in his book, every database is different and implements features differently. Third-party tools that connect to these different data sources generate queries that are syntactically valid on all of them but may not run optimally. Therefore, it is important to understand our data and design the schema with all the required constraints and indexes in place.

In this case, the customer was running a few analytical reports from a third-party reporting tool (Tableau). The customer selected a few columns with a couple of predicates: a date range and a product name. The queries were around 40-50 lines. Surprisingly, for each of the predicates, Tableau added a few additional predicates, and because of these the run-time plan was around 2700+ lines long. As an example, I have a table T1 and have to generate a report selecting object IDs and names for either of the 2 TEMPORARY values (Y or N). My query will look like:

select object_id, object_name from t1 where temporary=:b1;

If I have an index on the TEMPORARY column, the Optimizer will come out with an optimal plan based on its cost calculation. However, when run from Tableau, the tool generated the following query:

select object_id, object_name from t1
where ((temporary=:b1) or (temporary is NULL and :b1 is NULL));
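The intent of that extra OR branch can be sketched outside SQL. Under SQL's three-valued logic, `temporary = :b1` is never true when either side is NULL, so the tool appends the IS NULL branch to emulate a null-safe equality. A rough Python model, with None standing in for NULL:

```python
def sql_eq(col, bind):
    # SQL three-valued logic: NULL compared with anything is not TRUE.
    if col is None or bind is None:
        return False
    return col == bind

def generated_predicate(col, bind):
    # Mirrors: (temporary = :b1) OR (temporary IS NULL AND :b1 IS NULL)
    return sql_eq(col, bind) or (col is None and bind is None)

print(sql_eq(None, None), generated_predicate(None, None))
```

The tool cannot know whether the column allows NULLs, so it generates the defensive form for every predicate, whether the schema needs it or not.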

What is the implication of this added OR condition? Let’s see.

SQL> create table t1 as
  2  select * from all_objects;

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'T1', method_opt=>'for all columns size auto for columns temporary size 100');

PL/SQL procedure successfully completed.

SQL> create index t1_idx on t1(temporary);

Index created.

SQL> select num_rows, blocks from dba_tables where table_name='T1';

  NUM_ROWS     BLOCKS
---------- ----------
     68417       1375

SQL> variable l_temporary varchar2(1);
SQL> exec :l_temporary:='Y';

PL/SQL procedure successfully completed.

SQL> select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL)); 

161 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7xsahyu4q594y, child number 0
-------------------------------------
select object_id, object_name from t1 where ((temporary=:l_temporary)
or (temporary is NULL and :l_temporary is NULL))

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |   375 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |   161 |  6923 |   375   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("TEMPORARY"=:L_TEMPORARY OR ("TEMPORARY" IS NULL AND
              :L_TEMPORARY IS NULL)))

161 rows out of 68,000 is less than 1%; index access would have been a better option. At the customer end, the table was huge, with billions of rows, and with multiple such OR predicates the very first observation was the amount of time the Optimizer took to parse the query. The query took around 184 seconds to run, and around 175 of those seconds were spent on parsing. This was identified as a bug in 18c (BUG#28725660), the primary cause being a change in OR-expansion behaviour in 18c. The bug is fixed in 19c. Backporting the fix to 18c would have taken some time, so the other fix we applied was to add NOT NULL constraints to some of the columns. From the data, we could see that none of the columns had NULL values, and the developers confirmed that NULL values are never stored in these columns, so it was safe to add the constraints. Continuing with the example above, let’s add a NOT NULL constraint to our table T1.

SQL> alter table t1 modify temporary not null;

Table altered.

select object_id, object_name from t1
where ((temporary=:l_temporary) or (temporary is NULL and :l_temporary is NULL));

select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 1775246573

----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |       |     5 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |   161 |  6923 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |   161 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEMPORARY"=:L_TEMPORARY)

The run-time plan of the same query is now an index scan, much better than before. This additional input was enough for the Optimizer to transform the query. You can generate a 10053 trace to see the transformation.

A NOT NULL constraint combined with a unique index is also what allows a COUNT(*) query to be answered from the index, which is what one of my customers asked about a long time back. This is another case where a NOT NULL constraint helped the Optimizer come up with an optimal plan.

It is important to optimize the schema and provide all the critical inputs to the Optimizer. Generic reporting tools generate queries that work on all databases; these generated queries may run optimally on some databases and poorly on others. Hence the title of this blog, “Know your Reporting Tool” :).

On the real customer issue, after adding the constraints to a few of the columns, the query response time dropped drastically from 184 seconds to less than 10 seconds, far better than the competition.

Autonomous Database ! ADW and ATP. Why do I need both ?

Autonomous Data Warehouse (ADW) was announced in March 2018, and Autonomous Transaction Processing (ATP) is the latest addition to the Autonomous Database family. ATP is for mixed workloads and therefore provides 4 service names. These are:

HIGH : Highest level of CPU and IO resources. Concurrency with this service name depends upon the number of OCPUs and scales accordingly.

MEDIUM : An intermediate level of CPU and IO resources. Concurrency again depends on the number of OCPUs and scales accordingly.

LOW : Least amount of CPU and IO resources. Here too, concurrency depends on the number of OCPUs, and scaling is directly proportional to the number of OCPUs.

ADW provides these three service names as well. However, unlike in ADW (see my previous blog), the definitions of these services differ. In ADW, HIGH provides the highest level of resources but lower concurrency, whereas in ATP, HIGH provides the highest level of resources and also the highest level of concurrency. Since ATP is designed for mixed workloads, it provides an additional service name, PARALLEL, which offers a high level of resources but the lowest concurrency and is useful for batch processes or heavy reporting.

Now, many technical folks have raised the question: “Why do we need both? Can’t we use ATP to take care of a data warehouse kind of load?” ATP can indeed run a mixed workload. However, many implementations are purely data warehouses. With the amount of data we generate, warehouse databases have grown massive in size, supporting large numbers of complex queries involving many nested joins while expecting sub-second response times. Further, these databases house data from many different sources. Optimization strategies for OLTP and data warehouses are completely different; therefore, an OLTP kind of setup cannot (in most cases) provide the response time needed for DWH queries.

Row Store v/s Columnar Store

Now, coming back to the ATP vs ADW argument. In ADW, loaded data is by default compressed using Oracle Hybrid Columnar Compression, whereas ATP uses a row-store format. This means that in ADW, the data on disk is stored in a columnar format, a proven format for speeding up analytics workloads that query only a few columns. Other data warehouse implementations, like Redshift and Google BigQuery, also store data in columnar format. ATP, on the other hand, stores data in row format, which is ideal for OLTP workloads that update, delete or query single or small numbers of rows.

Since the data in ADW is stored compressed in columnar format, it allows scanning of billions of rows per second. The data for a particular column is stored contiguously, making it possible to process many values in a single CPU cycle using SIMD (Single Instruction, Multiple Data) processing. This technique plays a significant role in meeting sub-second response times.
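A toy sketch of the layout difference (this is not how Oracle implements Hybrid Columnar Compression, just the idea): with a column store, an aggregate over RUNS touches only that column's contiguous values, while a row store must step through every field of every row.

```python
# Row store: one tuple per row; a scan of RUNS walks past every other field.
rows = [
    ("IND", 15921, 51),
    ("AUS", 13378, 41),
    ("SA",  13289, 45),
]
runs_from_rows = sum(r[1] for r in rows)

# Column store: each column is its own contiguous sequence, so the scan
# reads only the data it needs -- and contiguous values of one type are
# what make SIMD processing and columnar compression effective.
columns = {
    "CCODE":    ["IND", "AUS", "SA"],
    "RUNS":     [15921, 13378, 13289],
    "HUNDREDS": [51, 41, 45],
}
runs_from_columns = sum(columns["RUNS"])

print(runs_from_rows, runs_from_columns)
```

Both layouts give the same answer; the difference is how much unrelated data the scan has to step over to get there.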

Such large scans (billions of rows per second) are not possible with a row store, and one of the reasons is covered in a blog post of mine published in 2015 that is still very much relevant: skipping over columns consumes CPU cycles, which is demonstrated in the post that can be found here.

Through this short blog, I wanted to clear up the confusion around ADW and ATP. Each has its own use cases, and each is optimized to manage a different workload. Both of these Autonomous Databases have some exciting features. Stay tuned for more on these.
