DBMS_CLOUD : Loading a TEXT file without any DELIMITER

Recently I was working on an issue related to loading data from a flat file into Autonomous Data Warehouse using the DBMS_CLOUD API. While a lot has been written on this subject, I thought of writing about it because the challenge here was that the file had no delimiter. The original data to be loaded was around 90 million rows, but for testing purposes the customer was loading around 200k rows.

What is a delimiter? A delimiter is one or more characters that separate two strings. These characters are used to identify where one column ends and the next begins.
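For illustration, here is the first data record from the sample below, once with a pipe delimiter added and once as it actually appears in the file:

IND|15111989|15921|51|SACHIN TENDULKAR
IND151119891592151SACHIN TENDULKAR

In the second form, the only way to split the record into columns is by character position, which is exactly what the field_list parameter will do later in this post.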

One of the best write-ups on this topic is this blog post from Nilay Panchal.

Coming back to the topic of this blog: in this case, the customer was trying to load 200k rows into a table, and with the approach they were using, the load took around 25 minutes, which is far too long considering that the actual production volume would be 90 million rows.

As mentioned, the challenge was that the content of this flat file had no delimiter or any character to identify column breaks. The original flat file had around 500 columns, but for testing it was truncated to 5 columns. The idea here is to demonstrate an efficient way of loading data into a table when the data in the flat file has no delimiter. The sample data is as follows:

CCODEDEBUTRUNSHUNDREDSPLAYER_NAME
IND151119891592151SACHIN TENDULKAR
AUS081219951337841   RICKY PONTING
 SA141219951328945  JACQUES KALLIS
 SL200720001240038 KUMAR SANGAKARA
 WI061219901195334      BRIAN LARA
IND20062011 720227     VIRAT KOHLI

Our very first step is to create a cloud object storage credential using dbms_cloud.create_credential. If you have already created a credential, you can skip this. Check dba_credentials for the list of existing credentials.

begin
  dbms_cloud.create_credential(
    credential_name=>'VIVEK_CREDENTIAL',
    username=>'VIVEKS',
    password=>'paste_auth_token_here'
  );
end;
/

SQL> column owner for a20
SQL> column credential_name for a30
SQL> column username for a20
SQL> select owner, credential_name, username from dba_credentials;

OWNER		     CREDENTIAL_NAME		    USERNAME
-------------------- ------------------------------ --------------------
ADMIN		     VIVEK_CREDENTIAL		    VIVEKS

I have uploaded my data.txt file, which contains the data to be loaded, to Object Storage. The next step is to create the table into which the data will be loaded. For COPY_DATA, you need to ensure the target table already exists, as the procedure does not create it and would otherwise fail.

CREATE TABLE CRICKETER_DATA
   (CCODE VARCHAR2(3) ,
   DEBUT DATE,
   RUNS  NUMBER(5),
   HUNDREDS NUMBER(2),
   PLAYER_NAME VARCHAR2(16));

Now, I will call the COPY_DATA procedure with the required inputs. It is important to note the format and field_list options. In my case, the date was in DDMMYYYY format, so I had to specify the dateformat explicitly to allow COPY_DATA to parse the dates accurately; without the dateformat value, my COPY_DATA call failed with an error (shown further below). The other parameter is field_list, which ensures that, in the absence of a delimiter, COPY_DATA parses the column values based on their start and end positions. You need to ensure that these positions are accurate.

begin
dbms_cloud.copy_data(
    table_name =>'CRICKETER_DATA',
    credential_name =>'VIVEK_CREDENTIAL',
    file_uri_list =>'&MY_OBJECT_STORAGE_URL/data.txt',
    format => json_object('skipheaders' value '1', 'dateformat' value 'ddmmyyyy'),
    field_list =>'CCODE POSITION(1:3), DEBUT POSITION(4:11), RUNS POSITION(12:16), HUNDREDS POSITION(17:18), PLAYER_NAME POSITION(19:34)');
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.50
SQL> select * from cricketer_data;

CCO DEBUT		       RUNS   HUNDREDS PLAYER_NAME
--- -------------------- ---------- ---------- ----------------
IND 15-NOV-1989 00:00:00      15921	    51 SACHIN TENDULKAR
AUS 08-DEC-1995 00:00:00      13378	    41	  RICKY PONTING
 SA 14-DEC-1995 00:00:00      13289	    45	 JACQUES KALLIS
 SL 20-JUL-2000 00:00:00      12400	    38	KUMAR SANGAKARA
 WI 06-DEC-1990 00:00:00      11953	    34	     BRIAN LARA
IND 20-JUN-2011 00:00:00       7202	    27	    VIRAT KOHLI

6 rows selected.
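As an aside, the same field_list also works with DBMS_CLOUD.CREATE_EXTERNAL_TABLE, so if you want to validate your positions before committing to a load, you can query the file in place first. This is a minimal sketch, not part of the original load, reusing my credential and file (CRICKETER_DATA_EXT is a hypothetical name):

begin
  dbms_cloud.create_external_table(
    table_name      => 'CRICKETER_DATA_EXT',
    credential_name => 'VIVEK_CREDENTIAL',
    file_uri_list   => '&MY_OBJECT_STORAGE_URL/data.txt',
    column_list     => 'CCODE VARCHAR2(3), DEBUT DATE, RUNS NUMBER(5), HUNDREDS NUMBER(2), PLAYER_NAME VARCHAR2(16)',
    format          => json_object('skipheaders' value '1', 'dateformat' value 'ddmmyyyy'),
    field_list      => 'CCODE POSITION(1:3), DEBUT POSITION(4:11), RUNS POSITION(12:16), HUNDREDS POSITION(17:18), PLAYER_NAME POSITION(19:34)'
  );
end;
/

-- Nothing is loaded; each query reads the file directly from Object Storage.
select * from CRICKETER_DATA_EXT;

If the positions are wrong, you find out from a simple SELECT instead of a failed load.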

If for some reason the COPY_DATA procedure fails, it creates an error log table, whose name is reported immediately along with the error number. To drill down to the actual error, you need to query the log table. For example, when I omitted the FORMAT option from my COPY_DATA procedure, it failed with the following error, and I queried COPY$21_LOG to extract the actual error.
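If you are not sure of the exact log table name, every load operation is also recorded in the USER_LOAD_OPERATIONS view provided by Autonomous Database, which lists the log and bad tables per operation. A quick lookup:

select id, type, status, table_name, logfile_table, badfile_table
  from user_load_operations
 order by id;

In my case the log table was COPY$21_LOG: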


select * from COPY$21_LOG;

RECORD
------------------------------------------------------------------------------------------------------------------------------------
 LOG file opened at 01/28/20 05:15:18

Total Number of Files=1

Data File: &MY_OBJECT_STORAGE_URL/data.txt

Log File: COPY$21_331103.log

 LOG file opened at 01/28/20 05:15:18

Bad File: COPY$21_331103.bad

Field Definitions for table COPY$Q18IZ07MUUMYRU3IG6MU
  Record format DELIMITED BY
  Data in file has same endianness as the platform
  Rows with all null fields are accepted

  Fields in Data Source:

    CCODE			    CHAR (3)
      Record position (1, 3)
      Terminated by "|"
    DEBUT			    CHAR (8)
      Record position (4, 11)
      Terminated by "|"
    RUNS			    CHAR (5)
      Record position (12, 16)
      Terminated by "|"
    HUNDREDS			    CHAR (2)
      Record position (17, 18)
      Terminated by "|"
    PLAYER_NAME 		    CHAR (16)
      Record position (19, 34)
      Terminated by "|"
error processing column DEBUT in row 1 for datafile &MY_OBJECT_STORAGE_URL/data.txt

ORA-01858: a non-numeric character was found where a numeric was expected

40 rows selected.
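One housekeeping note: these COPY$n log and bad tables accumulate with every failed or repeated run. DBMS_CLOUD provides DELETE_ALL_OPERATIONS to clear the load history; my understanding is that this also cleans up the associated log and bad tables, but verify on your version before relying on it. A sketch:

begin
  -- Clears the history of COPY operations for the current schema.
  dbms_cloud.delete_all_operations(type => 'COPY');
end;
/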

About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on issues that I feel are interesting for my readers. Some of these are real-life examples, which I hope you will find interesting. Comments are always welcome. The technical observations and views here are my own and not necessarily those of Oracle or its affiliates. These are purely based on my understanding, learning, and resolution of various customer issues.
