PL/SQL Parallelism made easier in 11g

Today was the first day of the two-day AIOUG event Sangam 2010. Jonathan Lewis was the keynote speaker this year, and I presented a session on “Developing a Scalable Application”. One of my slides mentioned PL/SQL parallelism. This post continues my previous blog, where I wrote about manual parallelism that takes the ROWIDs of a table and distributes them across multiple chunks. The limitation of manual parallelism is that even when a PL/SQL block needs to process only 5% to 10% of the rows of a huge table, it still selects and splits the ROWIDs of the entire table. As a result, many of the chunks may process very little or no data.

Oracle Database 11g Release 2 introduced the dbms_parallel_execute package, which makes it easier to split a table into multiple chunks based on ROWIDs, on a user-supplied SQL statement, or on a number column. During my AIOUG session, I demonstrated the use of this package to split a table into multiple chunks and schedule a batch process in parallel. The best part of this package is that the chunks can be created based on a condition, so only chunks that contain to-be-processed data are created. The rows to be processed are therefore distributed much more uniformly across the chunks.
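As a quick sketch of the three chunking methods, the calls look roughly like this (the task name DEMO_TASK is hypothetical, and only one create_chunks_by_* call can be used per task, so two of them are commented out):

```sql
begin
  dbms_parallel_execute.create_task('DEMO_TASK');

  -- 1. Chunk by ROWID: by_row => true means chunk_size is an
  --    approximate row count; false means a number of blocks.
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name => 'DEMO_TASK', table_owner => user,
    table_name => 'VIVEK_TEST', by_row => true, chunk_size => 100);

  -- 2. Chunk by a number column, e.g. a surrogate key:
  -- dbms_parallel_execute.create_chunks_by_number_col(
  --   task_name => 'DEMO_TASK', table_owner => user,
  --   table_name => 'VIVEK_TEST', table_column => 'OBJECT_ID',
  --   chunk_size => 100);

  -- 3. Chunk by a user-provided SQL returning start/end value pairs:
  -- dbms_parallel_execute.create_chunks_by_sql(
  --   task_name => 'DEMO_TASK',
  --   sql_stmt  => 'select min(object_id), max(object_id)
  --                 from vivek_test group by object_type',
  --   by_rowid  => false);

  dbms_parallel_execute.drop_task('DEMO_TASK');
end;
/
```

The create_chunks_by_sql variant is the one that lets you chunk on a condition, so that only chunks containing rows to be processed are created.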

As mentioned in my earlier blog, the benefit of PL/SQL parallelism is optimal use of the available resources. In this post, I share the test case that I executed during the AIOUG session, which should help you reproduce it in your own environment.

In this example, consider a nightly batch job, NIGHTLY_BATCH, that processes rows from the VIVEK_TEST table using an approach widely implemented by developers and DBAs: a single job processing all the required rows, which is a very slow and inefficient approach. There are 7200 rows in this table to be processed, and we assume that the business logic in the PL/SQL takes at least 1 second per row before moving on to the next. This means each row takes at least 1 second, so the entire process will take at least 7200 seconds (2 hours). This is the code of the batch process:

create table vivek_test pctfree 50 pctused 50 as
select * from all_objects a
where rownum<=7200;

alter table vivek_test add processed_flag varchar2(1);

create or replace procedure nightly_batch is
l_count		number:=0;
l_module	varchar2(30);
l_limit		number:=0;
cursor c_main is
select rowid, owner, object_name, object_type from vivek_test
where nvl(processed_flag,'N')='N';
begin
  for i in c_main
  loop
  l_count:=l_count+1;
  l_limit:=l_limit+1;
  l_module:=l_count||' out of 7200';
  dbms_application_info.set_module('PL_SQL',l_module);
  dbms_lock.sleep(1); /* Assuming processing of a row takes 1 second */
  update vivek_test set processed_flag='Y' where rowid = i.rowid;
  if l_limit>=1000 then
    commit;
    l_limit:=0;
  end if;
  end loop;
  commit;
end;
/
exec nightly_batch;

Once the job is executed, you can monitor its progress from another session: V$SESSION reports the progress in the MODULE and ACTION columns, as posted by the dbms_application_info.set_module procedure.
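For example, a query along these lines shows the running batch (set_module above posts 'PL_SQL' as the module name and the row counter as the action):

```sql
-- Progress of the running batch from another session:
-- MODULE carries 'PL_SQL', ACTION carries the 'n out of 7200' counter.
select sid, serial#, module, action
from   v$session
where  module = 'PL_SQL';
```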

The changes required to parallelize the process are pasted below. To monitor the progress and maintain a log of each of the chunks, I have created a process_status table. One thing worth noticing is the PCTFREE and PCTUSED setting on VIVEK_TEST, which is set to 50. The reason for this setting is that the number of chunks created depends on the number of extents in the table, and I wanted multiple extents for my table. In a production environment, the number of extents should not be an issue.

create table process_status (
process_name		varchar2(30),
job_id			number,
s_rowid			rowid,
e_rowid			rowid,
start_time		date,
end_time		date,
processed_rows		number,
completed_yn		varchar2(1),
status			varchar2(1000));

create or replace procedure parallel_job(chks in number) is
l_statement	varchar2(1000);
l_jobno		number;
l_retry		number;
l_status	   number;
l_chunkid	   number;
l_srowid	   rowid;
l_erowid	   rowid;
l_anyrows   	boolean;
l_errcode	   pls_integer;
begin

  execute immediate 'truncate table process_status';
  /* CREATE A PARALLEL TASK */
  dbms_parallel_execute.create_task('NIGHT_JOB');

  dbms_parallel_execute.create_chunks_by_rowid('NIGHT_JOB','VIVEK','VIVEK_TEST',true, chks);
 /* INSERT THE CHUNKS INTO PROCESS_STATUS TO LOG THE STATUS OF EACH JOB */
  insert into process_status(process_name, job_id, s_rowid, e_rowid)
  select 'NIGHT_JOB', chunk_id, start_rowid, end_rowid from user_parallel_execute_chunks
  where task_name='NIGHT_JOB' order by 1;
  commit;
  /* Run the Procedure here */
  LOOP
      -- Get a chunk to process; if there is nothing to process, then exit the 
      -- loop;
      DBMS_PARALLEL_EXECUTE.GET_ROWID_CHUNK('NIGHT_JOB',l_chunkid, l_srowid, l_erowid,l_anyrows);
      IF (l_anyrows = false) THEN EXIT; END IF;
      BEGIN
	l_statement:='nightly_batch('||l_chunkid||','||''''||l_srowid||''''||','||''''||l_erowid||''''||');';
	dbms_job.submit(l_jobno,l_statement);
	update process_status set completed_yn='P', start_time=sysdate where job_id=l_chunkid;
	commit;
        DBMS_PARALLEL_EXECUTE.SET_CHUNK_STATUS('NIGHT_JOB',l_chunkid,DBMS_PARALLEL_EXECUTE.PROCESSED);
      EXCEPTION WHEN OTHERS THEN
        DBMS_PARALLEL_EXECUTE.SET_CHUNK_STATUS('NIGHT_JOB', l_chunkid,
          DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR, SQLCODE, SQLERRM);
	  l_errcode:=sqlcode;
  	update process_status set completed_yn='F', end_time=sysdate, status='FAILED WITH ERROR '||l_errcode||'.' where job_id=l_chunkid;
      END;
  commit;
  END LOOP;
  dbms_parallel_execute.drop_task('NIGHT_JOB');
end;
/
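While the parallel run is in flight, the state of each chunk can be checked both from the dictionary view maintained by dbms_parallel_execute and from the process_status log table created above:

```sql
-- Per-chunk status as tracked by dbms_parallel_execute.
select status, count(*)
from   user_parallel_execute_chunks
where  task_name = 'NIGHT_JOB'
group  by status;

-- Per-chunk log maintained by the batch itself.
select job_id, completed_yn, processed_rows,
       round((end_time - start_time) * 86400) elapsed_secs
from   process_status
order  by job_id;
```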

The only change required in our NIGHTLY_BATCH is the introduction of input parameters and an update of process_status on the completion or failure of each chunk.

create or replace procedure nightly_batch(jobid number, low_rowid rowid, high_rowid rowid) is
l_count		number:=0;
l_module	    varchar2(30);
l_limit		number:=0;
l_errcode	    pls_integer;
cursor c_main is
select rowid, owner, object_name, object_type from vivek_test
where nvl(processed_flag,'N')='N'
and	rowid between low_rowid and high_rowid;
begin
  for i in c_main
  loop
  l_count:=l_count+1;
  l_limit:=l_limit+1;
  l_module:=l_count||' out of 7200/40=180';
  dbms_application_info.set_module('PL_SQL',l_module);
  dbms_lock.sleep(1); /* Assuming Processing of a row takes 1 second */
  update vivek_test set processed_flag='Y' where rowid = i.rowid;
  if l_limit>=1000 then
    commit;
    l_limit:=0;
  end if;
  end loop;
  update process_status set completed_yn='Y', end_time=sysdate, processed_rows=l_count 
  where job_id = jobid;
  commit;
  exception when others then
  l_errcode:=sqlcode;
  update process_status set completed_yn='F', status='FAILED WITH '||l_errcode||'.' where job_id = jobid;
  commit;
end;
/
exec parallel_job(100);

Monitor the progress of this job and you will see a significant improvement: the job completes in less than 5 minutes, as against 2 hours, and utilizes the hardware resources optimally.
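As a side note, instead of submitting the chunks through dbms_job yourself, the package can run them for you through the scheduler with run_task, and re-run failed chunks with resume_task. The sketch below assumes a task name of NIGHT_JOB2 and passes a dummy job_id of 0 to nightly_batch, since run_task only binds :start_id and :end_id:

```sql
declare
  l_sql varchar2(200) :=
    'begin nightly_batch(0, :start_id, :end_id); end;';
begin
  dbms_parallel_execute.create_task('NIGHT_JOB2');
  dbms_parallel_execute.create_chunks_by_rowid(
    'NIGHT_JOB2', 'VIVEK', 'VIVEK_TEST', true, 100);

  -- Run the statement once per chunk, 10 scheduler jobs at a time.
  dbms_parallel_execute.run_task(
    task_name      => 'NIGHT_JOB2',
    sql_stmt       => l_sql,
    language_flag  => dbms_sql.native,
    parallel_level => 10);

  -- Re-run only the chunks that failed, if any.
  if dbms_parallel_execute.task_status('NIGHT_JOB2')
       = dbms_parallel_execute.finished_with_error then
    dbms_parallel_execute.resume_task('NIGHT_JOB2');
  end if;

  dbms_parallel_execute.drop_task('NIGHT_JOB2');
end;
/
```

With this approach the chunk status in user_parallel_execute_chunks is maintained by the package itself, so the manual set_chunk_status calls are not needed.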


About Vivek Sharma
I am an Oracle Professional from Mumbai (India). I blog on issues that I feel are interesting for my readers. Some of these are my real-life examples, which I hope you will find interesting. Comments are always welcome. The technical observations and views here are my own and not necessarily those of Oracle or its affiliates. They are purely based on my understanding, learning and resolution of various customer issues.

4 Responses to PL/SQL Parallelism made easier in 11g

  1. Guruswamy D says:

    Hi Vivek,

    It was nice to see you in action @ Sangam. I wish you get more hours during such seminars, i have already written heavily on this in feedback.

    I have 2 questions on this DBMS_PARALLEL_EXECUTE

    1)Can you please let me know whether there are any operating system level requirements to exploit this DBMS_PARALLEL_EXECUTE. i.e In case to enable parallel DML in 10g and before the Operating system should have more than 1 CPU and parallel processing ability. Do this feature also has such restrictions please let us know.
    2) Is this available as part of Standard Edition of 11g Release 2 or only in Enterprise edition.

    Thanks in advance.


  2. Gnanasambantham says:

    Hello Vivek,
    Thanks for the above solution.

    I have two questions.
    1. Can we run this code during Business hours in Production Environment?
    2. Am trying to use this code. But am getting junk values in start_rowid and end_rowid. Because of this, my job is failing. Can you please help me in this?

    Thanks in advance!!!


  3. AnsMirza says:

    Hi Vivek,

    Thanks for the great info !

    The DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID , indeed provides a very high degree of parallelism.
    But I have seen that there are many chunks with zero records .

    For example,
    If a table has 10K records,
    There are 100 chunks being created when the chunk size =>1000 ;

    But we have seen that only very few chunks have records in them and the 90% of chunks have 0 records.

    I presume that this is because of ROWID address locations scattered .

    This affects the load on the parallel jobs , when we have millions of records in a table .

    Have you seen this issue before ?

    Or is there any strategy where we can perform DBMS_PARALLEL_EXECUTE so that the chunks created have equal number of records.

    Thanks

