Data Loading [Sqlldr Direct Path] Part -2

SQL*Loader with different loading options:

SQL*Loader have to methods for loading data as mention below: 

  • Conventional path
  • Direct path
But we normally use conventional path in our  daily life, Direct path load is a very good option oracle provides to increase loading performance.

Pre-requisite: The target table where this utility will load data that must be there at targeted schema, As SQL* loader never create tables it only help to load data in existing table. And also the user for which you are executing this process he have to have INSERT and DELETE privileged.

So in conventional way once we execute sqlldr command it internally process sql commands[Insert with bind varray buffer] to load it and for that it start searching for those blocks which are with in HWM means search for partial fill blocks or those blocks which are still vacant due to delete activity, Moral is it does not increase HWM instead it will search unused block with in HWM. And due to this activity it takes more time than Direct path loading where it increases the HWM. To understand HWM [High Water Mark please go through my earlier blog as reference]  Not only that it has some other draw back as well which cause more time consumption at conventional way.you will find the Direct path loading process architecture at oracle side.
So we will see some advantage of Direct path loading methods here:

Advantages of Direct=TRUE clause


  • As discussed partial blocks are not used, so no reads are needed to find them which helps to reduce process time, and fewer writes are performed 
  • As we all know in conventional method oracle internally fires INSERT for loading data where as in this case it works with oracle API which does not requires those extra processing time due to sql command
  • Conventional path load calls Oracle once for each array of rows to process a SQL INSERT statement. Where as in direct path load calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished
  • In direct path load this process use their own write I/O instead of Oracle buffer cache. This minimizes contention with other Oracle users.
  • It bypass all logging methodology to save time,So no time require to log the load as may be oracle is in NOARCHIVELOG mode or in UNRECOVERABLE mode etc.
But it has some restrictions as well 
  • We can't use clustered table
  • Prior to 9i version you can only perform a SQL*Loader direct path load when the client and server are the same version.
  • Targeted table should not have any existing lock on it or it will not works
  • You can't load parent and child table together in a single load.
  • If targeted table is partitioned and it has global index on it at that case you cannot use direct path load
  • Enabled triggers are not allowed.
  • Enabled referential and check constraints on the table that the partition is a member of are not allowed.

Example - Simple conventional path Load – with 'BINDSIZE' and 'ROWS' option

  • BINDSIZE specifies the size of the bind array n bytes.
  • READSIZE is use at the time of the read data .
Note: The maximum value for BINDSIZE is 20971520.

Here I am giving an example in which total row count=1021488 and
I will run sqlldr without Bindsize and readsize and also with those options,One can easily identify the performance details from them.

Without
  • Command:   sqlldr load_test/oracle control=control.ctl skip=1 data=Drt_ld_test.csv log=D:\Load_test\Load_tes_log.log
  • Count = 1021488 
  • Time Taken =

                                          Elapsed time was:     00:00:27.24

                                           CPU time was:         00:00:05.72




 With  ROWS  and  Bindsize

  • Command:   sqlldr load_test/oracle control=control.ctl skip=1 data=Drt_ld_test.csv log=D:\Load_test\Load_tes_log.log bindsize=8194304 rows=40000
  • Count = 1021488 
  • Time Taken =

                                            Elapsed time was:     00:00:05.82

                                              CPU time was:         00:00:02.92



Any one can understand clearly the difference between this two method.
Note:
BINDSIZE =  Size of conventional path bind array in bytes  (Default 256000)
ROWS = Number of rows in conventional path bind array Default is 64




DIRECT PATH
Now I will give a example of direct path load will see how much performance benefit we will get from this feature.


  • Command:   sqlldr load_test/oracle control=control.ctl skip=1 data=Drt_ld_test.csv log=D:\Load_test\Load_tes_log.log  streamsize=1048576 readsize=1048576  columnarrayrows=8000 multithreading=true direct=true
  • Count = 1021488 
  • Time Taken =

                                          Elapsed time was:     00:00:03.58

                                           CPU time was:         00:00:01.05


Time become half of previous load and one can understand the difference far better if you load more data


  • Command:   sqlldr load_test/oracle control=control.ctl skip=1 data=Drt_ld_test.csv log=D:\Load_test\Load_tes_log.log  streamsize=1048576 readsize=1048576  columnarrayrows=8000 multithreading=true direct=true
  • Count = 1921488 
  • Time Taken =

                                          Elapsed time was:     00:00:02.08

                                           CPU time was:         00:00:01.03

Here you can see I increased data count although it is taking very less time to load the data. The performance also depends on your hardware configuration like your CPU details and RAM.
Entire example which I shown here all has taken in a local desktop with 2 CPUs and 2 GB RAM.
And if you use parallel=true with multiple data files to load your data in multiple thread it become much faster than the method I have described above.

A correct value of  ROWS, READSIZE, STREAMSIZE, and COLUMNARRAYROWS according to your environment can change your load performance dramatically.


Comments

Popular posts from this blog

Oracle Interview Questions-Part-1 [Basic questions]

Data Loading in Oracle