Data Loading in Oracle

Data loading concept
In day-to-day work we often have to load large volumes of data into an Oracle database. These operations raise many issues, performance problems among them. This post addresses some of those issues and describes how a few of the loading utilities work.

Some available techniques are as follows: 
  • External tables
  • SQL*Loader (sqlldr)
  • UTL_FILE
  • INSERT APPEND (Note: INSERT APPEND supports only the subquery syntax of the INSERT statement, not the VALUES clause)
  • Partition exchange
  • Bulk loading using FORALL
  • Oracle Data Pump
  • Export/Import
The question is which of the techniques listed above performs best. The diagram below compares their performance.





The chart shows that SQL*Loader is one of the fastest ways to load data.


SQL*Loader

SQL*Loader is the utility to use for high performance data loads. The data can be loaded from any text file and inserted into the database.

SQL*Loader reads a data file and a description of the data which is defined in the control file. Using this information and any additional specified parameters (either on the command line or in the PARFILE), SQL*Loader loads the data into the database.

During processing, SQL*Loader writes messages to the log file, bad rows to the bad file, and discarded rows to the discard file.
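
As a rough illustration of how these files fit together, the control file below names the data, bad, and discard files explicitly. It is a minimal sketch: the file names, the emp table, its columns, and the 'H' header marker are assumptions made for the example. The log file is not named in the control file; it is normally set with the log= parameter on the sqlldr command line.

```
-- Sketch only: emp.dat, emp.bad, emp.dsc and the emp columns are hypothetical.
LOAD DATA
INFILE 'emp.dat'         -- data file to read
BADFILE 'emp.bad'        -- rows rejected because of errors
DISCARDFILE 'emp.dsc'    -- rows that do not satisfy the WHEN clause
APPEND
INTO TABLE emp
WHEN (01) <> 'H'         -- discard header lines that start with 'H'
FIELDS TERMINATED BY ','
(empno, ename, sal, deptno)
```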

Performance
SQL*Loader is very flexible and has several options that help load data at maximum speed. The options below can boost its performance.

  • Use Direct Path Loads. The conventional path loader essentially loads the data by using standard insert statements. The direct path loader (direct=true) loads directly into the Oracle data files and creates blocks in Oracle database block format. To prepare the database for direct path loads, the script $ORACLE_HOME/rdbms/admin/catldr.sql must be executed.
  • Disable Indexes and Constraints. For conventional data loads only, disabling indexes and constraints can greatly enhance the performance of SQL*Loader. The skip_index_maintenance SQL*Loader parameter allows you to bypass index maintenance when performing parallel data loads into Oracle, but only with direct path loads (sqlldr direct=y). If possible, disable your referential integrity constraints as well.

According to Dave Moore in his book “Oracle Utilities”, using skip_index_maintenance=true means “don’t rebuild indexes”, and it greatly speeds up sqlldr data loads when using parallel processes with sqlldr.

Also, Oracle expert Jonathan Gennick describes the skip_index_maintenance SQL*Loader parameter as follows: “Controls whether or not index maintenance is done for a direct path load. This parameter does not apply to conventional path loads. A value of TRUE causes index maintenance to be skipped.”

  • Use a Larger Bind Array. For conventional data loads only, larger bind arrays limit the number of calls to the database and increase performance. The size of the bind array is specified using the bindsize parameter. The bind array's size is equivalent to the number of rows it contains (rows=) times the maximum length of each row.
  • Increase the Input Data Buffer. The sqlldr readsize parameter determines the input data buffer size used by SQL*Loader.
  • Use ROWS=n to Commit Less Frequently. For conventional data loads only, rows specifies the number of rows per commit. Issuing fewer commits will enhance performance.
  • Use Parallel Loads. Available with direct path data loads only, this option allows multiple SQL*Loader jobs to execute concurrently. Note: You must be on an SMP server (cpu_count > 2 at least) to successfully employ parallelism, and you must also employ the append option, else you may get this error: "SQL*Loader-279: Only APPEND mode allowed when parallel load specified."

Note that you can also run several SQL*Loader sessions in parallel, each with its own control file:

$ sqlldr control=first.ctl parallel=true direct=true
$ sqlldr control=second.ctl parallel=true direct=true

  • Use Fixed Width Data. Fixed width data format saves Oracle some processing when parsing the data.
  • Disable Archiving During Load. While this may not be feasible in certain environments, disabling database archiving can increase performance considerably.
  • Use unrecoverable. The unrecoverable option (unrecoverable load data) disables the writing of the data to the redo logs. This option is available for direct path loads only.
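
The conventional-path settings above can also be kept in the control file through the OPTIONS clause instead of being typed on the command line. The sketch below is only illustrative: the table, the file name, the assumed maximum row length of about 200 bytes, and the chosen values are not recommendations. With rows=5000 and rows of at most about 200 bytes, the bind array needs roughly 5000 × 200 = 1,000,000 bytes (somewhat more in practice because of per-field overhead), so bindsize is set to that figure and readsize is set to match.

```
-- Sketch only: values and names are assumptions for illustration.
OPTIONS (ROWS=5000, BINDSIZE=1000000, READSIZE=1000000)
LOAD DATA
INFILE 'emp.dat'
APPEND
INTO TABLE emp
FIELDS TERMINATED BY ','
(empno, ename, sal, deptno)
```

Values given on the sqlldr command line override those in the OPTIONS clause, so the same control file can still be tuned per run.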



Q. What is SQL*Loader and what is it used for?
SQL*Loader is a bulk loader utility used for moving data from external files into the Oracle database. Its syntax is similar to that of the DB2 Load utility, but comes with more options. SQL*Loader supports various load formats, selective loading, and multi-table loads.
--------------------------------------------------------------------------------------------------------


Q. How does one use the SQL*Loader utility? 
One can load data into an Oracle database by using the sqlldr (sqlload on some platforms) utility. Invoke the utility without arguments to get a list of available parameters.

Look at the following example:
sqlldr user/password control=loader.ctl

This sample control file (loader.ctl) will load an external data file containing delimited data:

load data
infile 'c:\data\mydata.csv'
into table emp
fields terminated by "," optionally enclosed by '"'
( empno, empname, sal, deptno )

The mydata.csv file may look like this:

10001,"Scott Tiger", 1000, 40
10002,"Frank Naude", 500, 20


Another sample control file, with in-line data formatted as fixed-length records. The trick is to specify "*" as the name of the data file, and use BEGINDATA to start the data section in the control file.

load data
infile *
replace
into table departments
( dept position (01:04) char(4),
  deptname position (06:25) char(20) )
begindata
COSC COMPUTER SCIENCE
ENGL ENGLISH LITERATURE
MATH MATHEMATICS
POLY POLITICAL SCIENCE
--------------------------------------------------------------------------------------------------------

Q. Is there a SQL*Unloader to download data to a flat file?
Oracle does not supply any data unload utilities. However, you can use SQL*Plus to select and format your data and then spool it to a file:

set echo off newpage 0 space 0 pagesize 0 feed off head off trimspool on

spool oradata.txt
select col1 || ',' || col2 || ',' || col3 from tab1 where col2 = 'XYZ';
spool off

Alternatively use the UTL_FILE PL/SQL package:

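rem Remember to set the utl_file_dir='c:\oradata' parameter in initSID.ora

declare
  fp utl_file.file_type;  -- handle for the output file
begin
  -- open tab1.txt for writing in the directory allowed by utl_file_dir
  fp := utl_file.fopen('c:\oradata', 'tab1.txt', 'w');
  -- write one formatted line; \n in the format string ends the line
  utl_file.putf(fp, '%s, %s\n', 'TextField', 55);
  utl_file.fclose(fp);
end;
/
Note: Oracle 11g also provides an unload utility.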
--------------------------------------------------------------------------------------------------------

Q. Can one load variable and fixed-length data records?
Yes, look at the following control file examples. The first loads delimited (variable-length) data:

LOAD DATA
INFILE *
INTO TABLE load_delimited_data
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS (data1, data2)
BEGINDATA
11111,AAAAAAAAAA
22222,"A,B,C,D,"

If you need to load positional data (fixed length), look at the following control file example:

LOAD DATA
INFILE *
INTO TABLE load_positional_data (data1 POSITION(1:5), data2 POSITION(6:15) )
BEGINDATA
11111AAAAAAAAAA
22222BBBBBBBBBB
--------------------------------------------------------------------------------------------------------

Q. Can one skip header records while loading?
Use the "SKIP n" keyword, where n is the number of logical records to skip. Look at this example:

LOAD DATA
INFILE *
INTO TABLE load_positional_data
SKIP 5
(data1 POSITION(1:5), data2 POSITION(6:15))
BEGINDATA
11111AAAAAAAAAA
22222BBBBBBBBBB
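
A common case is a data file whose first line is a column header. As a hedged sketch, the same effect is usually obtained with the SKIP parameter in the OPTIONS clause (or skip=1 on the sqlldr command line). The table and column names are reused from the example above, and the header line is invented for the illustration.

```
OPTIONS (SKIP=1)
LOAD DATA
INFILE *
INTO TABLE load_positional_data
(data1 POSITION(1:5), data2 POSITION(6:15))
BEGINDATA
DATA1DATA2
11111AAAAAAAAAA
22222BBBBBBBBBB
```

With this setting the header line should be skipped and only the two data records loaded.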
--------------------------------------------------------------------------------------------------------

Q. Can one modify data as it loads into the database? 
Data can be modified as it loads into the Oracle Database. Note that early releases supported this only for the conventional load path, not for direct path loads.

LOAD DATA
INFILE *
INTO TABLE modified_data
( rec_no "my_db_sequence.nextval",
  region CONSTANT '31',
  time_loaded "to_char(SYSDATE, 'HH24:MI')",
  data1 POSITION(1:5) ":data1/100",
  data2 POSITION(6:15) "upper(:data2)",
  data3 POSITION(16:22) "to_date(:data3, 'YYMMDD')" )
BEGINDATA
11111AAAAAAAAAA991201
22222BBBBBBBBBB990112



LOAD DATA
INFILE 'mail_orders.txt'
BADFILE 'bad_orders.txt'
APPEND
INTO TABLE mailing_list
FIELDS TERMINATED BY ","
( addr,
  city,
  state,
  zipcode,
  mailing_addr "decode(:mailing_addr, null, :addr, :mailing_addr)",
  mailing_city "decode(:mailing_city, null, :city, :mailing_city)",
  mailing_state )
--------------------------------------------------------------------------------------------------------

Q. Can one load data into multiple tables at once? 
Look at the following control file:
LOAD DATA
INFILE *
REPLACE
INTO TABLE emp
WHEN empno != ' '
( empno POSITION(1:4) INTEGER EXTERNAL,
  ename POSITION(6:15) CHAR,
  deptno POSITION(17:18) CHAR,
  mgr POSITION(20:23) INTEGER EXTERNAL )
INTO TABLE proj
WHEN projno != ' '
( projno POSITION(25:27) INTEGER EXTERNAL,
  empno POSITION(1:4) INTEGER EXTERNAL )
--------------------------------------------------------------------------------------------------------

Q. Can one selectively load only the records that one needs?
Look at this example, (01) is the first character, (30:37) are characters 30 to 37:

LOAD DATA
INFILE 'mydata.dat'
BADFILE 'mydata.bad'
DISCARDFILE 'mydata.dis'
APPEND
INTO TABLE my_selective_table
WHEN (01) <> 'H' and (01) <> 'T' and (30:37) = '19991217'
( region CONSTANT '31',
  service_key POSITION(01:11) INTEGER EXTERNAL,
  call_b_no POSITION(12:29) CHAR )
--------------------------------------------------------------------------------------------------------

Q. Can one skip certain columns while loading data?
One cannot use POSITION(x:y) with delimited data. Luckily, from Oracle 8i one can specify FILLER columns. FILLER columns are used to skip columns/fields in the load file, ignoring fields that one does not want. Look at this example:

LOAD DATA
TRUNCATE INTO TABLE T1
FIELDS TERMINATED BY ',' ( field1, field2 FILLER, field3 )
--------------------------------------------------------------------------------------------------------

Q. How does one load multi-line records?
One can create one logical record from multiple physical records using one of the following two clauses:
  • CONCATENATE - use when SQL*Loader should always combine the same number of physical records to form one logical record.
  • CONTINUEIF - use when a condition indicates that multiple physical records should be treated as one logical record.
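
The two clauses might look as follows. These are minimal sketches: the contacts table, its columns, and the sample data are made up for the example. In the first control file every logical record is built from exactly two physical records.

```
LOAD DATA
INFILE *
APPEND
CONCATENATE 2
INTO TABLE contacts
FIELDS TERMINATED BY ','
(name, city, phone)
BEGINDATA
Scott Tiger,
London,555-0100
Frank Naude,
Pretoria,555-0199
```

In the second, an asterisk in column 1 signals that the next physical record continues the current one. With CONTINUEIF THIS, the continuation column is removed from each physical record before the pieces are joined, which is why every line carries a marker character in column 1 (here '-' on the lines that end a record).

```
LOAD DATA
INFILE *
APPEND
CONTINUEIF THIS (1) = '*'
INTO TABLE contacts
FIELDS TERMINATED BY ','
(name, city, phone)
BEGINDATA
*Scott Tiger,
-London,555-0100
*Frank Naude,
-Pretoria,555-0199
```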
--------------------------------------------------------------------------------------------------------

Q. How can one get SQL*Loader to COMMIT only at the end of the load file?
One cannot, but by setting the ROWS= parameter to a large value, committing can be reduced.
Make sure you have big rollback segments ready when you use a high value for ROWS=.
--------------------------------------------------------------------------------------------------------

Q. Can one improve the performance of SQL*Loader?
A very simple but easily overlooked hint is not to have any indexes and/or constraints (primary key) on your load tables during the load process. They significantly slow down load times, even with ROWS= set to a high value.

Add the following option on the command line: DIRECT=TRUE. This effectively bypasses most of the RDBMS processing. However, there are cases when you can't use direct load. Refer to chapter 8 of the Oracle Server Utilities manual.

Turn off database logging by specifying the UNRECOVERABLE option. This option can only be used with direct data loads.

Run multiple load jobs concurrently.
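
Put together, a direct path load with redo logging switched off might be set up as in the sketch below. The file, table, and column names are assumptions for the example. Because UNRECOVERABLE keeps the loaded data out of the redo logs, that data cannot be restored by media recovery until it has been backed up.

```
OPTIONS (DIRECT=TRUE)
UNRECOVERABLE
LOAD DATA
INFILE 'emp.dat'
APPEND
INTO TABLE emp
FIELDS TERMINATED BY ','
(empno, ename, sal, deptno)
```

To run several such jobs concurrently, each job would get its own control file and data file and be started with parallel=true.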
--------------------------------------------------------------------------------------------------------

Q. What is the difference between the conventional and direct path loader? 
The conventional path loader essentially loads the data by using standard INSERT statements.

The direct path loader (DIRECT=TRUE) bypasses much of the logic involved with that, and loads directly into the Oracle data files. More information about the restrictions of direct path loading can be obtained from the Utilities Users Guide.


To Be Continued

