Table 3-8 and
Table 3-9 describe the bulk load behavior for the drivers.
■
|
Because a bulk load operation may bypass data integrity checks, your application must ensure that the data it is transferring does not violate integrity constraints in the database. For example, suppose you are bulk loading data into a database table and some of that data duplicates data stored as a primary key, which must be unique. The driver will not throw an exception to alert you to the error; your application must provide its own data integrity checks.
|
Bulk load operations are accomplished by exporting the results of a query from a database into a comma-separated value (CSV) file, a bulk load data file. The driver then loads the data from bulk load data file into a different database. The file can be used by any DataDirect Connect Series
for ODBC drivers. In addition, the bulk load data file is supported by other DataDirect Connect product lines that feature bulk loading, for example, a DataDirect Connect
for ADO.NET data provider that supports bulk load.
Applications that are already coded to use parameter array batch functionality can leverage DataDirect Bulk Load features through the Enable Bulk Load connection option on the Bulk tab of the Driver setup dialog. Enabling this option automatically converts the parameter array batch operation to use the database bulk load protocol without any code changes to your application.
If you are not using parameter array batch functionality, the bulk operation buttons Export Table and
Load Table on the Bulk tab of the driver Setup dialog also allow you to use bulk load functionality without any code changes. See the individual driver chapters for a description of the Bulk tab.
From the DataDirect driver Setup dialog, select the Bulk tab and click Export Table. See the individual driver chapters for a description of this procedure.
Your application can export a table using the DataDirect functions ExportTableToFile (ANSI application) or ExportTableToFileW (Unicode application). The application must first obtain driver connection handles and function pointers, as shown in the following example:
This handle must be used when calling directly into the driver. */
ODBC_error (henv, hdbc, SQL_NULL_HSTMT);
ODBC_error (henv, hdbc, SQL_NULL_HSTMT);
resolveName (hmod, "ExportTableToFile");
printf ("Cannot find ExportTableToFile!\n");
(const SQLCHAR *) tableName,
(const SQLCHAR *) fileName,
errorTolerance, warningTolerance,
(const SQLCHAR *) logFile);
printf ("Export succeeded.\n");
driverError (driverHandle, hmod);
Refer to “Sample Bulk Load Configuration File” in
Chapter 9 of the
DataDirect Connect Series for ODBC Reference for a full description of these functions.
The export operation creates a bulk load data file with a .csv extension in which the exported data is stored. For example, assume that an Oracle source table named GBMAXTABLE contains four columns. The resulting bulk load data file GBMAXTABLE.csv containing the results of a query would be similar to the following:
In addition, a log file of events as well as external overflow files can be created during a bulk export operation. The log file is configured through either the driver Setup dialog Bulk tab, the ExportTableToFile function, or the SQL_BULK_EXPORT statement attribute. The external overflow files are configured through connection options; see
“External Overflow Files” for details.
The Enable Bulk Load connection option specifies the method by which bulk data is loaded to a database. When the option is enabled, the driver uses database bulk load protocols. When not enabled, the driver uses standard parameter arrays.
You can load data from the bulk load data file into the target database through the DataDirect driver Setup dialog by selecting the Bulk tab and clicking
Load Table. See the individual driver chapters of the drivers that support bulk load for a description of this procedure.
Your application can also load data from the bulk load data file into the target database using the using the DataDirect functions LoadTableFromFile (ANSI application) or LoadTableFromFileW (Unicode application). The application must first obtain driver connection handles and function pointers, as shown in the following example:
This handle must be used when calling directly into the driver. */
ODBC_error (henv, hdbc, SQL_NULL_HSTMT);
ODBC_error (henv, hdbc, SQL_NULL_HSTMT);
resolveName (hmod, "LoadTableFromFile");
printf ("Cannot find LoadTableFromFile!\n");
(const SQLCHAR *) tableName,
(const SQLCHAR *) fileName,
errorTolerance, warningTolerance,
(const SQLCHAR *) configFile,
(const SQLCHAR *) logFile,
(const SQLCHAR *) discardFile,
printf ("Load succeeded.\n");
driverError (driverHandle, hmod);
Refer to “Sample Bulk Load Configuration File” in
Chapter 9 of the
DataDirect Connect Series for ODBC Reference for a full description of these functions.
Use the BulkLoadBatchSize connection attribute to specify the number of rows the driver loads to the data source at a time when bulk loading data. Performance can be improved by increasing the number of rows the driver loads at a time because fewer network round trips are required. Be aware that increasing the number of rows that are loaded also causes the driver to consume more memory on the client.
A log file of events as well as a discard file that contains rows rejected during the load can be created during a bulk load operation. These files are configured through either the driver Setup dialog Bulk tab or the LoadTableFromFile function.
The discard file is in the same format as the bulk load data file. After fixing reported issues in the discard file, the bulk load can be reissued using the discard file as the bulk load data file.
NOTE FOR SYBASE USERS: Additional database configuration is required for destination tables that do not have an index. See the "Persisting a Result Set as an XML Data File" section in your driver chapter for more information.
NOTE FOR SALESFORCE USERS: In addition to bulk Insert, the Salesforce driver also supports bulk Delete, Update, and Upsert. This functionality is enabled with the SetBulkOperation function which is implemented in the driver. Refer to "DataDirect Bulk Load Functions" in Chapter 9 of the
DataDirect Connect Series for ODBC Reference for a full description of these functions.
A bulk load configuration file is created when either a table or a result set is exported to a bulk load data file. This file has the same name as the bulk load data file, but with an .xml extension.
The bulk load configuration file defines in its metadata the names and data types of the columns in the bulk load data file. The file defines these names and data types based on the table or result set created by the query that exported the data.
It also defines other data properties, such as length for character and binary data types, the character encoding code page for character types, precision and scale for numeric types, and nullablity for all types.
<column datatype="DECIMAL" precision="38" scale="0" nullable=
"false">INTEGERCOL</column>
<column datatype="VARBINARY" length="10" nullable=
"true">VARBINCOL</column>
<column datatype="VARCHAR" length="10" sourcecodepage="Windows-1252"
externalfilecodepage="Windows-1252" nullable="true">VCHARCOL</column>
<column datatype="VARCHAR" length="10" sourcecodepage="Windows-1252"
externalfilecodepage="Windows-1252" nullable="true">UNIVCHARCOL</column>
The bulk load configuration file must conform to the bulk load configuration XML schema. Each bulk export operation generates a bulk load configuration file in UTF-8 format. If the bulk load data file cannot be created or does not comply with the XML Schema described in the bulk load configuration file, an error is generated.
You can verify the metadata in the configuration file against the data structure of the target database table. This insures that the data in the bulk load data file is compatible with the target database table structure.
The verification does not check the actual data in the bulk load data file, so it is possible that the load can fail even though the verification succeeds. For example, if you were to update the bulk load data file manually such that it has values that exceed the maximum column length of a character column in the target table, the load would fail.
Not all of the error messages or warnings that are generated by verification necessarily mean that the load will fail. Many of the messages simply notify you about possible incompatibilities between the source and target tables. For example, if the bulk load data file has a column that is defined as an integer and the column in the target table is defined as smallint, the load may still succeed if the values in the source column are small enough that they fit in a smallint column.
To verify the metadata in the bulk load configuration file through the DataDirect driver Setup dialog, select the Bulk tab and click
Verify. See the individual driver chapters of the drivers that support bulk load for a description of this procedure.
Your application can also verify the metadata of the bulk load configuration file using the DataDirect functions ValidateTableFromFile (ANSI application) or ValidateTableFromFileW (Unicode application). The application must first obtain driver connection handles and function pointers, as shown in the following example:
This handle must be used when calling directly into the driver. */
ODBC_error (henv, hdbc, SQL_NULL_HSTMT);
ODBC_error (henv, hdbc, SQL_NULL_HSTMT);
resolveName (hmod, "ValidateTableFromFile");
printf ("Cannot find ValidateTableFromFile!\n");
(const SQLCHAR *) tableName,
(const SQLCHAR *) configFile,
(numMessages == 0) ? "s" :
((numMessages == 1) ? " : " : "s : "),
(numMessages > 0) ? messageList : "");
printf ("Validate succeeded.\n");
driverError (driverHandle, hmod);
Refer to “Sample Bulk Load Configuration File” in
Chapter 9 of the
DataDirect Connect Series for ODBC Reference for a complete description of these functions.
Progress DataDirect provides a sample application that demonstrates the bulk export, verification, and bulk load operations. This application is located in the \samples\bulk subdirectory of the product installation directory along with a text file named bulk.txt. Please consult bulk.txt for instructions on using the sample bulk load application.
A bulk streaming application is also provided in the \samples\bulkstrm subdirectory along with a text file named bulkstrm.txt. Please consult bulkstrm.txt for instructions on using the bulk streaming application.
Bulk operations can be performed using a dedicated bulk load protocol, that is, the protocol of the underlying database system, or by using parameter array batch operations. Dedicated protocols are generally more performance-efficient than parameter arrays. In some cases, however, you must use parameter arrays, for example, when the data to be loaded is in a data type not supported by the dedicated bulk protocol.
The Enable Bulk Load connection option determines bulk load behavior. When the option is enabled, the driver uses database bulk load protocols unless it encounters a problem, in which case it returns an error. In this situation, you must disable Enable Bulk Load so that the driver uses standard parameter arrays.
It is most performance-efficient to transfer data between databases that use the same character sets. At times, however, you might need to bulk load data between databases that use different character sets. You can do this by choosing a character set for the bulk load data file that will accommodate all data. If the source table contains character data that uses different character sets, then one of the Unicode character sets, UTF-8, UTF-16BE, or UTF-16LE must be specified for the bulk load data file. A Unicode character set should also be specified in the case of a target table uses a different character set than the source table to minimize conversion errors. If the source and target tables use the same character set, that set should be specified for the bulk load data file.
A character set is defined by a code page. The code page for the bulk load data file is defined in the configuration file and is specified through either the Code Page option of the Export Table driver Setup dialog or through the IANAAppCodePage parameter of the ExportTableToFile function. Any code page listed in
Chapter 1 “Code Page Values”of the
DataDirect Connect Series for ODBC Reference is supported for the bulk load data file.
Any character conversion errors are handled based on the value of the Report CodePage ConversionErrors connection option. See the individual driver chapters for a description of this option.
The configuration file may optionally define a second code page value for each character column (
externalfilecodepage). If character data is stored in an external overflow file (see
“External Overflow Files”), this second code page value is used for the external file.
In addition to the bulk load data file, DataDirect Bulk Load can store bulk data in external overflow files. These overflow files are located in the same directory as the bulk load data file. Whether or not to use external overflow files is a performance consideration. For example, binary data is stored as hexadecimal-encoded character strings in the main bulk load data file, which increases the size of the file per unit of data stored. External files do not store binary data as hex character strings, and, therefore, require less space. On the other hand, more overhead is required to access external files than to access a single bulk load data file, so each bulk load situation must be considered individually.
The value of the Bulk Binary Threshold connection option determines the threshold, in KB, over which binary data is stored in external files instead of in the bulk load data file. Likewise, the Bulk Character Threshold connection option determines the threshold for character data.
In the case of an external character data file, the character set of the file is governed by the bulk load configuration file. If the bulk load data file is Unicode and the maximum character size of the source data is 1, then the data is stored in its source code page. See
“Character Set Conversions”.
The file name of the external file contains the bulk load data file name, a six-digit number, and a ".lob" extension in the following format:
CSVfilename_nnnnnn.lob. Increments start at 000001.lob.
For all drivers that support bulk operations, except the Salesforce driver, the driver uses the native bulk load protocol for database connections when the Enable Bulk Load connection option is set to
true. For example, if you set the Enable Bulk Load connection option to
true, the driver would use bulk load for the native parameter array insert request.
In some cases, the driver may not be able to use bulk load because of restrictions enforced by the bulk load protocol and will downgrade to a batch mechanism. For example, if the data being loaded has a data type that is not supported by the bulk load protocol, the driver cannot use bulk load, but will use the native parameter array insert mechanism instead.
For the Salesforce driver, when the Enable Bulk Load connection option is set to true and the number of rows to be inserted in the batch is larger than Bulk Load Threshold, the driver uses the Salesforce Bulk API instead of the Web service API.
For all drivers that support bulk operations, use the Bulk Load Batch Size connection option to specify the number of rows the driver loads at a time when bulk loading data. Performance can be improved by increasing the number of rows the driver loads at a time because fewer network round trips are required. Be aware that increasing the number of rows that are loaded also causes the driver to consume more memory on the client.
When the Enable Bulk Load connection option is set to 1, the driver uses the Salesforce Bulk API for single Insert, Update, and Delete statements if the number of rows affected by the operation exceeds the threshold set by the Bulk Load Threshold connection property.
For example, if you set the Enable Bulk Load connection option to 1 and the Bulk Load Threshold connection option to 2000, executing the following statement would use the Bulk API if the number of rows returned by
SELECT rowid, sys_name FROM account is more than 2000 rows.
Table 3-10 summarizes how DataDirect Bulk Load related connection options work with the drivers. See "Connection Option Descriptions" in each driver chapter for details about configuring the options.