
Unicode and Non-Unicode ODBC Drivers
The way in which a driver handles function calls from a Unicode application determines whether it is considered a "Unicode driver."
Function Calls
Instead of the standard ANSI function calls, such as SQLConnect, Unicode applications use "W" (wide) function calls, such as SQLConnectW. If the driver is a true Unicode driver, it understands "W" function calls, and the Driver Manager can pass them through to the driver without converting them to ANSI. The DataDirect Connect Series for ODBC drivers that support "W" function calls are:
If the driver is a non-Unicode driver, it cannot understand "W" function calls, and the Driver Manager must convert them to ANSI calls before sending them to the driver. The Driver Manager determines which ANSI encoding to convert to by referring to a code page. On Windows, this is the Active Code Page. On UNIX and Linux, it is the value of the IANAAppCodePage connection string attribute in the odbc.ini file.
The following examples illustrate these conversion streams for the DataDirect Connect Series for ODBC drivers. In DataDirect Connect Series for ODBC releases prior to 5.0, the Driver Manager on UNIX and Linux assumes that Unicode applications and Unicode drivers use the same encoding (UTF-8). In Release 5.0 and higher on UNIX and Linux, the Driver Manager determines the type of Unicode encoding used by the application and by the driver, and performs conversions when the two differ. It makes this determination by checking two ODBC environment attributes: SQL_ATTR_APP_UNICODE_TYPE and SQL_ATTR_DRIVER_UNICODE_TYPE. “Driver Manager and Unicode Encoding on UNIX/Linux” describes in detail how this is done.
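The decision described above can be modeled in a few lines of C. This is a simplified, hypothetical model and not DataDirect's implementation. The enum names stand in for the values of SQL_ATTR_APP_UNICODE_TYPE and SQL_ATTR_DRIVER_UNICODE_TYPE, and the `release_5_or_later` flag models the pre-5.0 behavior of assuming UTF-8 on both sides.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the values of SQL_ATTR_APP_UNICODE_TYPE. */
typedef enum { APP_UTF8, APP_UTF16 } app_unicode_type;

/* Hypothetical driver categories: DRV_ANSI is a non-Unicode driver; the
   other two model SQL_ATTR_DRIVER_UNICODE_TYPE for a Unicode driver. */
typedef enum { DRV_ANSI, DRV_UTF8, DRV_UTF16 } driver_type;

typedef enum {
    CONV_NONE,      /* pass the "W" call through unchanged */
    CONV_TRANSCODE, /* convert between UTF-8 and UTF-16 */
    CONV_TO_ANSI    /* convert to the IANAAppCodePage code page */
} conversion;

/* Conversion a UNIX/Linux Driver Manager applies to a "W" call from a
   Unicode application, in this simplified model. */
static conversion dm_conversion(app_unicode_type app, driver_type drv,
                                bool release_5_or_later)
{
    if (drv == DRV_ANSI)
        return CONV_TO_ANSI;
    if (!release_5_or_later)
        return CONV_NONE; /* pre-5.0: both sides are assumed to be UTF-8 */
    if ((app == APP_UTF8 && drv == DRV_UTF8) ||
        (app == APP_UTF16 && drv == DRV_UTF16))
        return CONV_NONE;
    return CONV_TRANSCODE;
}
```

For a Unicode driver, only the last branch adds overhead, and setting both attributes to the same encoding avoids it.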
Unicode Application with a Non-Unicode Driver
An operation involving a Unicode application and a non-Unicode driver incurs more overhead because function conversion is involved.
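The function conversion can be sketched as a small shim. All names here are hypothetical. `dm_connect_w` stands in for the Driver Manager's handling of a "W" call such as SQLConnectW, and `fake_driver_connect` stands in for the driver's ANSI entry point. The sketch also assumes the client code page is ISO 8859-1, so any UTF-16 code unit above 0xFF has no ANSI equivalent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ANSI entry point exported by a non-Unicode driver. */
typedef int (*ansi_connect_fn)(const char *dsn);

/* Narrow NUL-terminated UTF-16 to ISO 8859-1. Code units above 0xFF
   (including each half of a surrogate pair) become '?'. Output is
   truncated to fit out_size, which must be at least 1. Returns the
   number of characters that could not be represented. */
static size_t utf16_to_latin1(const uint16_t *in, char *out, size_t out_size)
{
    size_t i = 0, lost = 0;
    for (; in[i] != 0 && i + 1 < out_size; i++) {
        if (in[i] <= 0xFF) {
            out[i] = (char)in[i];
        } else {
            out[i] = '?';
            lost++;
        }
    }
    out[i] = '\0';
    return lost;
}

/* Hypothetical Driver Manager shim for a "W" connect call: convert the
   wide argument to ANSI, then forward it to the driver's ANSI function. */
static int dm_connect_w(ansi_connect_fn driver_connect, const uint16_t *dsn_w)
{
    char dsn[256];
    utf16_to_latin1(dsn_w, dsn, sizeof dsn);
    return driver_connect(dsn);
}

/* Stand-in non-Unicode driver that records the ANSI string it received. */
static char last_dsn[256];
static int fake_driver_connect(const char *dsn)
{
    strncpy(last_dsn, dsn, sizeof last_dsn - 1);
    last_dsn[sizeof last_dsn - 1] = '\0';
    return 0;
}
```

The `'?'` replacement shows why this path can lose data: any character outside the client code page cannot reach the driver intact.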
Windows
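1
The Unicode application sends UCS-2/UTF-16 function calls to the Driver Manager.
2
The Driver Manager converts the function calls from UCS‑2/UTF-16 to ANSI. The Driver Manager determines the ANSI code page from the client machine’s Active Code Page.
3
The Driver Manager sends the converted ANSI function calls to the non-Unicode driver.
4
The non-Unicode driver returns ANSI argument values to the Driver Manager.
5
The Driver Manager converts the function calls from ANSI to UCS-2/UTF-16 and returns these converted calls to the application.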
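Step 5 runs in the reverse direction. The following minimal sketch assumes the Active Code Page behaves like ISO 8859-1. Real Windows code pages such as 1252 differ in the 0x80-0x9F range. Under that assumption, widening ANSI back to UTF-16 loses nothing, because each Latin-1 byte value equals its Unicode code point. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen NUL-terminated ISO 8859-1 text to UTF-16. Output is truncated to
   fit out_units (counted in 16-bit code units, including the terminator).
   Returns the number of code units written, excluding the terminator. */
static size_t latin1_to_utf16(const char *in, uint16_t *out, size_t out_units)
{
    size_t i = 0;
    if (out_units == 0)
        return 0;
    for (; in[i] != '\0' && i + 1 < out_units; i++)
        out[i] = (unsigned char)in[i]; /* byte value == code point */
    out[i] = 0;
    return i;
}
```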
UNIX and Linux: DataDirect Connect® Series for ODBC Releases Prior to 5.0
1
The Unicode application sends UTF-8 function calls to the Driver Manager.
2
The Driver Manager converts the function calls from UTF-8 to ANSI. The Driver Manager determines the ANSI code page from the client machine’s value for the IANAAppCodePage connection string attribute.
3
The Driver Manager sends the converted ANSI function calls to the non-Unicode driver.
4
The non-Unicode driver returns ANSI argument values to the Driver Manager.
5
The Driver Manager converts the function calls from ANSI to UTF-8 and returns these converted calls to the application.
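Here step 2 starts from UTF-8 rather than UTF-16. The following minimal sketch of that narrowing assumes IANAAppCodePage selects ISO 8859-1 (IANA MIBenum 4), and both function names are hypothetical. The UTF-8 decoder is deliberately lenient: it does not reject overlong forms or encoded surrogates.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
   Malformed input yields U+FFFD and consumes one byte. The && chains stop
   at the NUL terminator, so a truncated sequence is never over-read. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x07) << 18) |
              ((unsigned long)(s[1] & 0x3F) << 12) |
              ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Narrow NUL-terminated UTF-8 to ISO 8859-1; code points above U+00FF
   become '?'. Returns the number of characters that could not be kept. */
static size_t utf8_to_latin1(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0, lost = 0;
    if (out_size == 0)
        return 0;
    while (*p != '\0' && n + 1 < out_size) {
        unsigned long cp;
        p += utf8_decode(p, &cp);
        if (cp <= 0xFF) {
            out[n++] = (char)cp;
        } else {
            out[n++] = '?';
            lost++;
        }
    }
    out[n] = '\0';
    return lost;
}
```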
UNIX and Linux: DataDirect Connect® Series for ODBC 5.0 and Higher
1
The Unicode application sends function calls to the Driver Manager. The Driver Manager expects these function calls to be UTF-8 or UTF-16, based on the value of the SQL_ATTR_APP_UNICODE_TYPE attribute.
2
The Driver Manager converts the function calls from UTF-8 or UTF-16 to ANSI. The Driver Manager determines the ANSI code page from the client machine’s value for the IANAAppCodePage connection string attribute.
3
The Driver Manager sends the converted ANSI function calls to the non-Unicode driver.
4
The non-Unicode driver returns ANSI argument values to the Driver Manager.
5
The Driver Manager converts the function calls from ANSI to UTF-8 or UTF-16 and returns these converted calls to the application.
Unicode Application with a Unicode Driver
An operation involving a Unicode application and a Unicode driver that use the same Unicode encoding is efficient because no function conversion is involved. If the application and the driver each use different types of encoding, there is some conversion overhead. See “Driver Manager and Unicode Encoding on UNIX/Linux” for details.
Windows
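1
The Unicode application sends UCS-2/UTF-16 function calls to the Driver Manager.
2
The Driver Manager does not have to convert the UCS‑2/UTF‑16 function calls to ANSI. It passes the Unicode function call to the Unicode driver.
3
The Unicode driver returns UCS-2/UTF-16 argument values to the Driver Manager.
4
The Driver Manager returns UCS-2/UTF-16 function calls to the application.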
UNIX and Linux: DataDirect Connect® Series for ODBC Releases Prior to 5.0
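1
The Unicode application sends UTF-8 function calls to the Driver Manager.
2
The Driver Manager does not have to convert the UTF-8 function calls to ANSI. It passes the Unicode function call with UTF-8 arguments to the Unicode driver.
3
The Unicode driver returns UTF-8 argument values to the Driver Manager.
4
The Driver Manager returns UTF-8 function calls to the application.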
UNIX and Linux: DataDirect Connect® Series for ODBC 5.0 and Higher
1
The Unicode application sends function calls to the Driver Manager. The Driver Manager expects these function calls to be UTF-8 or UTF-16 based on the value of the SQL_ATTR_APP_UNICODE_TYPE attribute.
2
The Driver Manager passes Unicode function calls to the Unicode driver. The Driver Manager has to perform function call conversions if the SQL_ATTR_APP_UNICODE_TYPE value is different from the SQL_ATTR_DRIVER_UNICODE_TYPE value.
3
The driver returns argument values to the Driver Manager. Whether these are UTF-8 or UTF-16 argument values is based on the value of the SQL_ATTR_DRIVER_UNICODE_TYPE attribute.
4
The Driver Manager returns appropriate function calls to the application based on the SQL_ATTR_APP_UNICODE_TYPE attribute value. The Driver Manager has to perform function call conversions if the SQL_ATTR_DRIVER_UNICODE_TYPE value is different from the SQL_ATTR_APP_UNICODE_TYPE value.
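When SQL_ATTR_APP_UNICODE_TYPE and SQL_ATTR_DRIVER_UNICODE_TYPE differ, the conversion in steps 2 and 4 is a UTF-16/UTF-8 transcoding. The following C sketch shows the UTF-16 to UTF-8 direction, including surrogate pairs. The function names are hypothetical and not part of any DataDirect API.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out; returns bytes written (1-4). */
static size_t put_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Transcode NUL-terminated UTF-16 to NUL-terminated UTF-8. Surrogate pairs
   are combined; unpaired surrogates become U+FFFD. Stops before a character
   that would not fit. Returns bytes written, excluding the terminator. */
static size_t utf16_to_utf8(const uint16_t *in, unsigned char *out,
                            size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return 0;
    for (size_t i = 0; in[i] != 0; i++) {
        unsigned long cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++; /* consumed the low surrogate too */
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }
        unsigned char tmp[4];
        size_t len = put_utf8(cp, tmp);
        if (n + len + 1 > out_size)
            break;
        for (size_t k = 0; k < len; k++)
            out[n++] = tmp[k];
    }
    out[n] = '\0';
    return n;
}
```

One UTF-16 code unit can expand to three UTF-8 bytes, so whichever side receives the converted data needs buffers sized for the larger form.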
Data
ODBC C data types are used to indicate the type of C buffers that store data in the application. This is in contrast to SQL data types, which are mapped to native database types to store data in a database (data store). ANSI applications bind to the C data type SQL_C_CHAR and expect to receive information bound in the same way. Similarly, most Unicode applications bind to the C data type SQL_C_WCHAR (wide data type) and expect to receive information bound in the same way. Any ODBC 3.5-compliant Unicode driver must be capable of supporting SQL_C_CHAR and SQL_C_WCHAR so that it can return data to both ANSI and Unicode applications.
When the driver communicates with the database, it must use ODBC SQL data types, such as SQL_CHAR and SQL_WCHAR, that map to native database types. In the case of ANSI data and an ANSI database, the driver receives data bound to SQL_C_CHAR and passes it to the database as SQL_CHAR. The same is true of SQL_C_WCHAR and SQL_WCHAR in the case of Unicode data and a Unicode database.
When data from the application and the data stored in the database differ in format, for example, ANSI application data and Unicode database data, conversions must be performed. The driver cannot receive SQL_C_CHAR data and pass it to a Unicode database that expects to receive a SQL_WCHAR data type. The driver or the Driver Manager must be capable of converting SQL_C_CHAR to SQL_WCHAR, and vice versa.
The simplest cases of data communication are those in which the application, the driver, and the database all use the same type and encoding: ANSI-to-ANSI-to-ANSI or Unicode-to-Unicode-to-Unicode. No data conversion is involved in these cases.
When a difference exists between data types, a conversion from one type to another must take place at the driver or Driver Manager level, which involves additional overhead. The type of driver determines whether these conversions are performed by the driver or the Driver Manager. “Driver Manager and Unicode Encoding on UNIX/Linux” describes how the Driver Manager determines the type of Unicode encoding of the application and driver.
The following sections discuss two basic types of data conversion in the DataDirect Connect Series for ODBC drivers and the Driver Manager. How an individual driver exchanges different types of data with a particular database at the database level is beyond the scope of this discussion.
Unicode Driver
The Unicode driver, not the Driver Manager, must convert SQL_C_CHAR (ANSI) data to SQL_WCHAR (Unicode) data, and vice versa, as well as SQL_C_WCHAR (Unicode) data to SQL_CHAR (ANSI) data, and vice versa.
The driver must use client code page information (Active Code Page on Windows and IANAAppCodePage attribute on UNIX/Linux) to determine which ANSI code page to use for the conversions. The Active Code Page or IANAAppCodePage must match the database default character encoding; if it does not, conversion errors are possible.
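A mismatch is easiest to see with two code pages that agree on most byte values. The sketch below is a partial, hypothetical decoder, not a complete code page table. In windows-1252, byte 0x80 is the euro sign (U+20AC), but in ISO 8859-1 the same byte is the C1 control character U+0080. If the database stores windows-1252 text while the client code page says ISO 8859-1, a conversion based on the client code page can silently produce the wrong character.

```c
#include <stdint.h>

/* Two client code pages, named here only for illustration. */
typedef enum { CP_ISO_8859_1, CP_WINDOWS_1252 } code_page;

/* Map one byte to a Unicode code point under the given code page.
   Outside 0x80-0x9F the two code pages agree with each other. Only a few
   windows-1252 entries in 0x80-0x9F are filled in; the rest of that range
   returns U+FFFD in this sketch. */
static uint32_t byte_to_unicode(unsigned char b, code_page cp)
{
    if (cp == CP_ISO_8859_1 || b < 0x80 || b > 0x9F)
        return b;
    switch (b) {
    case 0x80: return 0x20AC; /* EURO SIGN */
    case 0x93: return 0x201C; /* LEFT DOUBLE QUOTATION MARK */
    case 0x94: return 0x201D; /* RIGHT DOUBLE QUOTATION MARK */
    case 0x99: return 0x2122; /* TRADE MARK SIGN */
    default:   return 0xFFFD; /* not covered by this partial table */
    }
}
```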
ANSI Driver
The Driver Manager, not the ANSI driver, must convert SQL_C_WCHAR (Unicode) data to SQL_CHAR (ANSI) data, and vice versa (see “Unicode Support in ODBC” for a detailed discussion). This is necessary because ANSI drivers do not support any Unicode ODBC types.
The Driver Manager must use client code page information (Active Code Page on Windows and the IANAAppCodePage attribute on UNIX/Linux) to determine which ANSI code page to use for the conversions. The Active Code Page or IANAAppCodePage must match the database default character encoding. If not, conversion errors are possible.
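The division of labor in the Unicode Driver and ANSI Driver sections can be summarized as a lookup. This is a hypothetical model for illustration only. The enum values mirror SQL_C_CHAR, SQL_C_WCHAR, SQL_CHAR, and SQL_WCHAR but are not the real ODBC constants.

```c
#include <stdbool.h>

typedef enum { C_CHAR, C_WCHAR } c_type;   /* models SQL_C_CHAR / SQL_C_WCHAR */
typedef enum { S_CHAR, S_WCHAR } sql_type; /* models SQL_CHAR / SQL_WCHAR */
typedef enum {
    NO_CONVERSION,
    DRIVER_CONVERTS,
    DM_CONVERTS,
    UNSUPPORTED
} converter;

/* Who converts between the application's C type and the SQL type. */
static converter who_converts(c_type app, sql_type db, bool unicode_driver)
{
    if (!unicode_driver && db == S_WCHAR)
        return UNSUPPORTED; /* ANSI drivers support no Unicode ODBC types */
    if ((app == C_CHAR && db == S_CHAR) || (app == C_WCHAR && db == S_WCHAR))
        return NO_CONVERSION;
    /* Types differ: a Unicode driver converts itself; for an ANSI driver
       the only remaining case is SQL_C_WCHAR <-> SQL_CHAR, done by the DM. */
    return unicode_driver ? DRIVER_CONVERTS : DM_CONVERTS;
}
```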
Default Unicode Mapping
The default Unicode mapping for an application’s SQL_C_WCHAR variable is UTF-16 on Windows and UTF-8 on UNIX and Linux.
Connection Attribute for Unicode
If you do not want to use the default Unicode mappings for SQL_C_WCHAR, a connection attribute is available to override the default mappings. This attribute determines how character data is converted and presented to an application and the database.
SQL_ATTR_APP_WCHAR_TYPE sets the SQL_C_WCHAR type for parameter and column binding to a Unicode type, either SQL_DD_CP_UTF16 (default for Windows) or SQL_DD_CP_UTF8 (default for UNIX/Linux).
You can set this attribute before or after you connect. After this attribute is set, all conversions are made based on the character set specified.
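The following hypothetical sketch illustrates what the attribute changes. It fills a bound SQL_C_WCHAR buffer with the same characters in each encoding. The enum values only stand in for SQL_DD_CP_UTF16 and SQL_DD_CP_UTF8, whose real definitions are in qesqlext.h. The byte counts differ, so buffers sized for one setting may be too small for the other.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for SQL_DD_CP_UTF16 and SQL_DD_CP_UTF8. */
typedef enum { WCHAR_AS_UTF16, WCHAR_AS_UTF8 } wchar_type;

/* Write BMP code points (U+0000-U+FFFF, no surrogates) into buf in the
   encoding selected by type. Returns the number of bytes written, or
   (size_t)-1 if buf_len is too small. */
static size_t fill_wchar_buffer(const uint16_t *cps, size_t count,
                                wchar_type type, unsigned char *buf,
                                size_t buf_len)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t c = cps[i];
        unsigned char tmp[3];
        size_t len;
        if (type == WCHAR_AS_UTF16) {
            memcpy(tmp, &c, 2); /* native byte order, like an SQLWCHAR array */
            len = 2;
        } else if (c < 0x80) {
            tmp[0] = (unsigned char)c;
            len = 1;
        } else if (c < 0x800) {
            tmp[0] = (unsigned char)(0xC0 | (c >> 6));
            tmp[1] = (unsigned char)(0x80 | (c & 0x3F));
            len = 2;
        } else {
            tmp[0] = (unsigned char)(0xE0 | (c >> 12));
            tmp[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            tmp[2] = (unsigned char)(0x80 | (c & 0x3F));
            len = 3;
        }
        if (n + len > buf_len)
            return (size_t)-1;
        memcpy(buf + n, tmp, len);
        n += len;
    }
    return n;
}
```

With the text "AB€", the UTF-16 setting needs 6 bytes and the UTF-8 setting needs 5.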
For example:
rc = SQLSetConnectAttr (hdbc, SQL_ATTR_APP_WCHAR_TYPE, (void *)SQL_DD_CP_UTF16, SQL_IS_INTEGER);
For drivers that do not support Unicode, SQLGetConnectAttr and SQLSetConnectAttr return SQLSTATE HYC00 (Optional feature not implemented) for the SQL_ATTR_APP_WCHAR_TYPE attribute.
This connection attribute and its valid values can be found in the file qesqlext.h, which is installed with the product.
NOTE: For the SQL Server Legacy Wire Protocol driver, this attribute is supported only on UNIX and Linux, not on Windows.