UNICODE

From Team Developer SqlWindows Wiki
Revision as of 09:43, 16 July 2013 by DaveRabelink (Talk | contribs)


UNICODE in Team Developer




Using the Team Developer H/LIB files correctly

If you are coding DLLs in C/C++ that use the Team Developer API, you have to take care with some definitions in the header files that are shipped with the Team Developer installation. In some of these header files, the guys from Unify used the TCHAR definitions for string parameters, even for functions that support UNICODE strings only!


Here is a short explanation of what the TCHAR definitions are:


When Windows started to support the UNICODE character set, a new datatype for representing characters was introduced: wchar_t. Before that, char was the type of choice to store a single character, and as you might know, it is 1 byte in size. The new datatype wchar_t had to be able to store more than 256 different characters, so on Windows it was defined as 2 bytes. This allows 65536 different values to be stored. Quite a lot!


But now the designers of the system had another thing to do. While there were lots of string functions around that deal with ANSI strings (using the char datatype), they had to introduce another set of string functions to support the new UNICODE strings (using the wchar_t datatype).


Here is one example from the C runtime library:


size_t strlen( const char *string );
size_t wcslen( const wchar_t *string );


The first function - strlen - supports ANSI-strings only and thus uses the char datatype. The second function - wcslen - supports UNICODE-strings only and thus uses the wchar_t datatype.
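To make the difference concrete, here is a small sketch that measures the same text with both functions. The two helper names are made up for this illustration. Note that wchar_t is 2 bytes on Windows but can be larger on other platforms, so the code uses sizeof instead of a fixed number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: size in bytes of an ANSI string, without the terminator. */
size_t ansi_size_in_bytes(const char *text)
{
    assert(text != NULL);
    return strlen(text) * sizeof(char);
}

/* Hypothetical helper: size in bytes of a wide string, without the terminator. */
size_t wide_size_in_bytes(const wchar_t *text)
{
    assert(text != NULL);
    return wcslen(text) * sizeof(wchar_t);
}
```

Both functions count the same number of characters for "Team" and L"Team"; only the number of bytes behind each character differs.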


So, you have to decide if you want to deal with ANSI or with UNICODE strings. But what if you plan to create something that uses ANSI strings first and later migrates to UNICODE strings?


You have two choices:


  1. Code everything using the ANSI-string functions and modify your code later to use the UNICODE ones!
  2. Use the TCHAR functions!


But what the hell is TCHAR?


TCHAR is a type definition that was introduced to provide an easier way of supporting ANSI and UNICODE strings. BUT, not side by side! It's meant to be either ANSI or UNICODE!!!


As mentioned before, TCHAR is a type definition, not a real type. That means, if you compile your C/C++ project using the ANSI setting, TCHAR is mapped to char. If you compile the whole project with the UNICODE setting, TCHAR is mapped to wchar_t. While recompiling your project with a different character set setting affects all the functions you wrote, it definitely does not affect the functions you consume from external DLLs, for example.
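The mapping can be pictured with a simplified imitation of what the Windows headers do. This is only a sketch: DEMO_UNICODE, demo_tchar, DEMO_TEXT and demo_tcslen are hypothetical names standing in for the UNICODE setting, TCHAR, _T() and _tcslen(); they are not taken from any real header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the UNICODE setting, TCHAR, _T() and _tcslen(). */
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_TEXT(literal) L##literal
#define demo_tcslen wcslen
#else
typedef char demo_tchar;
#define DEMO_TEXT(literal) literal
#define demo_tcslen strlen
#endif

/* The same source line compiles to a char version or a wchar_t version,
   depending only on whether DEMO_UNICODE is defined at compile time. */
size_t demo_greeting_length(void)
{
    const demo_tchar *greeting = DEMO_TEXT("Hello");
    assert(greeting != NULL);
    return demo_tcslen(greeting);
}
```

The switch happens entirely inside the compiler. Code that is already compiled, such as a DLL you link against, never sees it.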


And this is the point where the Team Developer header files come in!


In this article, I'd like to cover the usage of the SWinHStringLock function, defined in the centura.h header file.


It is defined in this way:


LPTSTR CBEXPAPI SWinStringGetBuffer(HSTRING, LPLONG);
#define SWinHStringLock(hString,lplength) SWinStringGetBuffer((hString),(lplength))


That means, whenever you call SWinHStringLock, you really call SWinStringGetBuffer. The return value of SWinStringGetBuffer is defined as LPTSTR, which is a TCHAR definition (visible through the first T in its name). If you compile your project using the ANSI setting, LPTSTR is mapped to LPSTR, which points to a char array. If you compile your project using the UNICODE setting, LPTSTR is mapped to LPWSTR, which points to a wchar_t array. But what doesn't change is the name of the called function.


Now think about it!
You call exactly the same function, which resides in cdllixx.dll and is not affected by your compiler settings, and expect it to return different string types!?!? That won't work!


In this case, SWinStringGetBuffer ALWAYS returns a pointer to a UNICODE string! So, even if you want to use ANSI strings in your project and link against a TD UNICODE library, you'll get a pointer to a UNICODE string.
The big problem is that the centura.h definition doesn't reflect this: it defines the return value as LPTSTR, which may lead you to believe that an ANSI string gets returned. And in fact, this will make your code buggy!


So this is a warning! If you ever see a TCHAR definition in any header file, think twice about it! Can you really expect that to work?
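The effect can be reproduced without the Team Developer headers. The following sketch uses a hand-made UTF-16 buffer (demo_utf16_text, a made-up name) in place of the buffer a UNICODE-only function returns, and shows what ANSI code sees when it trusts a char pointer to that memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Hi" as UTF-16 code units plus terminator. uint16_t keeps the sketch
   independent of the platform's wchar_t size. */
static const uint16_t demo_utf16_text[] = { 'H', 'i', 0 };

/* What an ANSI build does when it believes the LPTSTR declaration:
   it reads the UTF-16 bytes as if they were a char string. */
size_t demo_length_seen_by_ansi_code(void)
{
    return strlen((const char *)demo_utf16_text);
}

/* The real length, counted in UTF-16 code units. */
size_t demo_real_length(void)
{
    size_t count = 0;
    while (demo_utf16_text[count] != 0) {
        count++;
    }
    return count;
}
```

Every ASCII character in UTF-16 carries a zero byte, and ANSI functions treat the first zero byte as the end of the string. So the ANSI view ends before the second character, while the real text is two characters long.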


Here is a short example of how you can solve the SWinStringGetBuffer problem (using the flexible TCHAR approach):


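...
// the string I want to work with later
LPTSTR stringToWorkWith;

// get the string buffer from the HSTRING; the reported size is in bytes
LONG stringBufferSizeFromHStringInBytes;
LPWSTR stringFromHString = (LPWSTR)SWinStringGetBuffer(hString, &stringBufferSizeFromHStringInBytes);

int stringFromHStringCharacterCount = (int)(stringBufferSizeFromHStringInBytes / sizeof(WCHAR));

// LPTSTR follows the UNICODE macro of the Windows headers
// (the Visual Studio character set option defines UNICODE and _UNICODE together)
#ifndef UNICODE
// we want to use the ANSI string: ask for the required size first, because one
// UNICODE character can need more than one byte in a multi-byte code page
int ansiBufferSizeInBytes = WideCharToMultiByte(CP_THREAD_ACP, 0, stringFromHString, stringFromHStringCharacterCount, NULL, 0, NULL, NULL);
stringToWorkWith = new char[ansiBufferSizeInBytes + 1];

// convert UNICODE to ANSI and make sure the result is terminated
WideCharToMultiByte(CP_THREAD_ACP, 0, stringFromHString, stringFromHStringCharacterCount, stringToWorkWith, ansiBufferSizeInBytes, NULL, NULL);
stringToWorkWith[ansiBufferSizeInBytes] = '\0';
#else
// we can use the string directly
stringToWorkWith = stringFromHString;
#endif

// TODO: use the string via stringToWorkWith

#ifndef UNICODE
// as we allocated memory, release that
delete[] stringToWorkWith;
#else
// as we could use the string directly, we have nothing to release
#endif
...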


Well, this only describes how to get the string contents. If you want to modify a string, you'll have to convert your ANSI string to UNICODE! I hope this helps you to understand the problem!


Team Developer ASCII API (using TDASCII)

Here are the contents of the document on the TD ASCII API.
The original document is part of the samples installation of TD and is located at:


Samples\SQLWindows\SalASCII\Team Developer ASCII  API.doc


This document can also be downloaded from here:
Team Developer ASCII API.doc


Team Developer ASCII API
This document is a brief overview of using Team Developer’s new ASCII API to minimize the impact of migrating legacy DLLs and executables written against our pre-Unicode API.
Alternatively, you might consider porting your DLLs to support wide char.


Internally, Team Developer uses 16 bit characters from version 5.1 on.
The functions exposed via our API that have parameters and return values of char, char pointers etc. now use wchar_t versions for those parameters.
We have supplied ASCII versions of those functions, together with defines and macros in our distributed header files, so that your C and C++ projects do not have to be rewritten if you are remaining ASCII.
We are also supplying a sample application and DLL to demonstrate usage.


Important Note:
If your application has merged TD distributed libraries with external DLL sections, you need to either copy and paste the new versions of these sections or go through the parameter list and make sure that LPTST parameter types are changed to LPWSTR types.


ASCII Function usage
There are approximately two hundred functions with ASCII versions. These functions are not available from your Sal code (the compiler does not recognize them); they are only available from the external API. Functions that have HSTRING’s or LPHSTRING’s are assumed to have UTF-16 in the given buffers (see the HSTRING section that follows), whereas functions with char parameter types are expected to have ASCII text (system local code page).


We distribute centura.h with our API function declarations that your C code is currently using. We have added #ifdef sections that allow you to add a define to your project and control which versions of the Sal API you are using. If you are to be Unicode and your strings are all in UTF-16, then nothing needs to be defined (this is the default). If you want your project to compile using the ASCII versions of TD, then define TDASCII in the project. In the centura header files, our function declarations are conditionally defined as follows, using SalStrToNumber( ) as an example.


#ifndef TDASCII
…
NUMBER CBEXPAPI SalStrToNumber(LPWSTR);
…
#else
NUMBER CBEXPAPI SalStrToNumberA(LPSTR);
#define SalStrToNumber SalStrToNumberA
…
#endif


As you can see, your code will not need to be changed and will work with the ASCII buffers that it used before.
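The redirection mechanism can be tried out in isolation. The sketch below is not the TD API: DemoStrToNumberW, DemoStrToNumberA, DEMO_ASCII and demo_parse_legacy_buffer are hypothetical names, and the W suffix exists only to keep both variants visible side by side. In a real project, TDASCII would be set in the project's build options instead of in the source.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wide-character variant. */
long DemoStrToNumberW(const wchar_t *text)
{
    assert(text != NULL);
    return wcstol(text, NULL, 10);
}

/* Hypothetical ASCII variant, playing the role of an "A" function. */
long DemoStrToNumberA(const char *text)
{
    assert(text != NULL);
    return strtol(text, NULL, 10);
}

/* Stand-in for defining TDASCII in the project settings. */
#define DEMO_ASCII

#ifdef DEMO_ASCII
#define DemoStrToNumber DemoStrToNumberA
#else
#define DemoStrToNumber DemoStrToNumberW
#endif

/* Legacy caller: keeps using the old name and its old char buffer. */
long demo_parse_legacy_buffer(void)
{
    const char legacy_buffer[] = "1234";
    return DemoStrToNumber(legacy_buffer);
}
```

The preprocessor replaces the old function name before the compiler sees the call, which is why the calling code can stay as it was.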


Passing Text back and forth to TD runtime.


If your string data originates from the TD runtime, it is going to be in UTF-16, and if your code manipulates it assuming ASCII, it will need to be “trans-coded” before it is used. We supply a new API function to change the encoding, SalStrToMultiByte( ), and a function to switch back to UTF-16, SalStrWideChar( ). Both take a HSTRING as the first parameter for input and a LPHSTRING as the second parameter for the trans-coded output; the third parameter is the encoding to use. We will support all Microsoft supported code pages, but as of beta only the system ANSI code page and UTF-8 are supported.


HSTRING Review
Team Developer HSTRING’s are TD’s optimized string handling mechanism: essentially a handle to an internal table that contains a buffer where the string is stored. Since this is essentially a handle and an associated buffer, it was allowed to be used as a binary buffer as well (by the Sal Picture functions, for example). For this reason, the HSTRING buffer functions that get and set a size continue to use the buffer size, NOT the number of characters.
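Because sizes are buffer sizes, code that works with text has to convert between bytes and characters itself. A minimal sketch, assuming the buffer holds UTF-16 and the size is reported in bytes; both helper names are made up, and whether a reported size includes the terminator is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-16 code units that fit into a buffer
   whose size is given in bytes. The terminator is not subtracted. */
size_t demo_utf16_units_in_buffer(size_t buffer_size_in_bytes)
{
    /* A buffer that holds UTF-16 text should have an even size. */
    assert(buffer_size_in_bytes % sizeof(uint16_t) == 0);
    return buffer_size_in_bytes / sizeof(uint16_t);
}

/* Hypothetical helper: bytes to request for a text of the given length,
   including room for one terminating code unit. */
size_t demo_bytes_for_utf16_text(size_t character_count)
{
    return (character_count + 1) * sizeof(uint16_t);
}
```

For example, a five character text needs a 12 byte buffer, and a 12 byte buffer holds six code units including the terminator.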


Creating a HSTRING in your DLL.
Using SWinInitLPHSTRINGParam( ) or SWinStringCreate( ) will take a reference to an HSTRING:


1) For SWinInitLPHSTRINGParam( ): if the HSTRING is non-zero, its ref-count is decremented; if the ref-count becomes zero, it is deleted from the string table (it is deleted only if the dynamic string table is referenced, not the static string table).
2) For SWinStringCreate( ): the HSTRING parameter is set to zero, so the ref-count is not adjusted. This should only be used when adding a new HSTRING, not when changing an existing one.
3) A new HSTRING is added to the table and the memory that it references is allocated (len parameter); the new HSTRING’s ref-count is set to one.
4) If passing back to the TD runtime, leave the ref-count at one; otherwise call SalHStringUnRef( ) to decrement the ref-count back to zero so that the resources are freed.
5) SWinHStringRef( ) increments the ref-count. This should be used with caution, as leaving a HSTRING with a ref-count greater than zero will cause the resources to never be freed, both the HSTRING in the table and the associated buffer.
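The ref-count rules in the steps above can be pictured with a toy model. This is not how the TD string table is implemented; DemoString and the demo_string_* functions are hypothetical and only imitate "created with a count of one, freed when the count reaches zero".

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for one string table entry and its buffer. */
typedef struct DemoString {
    int ref_count;
    size_t buffer_size;      /* size in bytes, not characters */
    unsigned char *buffer;
} DemoString;

/* Like step 3: allocate the buffer and start with a ref-count of one. */
DemoString *demo_string_create(size_t buffer_size)
{
    DemoString *entry = (DemoString *)malloc(sizeof *entry);
    if (entry == NULL) {
        return NULL;
    }
    entry->buffer = (unsigned char *)calloc(buffer_size ? buffer_size : 1, 1);
    if (entry->buffer == NULL) {
        free(entry);
        return NULL;
    }
    entry->buffer_size = buffer_size;
    entry->ref_count = 1;
    return entry;
}

/* Like step 5: every extra reference must be released again later. */
void demo_string_ref(DemoString *entry)
{
    assert(entry != NULL);
    entry->ref_count++;
}

/* Like step 4: returns the remaining count; frees everything at zero. */
int demo_string_unref(DemoString *entry)
{
    int remaining;
    assert(entry != NULL && entry->ref_count > 0);
    remaining = --entry->ref_count;
    if (remaining == 0) {
        free(entry->buffer);
        free(entry);
    }
    return remaining;
}
```

In this model, a string that gets one extra demo_string_ref call needs two demo_string_unref calls before its memory is released. A forgotten unref is the leak that step 5 warns about.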

Getting the buffer associated with a HSTRING
1) Use SWinHStringLock( ) or SWinStringGetBuffer( ) and cast to the pointer type that you are using. Of course, the buffer must contain the correct type of data.
2) SWinHStringUnlock is historic and is no longer needed; it has been defined as a no-op for backward compatibility.


Ordinals
If your application is loading cdlli51.dll dynamically, you should not use the ordinals to get the function pointers. Use the function names instead, so that the substituted A versions will be loaded.


SalASCII Sample
Please see the sample Sal application, together with the C DLL, written to illustrate what was covered in this document.