Difference between revisions of "UNICODE"

From Team Developer SqlWindows Wiki
(Created new article about using the TD H/LIB files)
Revision as of 10:47, 26 November 2010

This page covers UNICODE concerning TD5.x.


Using the Team Developer H/LIB files correctly

If you are coding DLLs in C/C++ that should work with Team Developer, you have to take care with some definitions in the header files that ship with the Team Developer installation. In some of these header files, the guys from Unify used TCHAR definitions for string parameters, even for functions that support UNICODE strings only!

Here is a short explanation of what TCHAR definitions are:

When Windows started to support the UNICODE character set, a wider datatype was needed to represent characters: wchar_t. Before that, char was the type of choice to store a single character, and as you might know, it is 1 byte in size. The wchar_t datatype had to be able to store more than 256 different values, so on Windows it is defined as 2 bytes. This allows 65536 different values. Quite a lot! (Characters beyond that range are stored as a pair of two wchar_t values.)

But now, the designers of the system had another thing to do. While there were lots of string functions around that deal with ANSI strings (using the char datatype), they had to introduce another set of string functions to support the new UNICODE strings (using the wchar_t datatype).

Here is one example from the C runtime library:

size_t strlen( const char *string );
size_t wcslen( const wchar_t *string );

The first function - strlen - supports ANSI-strings only and thus uses the char datatype. The second function - wcslen - supports UNICODE-strings only and thus uses the wchar_t datatype.
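To make the difference a bit more tangible, here is a minimal sketch that wraps both functions. The helper names ansiLength, wideLength and wideByteSize are made up for this illustration. Note that both functions count elements, not bytes, so the byte size of a UNICODE string has to be calculated with sizeof(wchar_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // std::strlen
#include <cwchar>    // std::wcslen

// Length of a char-based (ANSI) string, counted in char elements.
std::size_t ansiLength(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}

// Length of a wchar_t-based (UNICODE) string, counted in wchar_t elements.
std::size_t wideLength(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}

// Storage in bytes for the characters only (terminator excluded).
// sizeof(wchar_t) is 2 on Windows, but may differ on other platforms.
std::size_t wideByteSize(const wchar_t* text) {
    return wideLength(text) * sizeof(wchar_t);
}
```

Passing a wchar_t string to ansiLength (or the other way round) does not even compile, which is exactly the protection you lose as soon as casts or misleading declarations come into play.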

So, you have to decide whether you want to deal with ANSI or with UNICODE strings. But what if you plan to create something that uses ANSI strings first and migrates to UNICODE strings later?

You have two choices:

  1. Code everything using the ANSI-string functions and modify your code later to use the UNICODE ones!
  2. Use the TCHAR functions!

But what the hell is TCHAR?

TCHAR is a type definition that was introduced to provide an easier way of supporting ANSI and UNICODE strings. BUT not side by side! In any one build it is either ANSI or UNICODE!

As mentioned before, TCHAR is a type definition, not a real type. That means, if you compile your C/C++ project using the ANSI setting, TCHAR is mapped to char. If you compile the whole project with the UNICODE setting (in Visual Studio this defines the UNICODE and _UNICODE macros), TCHAR is mapped to wchar_t. Recompiling your project with a different character set setting affects all the functions you wrote, but it definitely does not affect the functions you consume from external DLLs, for example. Those were compiled long before, with whatever setting their author chose.
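The mapping itself is nothing more than a preprocessor switch. The following is a rough imitation of the mechanism, not the real header: MY_UNICODE, my_tchar, MY_TEXT and my_tcslen are hypothetical names standing in for _UNICODE, TCHAR, _T and _tcslen from tchar.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical switch; in a real Windows project the character set
// option of the project defines the corresponding macro for you.
// #define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;                      // "UNICODE build"
#define MY_TEXT(s) L##s
inline std::size_t my_tcslen(const my_tchar* s) { return std::wcslen(s); }
#else
typedef char my_tchar;                         // "ANSI build"
#define MY_TEXT(s) s
inline std::size_t my_tcslen(const my_tchar* s) { return std::strlen(s); }
#endif

// This function compiles unchanged in both configurations,
// because every piece of it is switched at compile time.
inline std::size_t greetingLength() {
    const my_tchar* greeting = MY_TEXT("Hello");
    return my_tcslen(greeting);
}
```

Everything here is decided while your own source is compiled. A function inside a DLL that was built earlier is not recompiled, so no macro on your side can change what it returns.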

And this is the point where the Team Developer header files come in!

In this article, I'd like to cover the usage of SWinHStringLock, which is defined in the centura.h header file.

It is defined in this way:

LPTSTR CBEXPAPI SWinStringGetBuffer(HSTRING, LPLONG);
#define SWinHStringLock(hString,lplength) SWinStringGetBuffer((hString),(lplength))

That means, whenever you call SWinHStringLock, you really call SWinStringGetBuffer, because SWinHStringLock is just a macro. The return value of SWinStringGetBuffer is declared as LPTSTR, which is a TCHAR definition (visible through the first T in its name). So if you compile your project using the ANSI setting, LPTSTR is mapped to LPSTR, which points to a char array. If you compile your project using the UNICODE setting, LPTSTR is mapped to LPWSTR, which points to a wchar_t array. But what doesn't change is the name of the called function.

Now think about it! You call exactly the same function, which resides in cdllixx.dll and is not affected by your compiler settings, and expect it to return different string types!?!? That won't work!

In this case, SWinStringGetBuffer ALWAYS returns a pointer to a UNICODE string! So, even if you want to use ANSI strings in your project and link against a TD UNICODE library, you'll get a pointer to a UNICODE string. The big problem is that centura.h doesn't reflect this: it declares the return value as LPTSTR, which may lead you to believe that an ANSI string gets returned in an ANSI build. The compiler will not complain, and your code will be buggy!
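What "buggy" looks like in practice can be shown without Team Developer at all. In this sketch, kWideBuffer is a made-up stand-in for the buffer the DLL hands back. It uses char16_t so that the elements are 2 bytes on every platform, just like wchar_t on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for the buffer a UNICODE-only DLL returns: 2-byte elements.
static const char16_t kWideBuffer[] = u"Hello";

// What an ANSI build does when it trusts a declaration that promises char*.
// For plain ASCII text every second byte of the buffer is zero, so strlen
// stops after at most one byte.
inline std::size_t lengthIfTreatedAsAnsi() {
    const char* misread = reinterpret_cast<const char*>(kWideBuffer);
    return std::strlen(misread);
}

// Reading the same buffer correctly, as 2-byte elements.
inline std::size_t lengthAsUnicode() {
    std::size_t count = 0;
    while (kWideBuffer[count] != u'\0') {
        ++count;
    }
    return count;
}
```

The misread version does not crash, it just silently sees a string that is cut off after the first character (or empty, depending on byte order). That makes this kind of bug easy to overlook.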

So this is a warning! If you ever see a TCHAR definition in a header file that describes an already compiled DLL, think about it twice! Can you really expect that to work?

Here is a short example of how you can solve the SWinStringGetBuffer problem (using the flexible TCHAR approach):

...
// the string I want to work with later
LPTSTR stringToWorkWith;

// get the string buffer from the HSTRING
// (the returned pointer is always a UNICODE string, whatever the header says)
LONG stringBufferSizeFromHStringInBytes;
LPWSTR stringFromHString = (LPWSTR)SWinStringGetBuffer(hString, &stringBufferSizeFromHStringInBytes);

LONG stringFromHStringCharacterCount = stringBufferSizeFromHStringInBytes / sizeof(WCHAR);

#ifndef _UNICODE
// we want to use the ANSI string
// ask for the required size first: one UNICODE character can need
// more than one byte in the ANSI code page
int ansiBufferSizeInBytes = WideCharToMultiByte(CP_THREAD_ACP, 0, stringFromHString, stringFromHStringCharacterCount, NULL, 0, NULL, NULL);
stringToWorkWith = new char[ansiBufferSizeInBytes + 1];

// convert UNICODE to ANSI
WideCharToMultiByte(CP_THREAD_ACP, 0, stringFromHString, stringFromHStringCharacterCount, stringToWorkWith, ansiBufferSizeInBytes, NULL, NULL);

// make sure the result is terminated, even if the source length did not include the terminator
stringToWorkWith[ansiBufferSizeInBytes] = '\0';
#else
// we can use the string directly
stringToWorkWith = stringFromHString;
#endif

// TODO: use the string via stringToWorkWith

#ifndef _UNICODE
// as we allocated memory, release that
delete[] stringToWorkWith;
#else
// as we could use the string directly, we have nothing to release
#endif
...
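If you need the buffer in more than one place, it may be cleaner to undo the misleading declaration in one single helper. The following is only a sketch: fakeGetBuffer is a hypothetical stand-in for SWinStringGetBuffer as an ANSI build sees it, and it assumes that the reported length is in bytes and includes the terminator. Check both assumptions against your TD version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for SWinStringGetBuffer in an ANSI build:
// declared as returning char*, but the memory really holds 2-byte elements.
// Assumption: the length is reported in bytes, terminator included.
inline char* fakeGetBuffer(long* lengthInBytes) {
    static char16_t storage[] = u"TD";
    *lengthInBytes = static_cast<long>(sizeof(storage));
    return reinterpret_cast<char*>(storage);
}

// The one place that casts the pointer back to what it really is
// and turns the byte length into an element count.
inline std::u16string readAsUnicode() {
    long lengthInBytes = 0;
    const char16_t* text =
        reinterpret_cast<const char16_t*>(fakeGetBuffer(&lengthInBytes));
    std::size_t count = static_cast<std::size_t>(lengthInBytes) / sizeof(char16_t);
    if (count > 0 && text[count - 1] == u'\0') {
        --count;  // drop the terminator if it was counted
    }
    return std::u16string(text, count);
}
```

The rest of the code then only sees a properly typed string and never touches the LPTSTR return value directly.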

I hope this helps you to understand the problem!