| Author |
multibyte sort order problems (japanese)
|
|
| Ingo Eichinger 2005-12-16, 9:23 am |
| Hi all,
we received complaints from our customers in Japan that the
order of Japanese words is not satisfactory (using ASA
7.0.4.3519).
They say that the sort order always seems to be case sensitive
(no matter whether the database was initially created as a
case-sensitive or a case-insensitive DB), so the order is
something like 'ABCDEFabcdef' when it should rather be
'AaBbCcDdEeFf'.
We always use the default collation, which resolves to "932JPN"
on their Japanese Windows XP.
I've already read about sort order problems in UTF-8 (but
not for other multibyte charsets and collations).
Is there anything we can do to provide a DB-based solution
to our customers?
TIA and kind regards,
Ingo
| |
| Pavel Karady 2005-12-19, 11:23 am |
| Writing a script will always help.
You can create a stored procedure that implements your own sorting
algorithm. But if the sorting comes from an ORDER BY clause (and I
guess that's what you are talking about), then it gets a little more
complicated: you need to work with temporary tables, which can slow down
computation of the result set.
Before reading your post, I didn't even know that Japanese has
upper and lower case :)
Pavel
| |
| John Smirnios 2005-12-20, 9:23 am |
| It sounds like your customers are looking for what is called multi-level
sorts but ASA collations do not handle that. To get that behaviour, you
must use the SORTKEY() function -- consult your documentation for how to
use it.
-john.
--
John Smirnios
Senior Software Developer
iAnywhere Solutions Engineering
Whitepapers, TechDocs, bug fixes are all available through the iAnywhere
Developer Community at http://www.ianywhere.com/developer
| |
| Ingo Eichinger 2005-12-20, 11:23 am |
| I guess SORTKEY() is supported only by ASA versions 8+, and
we are still using 7.0.4.3519. So do we have to create our
own SORTKEY emulation for ASA 7, or is there another
way to get the requested behaviour?
KR, Ingo
| |
| John Smirnios 2005-12-20, 1:23 pm |
| You're right -- it's not in 7.0.4 so there is no convenient way to do
it. You either need to upgrade or to resort to generating sortkeys
yourself (in an external stored procedure or perhaps in Java if Java
supports sortkey generation).
-john.
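
For illustration, generating sort keys outside the database could look roughly like the Python sketch below. It assumes that the 1-byte and 2-byte letters mentioned in this thread are half-width and full-width Latin letters, which Unicode NFKC normalization maps onto each other. The function name `make_sortkey` and the sample words are hypothetical, and this is not how SORTKEY() is implemented in ASA.

```python
import unicodedata


def make_sortkey(text):
    """Return a two-part key: case-folded letters first, original text as tie-breaker."""
    # Assumption: NFKC folds full-width Latin letters (2 bytes in code page 932)
    # onto their ASCII counterparts, so both widths compare as the same letter.
    normalized = unicodedata.normalize("NFKC", text)
    primary = normalized.casefold()
    # The original string only decides the order among otherwise equal keys.
    return (primary, text)


# Hypothetical sample data mixing half-width and full-width letters.
words = ["ＡＢＣ", "abd", "Abc", "ａｂａ"]
sorted_words = sorted(words, key=make_sortkey)
print(sorted_words)
```

A key like this would have to be stored in an extra column (encoded as a single string) and used in the ORDER BY clause; whether that is practical on ASA 7 is not covered here.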
| |
| Ingo Eichinger 2005-12-22, 7:23 am |
| I am not familiar with the Japanese character set, but our
support in Japan told us that lower-case letters require 1
byte and upper-case letters require 2 bytes of space.
So would it be possible to create a custom collation file
and specify a 1-byte letter (lower case) and a 2-byte letter
as its upper-case equivalent, all in the same row of the
collation file?
Or is this the so-called "multi-level sort", which you said
is not supported by ASA?
Thank you all!
Ingo
| |
| John Smirnios 2005-12-22, 9:23 am |
| SORTKEY doesn't care how many bytes it takes to encode a character -- it
just does the right thing. ASA's simple collation support (i.e. if you
don't use SORTKEY) cannot recognize that two characters of different
lengths are different cases of the same character.
So, SORTKEY gets around the limitations of ASA's simple collation
support, but it also gives you multi-level sorts. A multi-level sort is
not needed to distinguish between characters of different lengths; rather, it
gives you nice dictionary behaviour. Consider a single-level sort where
a < A < b < B etc and sort the following strings: ab, Ac, ad, Ae. With
the single level sort, all 'a's will come before all 'A's so you will
get the following order:
ab, ad, Ac, Ae
That's not what you really want. You want all 'a's and 'A's together, with
case differences resolved only afterwards. You want the following sort order:
ab, Ac, ad, Ae
To get it, you use a multilevel sort (and here I use L to indicate lower
case and U to indicate upper case and I use '/' to separate the levels).
Effectively, you sort the following "sortkeys" to get the desired order:
ab/LL, ac/UL, ad/LL, ae/UL
-john.
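
The two orderings described above can be reproduced with a short Python sketch. The key functions are assumptions made for illustration (they only handle the letters a-z and A-Z) and are not ASA's actual collation or SORTKEY() logic.

```python
strings = ["ab", "Ac", "ad", "Ae"]


def single_level_key(s):
    # One weight per character with a < A < b < B < ...:
    # lower case gets an even weight, its upper case the next odd weight.
    return [2 * (ord(c.lower()) - ord("a")) + (1 if c.isupper() else 0) for c in s]


def multi_level_key(s):
    # Level 1: the letters with case ignored.
    level1 = s.lower()
    # Level 2: the case pattern, L for lower and U for upper ("L" < "U").
    level2 = "".join("U" if c.isupper() else "L" for c in s)
    return (level1, level2)


single = sorted(strings, key=single_level_key)
multi = sorted(strings, key=multi_level_key)
print(single)  # ['ab', 'ad', 'Ac', 'Ae']
print(multi)   # ['ab', 'Ac', 'ad', 'Ae']
```

The second level is only consulted when two strings are equal at the first level, which is what produces the dictionary-style order.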