Home > Archive > Microsoft SQL Server forum > July 2005 > SQL 2000 and UTF-16 encoding









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author SQL 2000 and UTF-16 encoding
dBlue

2005-06-28, 8:24 pm

Hi all,

I have an issue on querying against UTF-16 encoded characters in
SQL2000 database: For example the "L=F3pez" is saved into database as
"L=C3=B3pez" (due to the UTF-16 encoding); somehow, when I query data
with conditions of "like 'lop%'" or "like 'L=F3p%'", the row of L=F3pez
would not return.

NOTE: the accent insensitive collation can not help in this case.

Thank you,

Albion(052X)

Erland Sommarskog

2005-06-29, 7:23 am

[posted and mailed, please reply in news]

dBlue (zkvneml@hotmail.com) writes:
> I have an issue on querying against UTF-16 encoded characters in
> SQL2000 database: For example the "López" is saved into database as
> "López" (due to the UTF-16 encoding); somehow, when I query data
> with conditions of "like 'lop%'" or "like 'Lóp%'", the row of López
> would not return.
>
> NOTE: the accent insensitive collation can not help in this case.


I answered a very similar question yesterday, and while the username
and e-mail address, I would assume that you are the same person. I cannot
do much more but repeat my answer from yesterday:

I was not able to repeat this. If you can produce a script similar
to the one below that demonstrates the problem, it's a little easier
to get an idea of what is going on. Don't forget to include the collation
of the column.

CREATE TABLE tbl
(lastname nvarchar(30) COLLATE Finnish_Swedish_BIN NOT NULL)
go
INSERT tbl (lastname) VALUES ('Mopez')
INSERT tbl (lastname) VALUES ('Mópez')
INSERT tbl (lastname) VALUES ('lopez de la serra')
INSERT tbl (lastname) VALUES ('x lópez')
go
SELECT * FROM tbl WHERE lastname LIKE '%[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE '%l[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE N'%[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE N'%l[oó]pez%'
go
drop table tbl


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
dBlue

2005-06-29, 7:23 am

Thank you Erland for the reply, unfortunatly, the problem remains.

I understand your example well, and it is exactly the sql query we are
using now. The problem is not at the accent. The query does not
return me the data row, because the character is save as "L=C3=B3pez" in
the database, and the [o=F3] does not help on it.

BTW, the accent insensitive insensitive does not help either.

Thank you,

Albion(052X)

dBlue

2005-06-29, 9:23 am

Hi,

Please try the script below. It is a bit weired to me.



CREATE TABLE tbl
(lastname nvarchar(30) NOT NULL)
go
INSERT tbl (lastname) VALUES ('M=C3=B3peX')
INSERT tbl (lastname) VALUES ('M=F3peY')
INSERT tbl (lastname) VALUES ('MopeZ')
go

SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M%[=C3=B3o]pe%'
GO

Drop table tbl

The frist two select statement does not return same result as they
suppose to. Any idea?

Thank you,

Albion(052X)

dBlue

2005-06-29, 9:23 am

BYTW, I found an very instereting stuff. Please the script below,

The first 1 and 2 select statement should return same result, but they
do not; and only difference between them is 1 statement does not have
the M character inside the like statement.


CREATE TABLE tbl
(lastname nvarchar(30) NOT NULL)
go
INSERT tbl (lastname) VALUES ('M=C3=B3peX')
INSERT tbl (lastname) VALUES ('M=F3peY')
INSERT tbl (lastname) VALUES ('MopeZ')
go
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%[o]pe%'
GO

drop table tbl=20


Thank you,

Albion(052X)

vincent wehren

2005-06-29, 8:24 pm

| "dBlue" <zkvneml@hotmail.com> schrieb im Newsbeitrag
news:1119999569.407793.143810@g49g2000cwa.googlegroups.com...
|Hi all,
|
|I have an issue on querying against UTF-16 encoded characters in
|SQL2000 database: For example the "López" is saved into database as
|"López" (due to the UTF-16 encoding);

This is not a UTF-16 representation but UTF-8.

--

Vincent Wehren




|somehow, when I query data
|with conditions of "like 'lop%'" or "like 'Lóp%'", the row of López
|would not return.

|NOTE: the accent insensitive collation can not help in this case.

|Thank you,

|Albion(052X)


Erland Sommarskog

2005-06-29, 8:24 pm

dBlue (zkvneml@hotmail.com) writes:
> BYTW, I found an very instereting stuff. Please the script below,
>
> The first 1 and 2 select statement should return same result, but they
> do not; and only difference between them is 1 statement does not have
> the M character inside the like statement.
>
>
> CREATE TABLE tbl
> (lastname nvarchar(30) NOT NULL)
> go
> INSERT tbl (lastname) VALUES ('MópeX')
> INSERT tbl (lastname) VALUES ('MópeY')
> INSERT tbl (lastname) VALUES ('MopeZ')
> go
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%M[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%M%[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%[o]pe%'
> GO
>
> drop table tbl


But "MópeX" is not the same as "MópeX". The first string has six
characters, the second has five. Both strings match N'%[óo]pe%' -
that is first any number of characters and then exactly one of
"Ã", "³" or "o" and then "pe" followed by any number of chars.

On the other hand, the string N'%M[óo]pe%', you say that there
should be an M and then exactly one of "Ã", "³" or "o" and then "pe",
which there is in the six-char string.

OK, so that "MòpeZ" is really a representation of "MópeX", to whit
the bit pattern that you have in UTF-8, but stored in a string which
is supposed to hold UTF_16 values, and then it goes downhill from
there.

So I guess you really problem is why you have UTF-8 encodings in the
wrong place. When I saw your first posting, I assumed that the mangled
UTF-8 came from the news posting itself, but it appears now that it
comes from the database.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
dBlue

2005-07-05, 11:23 am

Hi,

Thank you for the reply.

It seems like it is the problem on querying for the 'M=C3=B3peX' from the
SQL, but getting correct 'M=F3peX', rather than getting the 'M=C3=B3peX'.
Is that right?

The following is the function we use to do the decoding for the
URLstring. I can add the WideCharToMultiByte and MultiByteToWideChar
to get correct data. But is there a more efficient way?

str =3D Replace(str, "+", " ")

i =3D 0
Do
i =3D InStr(i + 1, str, "%")
If i =3D 0 Then
Exit Do
End If
var_char =3D Chr(CStr("&H" + Mid(str, i + 1, 2)))
If Err.Number =3D 0 Then
str =3D Mid(str, 1, i - 1) _
& var_char _
& Mid(str, i + 3)
End If
Loop

thanks again.


Erland Sommarskog wrote:
> dBlue (zkvneml@hotmail.com) writes:
>
> But "M=C3=B3peX" is not the same as "M=F3peX". The first string has six
> characters, the second has five. Both strings match N'%[=C3=B3o]pe%' -
> that is first any number of characters and then exactly one of
> "=C3", "=B3" or "o" and then "pe" followed by any number of chars.
>
> On the other hand, the string N'%M[=C3=B3o]pe%', you say that there
> should be an M and then exactly one of "=C3", "=B3" or "o" and then "pe",
> which there is in the six-char string.
>
> OK, so that "M=C3=B2peZ" is really a representation of "M=F3peX", to whit
> the bit pattern that you have in UTF-8, but stored in a string which
> is supposed to hold UTF_16 values, and then it goes downhill from
> there.
>
> So I guess you really problem is why you have UTF-8 encodings in the
> wrong place. When I saw your first posting, I assumed that the mangled
> UTF-8 came from the news posting itself, but it appears now that it
> comes from the database.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
>
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techin.../2000/books.asp


Erland Sommarskog

2005-07-05, 8:23 pm

dBlue (zkvneml@hotmail.com) writes:
> It seems like it is the problem on querying for the 'MópeX' from the
> SQL, but getting correct 'MópeX', rather than getting the 'MópeX'.
> Is that right?


The problem is that you have garbage in your database, yes.

> The following is the function we use to do the decoding for the
> URLstring. I can add the WideCharToMultiByte and MultiByteToWideChar
> to get correct data. But is there a more efficient way?
>
> str = Replace(str, "+", " ")
>
> i = 0
> Do
> i = InStr(i + 1, str, "%")
> If i = 0 Then
> Exit Do
> End If
> var_char = Chr(CStr("&H" + Mid(str, i + 1, 2)))
> If Err.Number = 0 Then
> str = Mid(str, 1, i - 1) _
> & var_char _
> & Mid(str, i + 3)
> End If
> Loop


I don't know what URL string you have, and I am not an expert on how
to write effecient VB code.

I have however used WideCharToMultiByte and MultiByteToWideChar (from
C++) to convert between UTF-8 and UTF-16, and found them to work well.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2009 droptable.com