|
Home > Archive > Microsoft SQL Server forum > July 2005 > SQL 2000 and UTF-16 encoding
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
SQL 2000 and UTF-16 encoding
|
|
|
| Hi all,
I have an issue on querying against UTF-16 encoded characters in
SQL2000 database: For example the "L=F3pez" is saved into database as
"L=C3=B3pez" (due to the UTF-16 encoding); somehow, when I query data
with conditions of "like 'lop%'" or "like 'L=F3p%'", the row of L=F3pez
would not return.
NOTE: the accent insensitive collation can not help in this case.
Thank you,
Albion(052X)
| |
| Erland Sommarskog 2005-06-29, 7:23 am |
| [posted and mailed, please reply in news]
dBlue (zkvneml@hotmail.com) writes:
> I have an issue on querying against UTF-16 encoded characters in
> SQL2000 database: For example the "López" is saved into database as
> "López" (due to the UTF-16 encoding); somehow, when I query data
> with conditions of "like 'lop%'" or "like 'Lóp%'", the row of López
> would not return.
>
> NOTE: the accent insensitive collation can not help in this case.
I answered a very similar question yesterday, and while the username
and e-mail address, I would assume that you are the same person. I cannot
do much more but repeat my answer from yesterday:
I was not able to repeat this. If you can produce a script similar
to the one below that demonstrates the problem, it's a little easier
to get an idea of what is going on. Don't forget to include the collation
of the column.
CREATE TABLE tbl
(lastname nvarchar(30) COLLATE Finnish_Swedish_BIN NOT NULL)
go
INSERT tbl (lastname) VALUES ('Mopez')
INSERT tbl (lastname) VALUES ('Mópez')
INSERT tbl (lastname) VALUES ('lopez de la serra')
INSERT tbl (lastname) VALUES ('x lópez')
go
SELECT * FROM tbl WHERE lastname LIKE '%[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE '%l[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE N'%[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE N'%l[oó]pez%'
go
drop table tbl
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
| |
|
| Thank you Erland for the reply, unfortunatly, the problem remains.
I understand your example well, and it is exactly the sql query we are
using now. The problem is not at the accent. The query does not
return me the data row, because the character is save as "L=C3=B3pez" in
the database, and the [o=F3] does not help on it.
BTW, the accent insensitive insensitive does not help either.
Thank you,
Albion(052X)
| |
|
| Hi,
Please try the script below. It is a bit weired to me.
CREATE TABLE tbl
(lastname nvarchar(30) NOT NULL)
go
INSERT tbl (lastname) VALUES ('M=C3=B3peX')
INSERT tbl (lastname) VALUES ('M=F3peY')
INSERT tbl (lastname) VALUES ('MopeZ')
go
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M%[=C3=B3o]pe%'
GO
Drop table tbl
The frist two select statement does not return same result as they
suppose to. Any idea?
Thank you,
Albion(052X)
| |
|
| BYTW, I found an very instereting stuff. Please the script below,
The first 1 and 2 select statement should return same result, but they
do not; and only difference between them is 1 statement does not have
the M character inside the like statement.
CREATE TABLE tbl
(lastname nvarchar(30) NOT NULL)
go
INSERT tbl (lastname) VALUES ('M=C3=B3peX')
INSERT tbl (lastname) VALUES ('M=F3peY')
INSERT tbl (lastname) VALUES ('MopeZ')
go
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%M%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_C
P1_CI_AI LIKE N'%[o]pe%'
GO
drop table tbl=20
Thank you,
Albion(052X)
| |
| vincent wehren 2005-06-29, 8:24 pm |
| | "dBlue" <zkvneml@hotmail.com> schrieb im Newsbeitrag
news:1119999569.407793.143810@g49g2000cwa.googlegroups.com...
|Hi all,
|
|I have an issue on querying against UTF-16 encoded characters in
|SQL2000 database: For example the "López" is saved into database as
|"López" (due to the UTF-16 encoding);
This is not a UTF-16 representation but UTF-8.
--
Vincent Wehren
|somehow, when I query data
|with conditions of "like 'lop%'" or "like 'Lóp%'", the row of López
|would not return.
|NOTE: the accent insensitive collation can not help in this case.
|Thank you,
|Albion(052X)
| |
| Erland Sommarskog 2005-06-29, 8:24 pm |
| dBlue (zkvneml@hotmail.com) writes:
> BYTW, I found an very instereting stuff. Please the script below,
>
> The first 1 and 2 select statement should return same result, but they
> do not; and only difference between them is 1 statement does not have
> the M character inside the like statement.
>
>
> CREATE TABLE tbl
> (lastname nvarchar(30) NOT NULL)
> go
> INSERT tbl (lastname) VALUES ('MópeX')
> INSERT tbl (lastname) VALUES ('MópeY')
> INSERT tbl (lastname) VALUES ('MopeZ')
> go
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%M[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%M%[óo]pe%'
> SELECT * FROM tbl WHERE lastname Collate
> SQL_Latin1_General_C
P1_CI_AI LIKE N'%[o]pe%'
> GO
>
> drop table tbl
But "MópeX" is not the same as "MópeX". The first string has six
characters, the second has five. Both strings match N'%[óo]pe%' -
that is first any number of characters and then exactly one of
"Ã", "³" or "o" and then "pe" followed by any number of chars.
On the other hand, the string N'%M[óo]pe%', you say that there
should be an M and then exactly one of "Ã", "³" or "o" and then "pe",
which there is in the six-char string.
OK, so that "MòpeZ" is really a representation of "MópeX", to whit
the bit pattern that you have in UTF-8, but stored in a string which
is supposed to hold UTF_16 values, and then it goes downhill from
there.
So I guess you really problem is why you have UTF-8 encodings in the
wrong place. When I saw your first posting, I assumed that the mangled
UTF-8 came from the news posting itself, but it appears now that it
comes from the database.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
| |
| dBlue 2005-07-05, 11:23 am |
| Hi,
Thank you for the reply.
It seems like it is the problem on querying for the 'M=C3=B3peX' from the
SQL, but getting correct 'M=F3peX', rather than getting the 'M=C3=B3peX'.
Is that right?
The following is the function we use to do the decoding for the
URLstring. I can add the WideCharToMultiByte and MultiByteToWideChar
to get correct data. But is there a more efficient way?
str =3D Replace(str, "+", " ")
i =3D 0
Do
i =3D InStr(i + 1, str, "%")
If i =3D 0 Then
Exit Do
End If
var_char =3D Chr(CStr("&H" + Mid(str, i + 1, 2)))
If Err.Number =3D 0 Then
str =3D Mid(str, 1, i - 1) _
& var_char _
& Mid(str, i + 3)
End If
Loop
thanks again.
Erland Sommarskog wrote:
> dBlue (zkvneml@hotmail.com) writes:
>
> But "M=C3=B3peX" is not the same as "M=F3peX". The first string has six
> characters, the second has five. Both strings match N'%[=C3=B3o]pe%' -
> that is first any number of characters and then exactly one of
> "=C3", "=B3" or "o" and then "pe" followed by any number of chars.
>
> On the other hand, the string N'%M[=C3=B3o]pe%', you say that there
> should be an M and then exactly one of "=C3", "=B3" or "o" and then "pe",
> which there is in the six-char string.
>
> OK, so that "M=C3=B2peZ" is really a representation of "M=F3peX", to whit
> the bit pattern that you have in UTF-8, but stored in a string which
> is supposed to hold UTF_16 values, and then it goes downhill from
> there.
>
> So I guess you really problem is why you have UTF-8 encodings in the
> wrong place. When I saw your first posting, I assumed that the mangled
> UTF-8 came from the news posting itself, but it appears now that it
> comes from the database.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
>
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techin.../2000/books.asp
| |
| Erland Sommarskog 2005-07-05, 8:23 pm |
| dBlue (zkvneml@hotmail.com) writes:
> It seems like it is the problem on querying for the 'MópeX' from the
> SQL, but getting correct 'MópeX', rather than getting the 'MópeX'.
> Is that right?
The problem is that you have garbage in your database, yes.
> The following is the function we use to do the decoding for the
> URLstring. I can add the WideCharToMultiByte and MultiByteToWideChar
> to get correct data. But is there a more efficient way?
>
> str = Replace(str, "+", " ")
>
> i = 0
> Do
> i = InStr(i + 1, str, "%")
> If i = 0 Then
> Exit Do
> End If
> var_char = Chr(CStr("&H" + Mid(str, i + 1, 2)))
> If Err.Number = 0 Then
> str = Mid(str, 1, i - 1) _
> & var_char _
> & Mid(str, i + 3)
> End If
> Loop
I don't know what URL string you have, and I am not an expert on how
to write effecient VB code.
I have however used WideCharToMultiByte and MultiByteToWideChar (from
C++) to convert between UTF-8 and UTF-16, and found them to work well.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
|
|
|
|
|