Re: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possible solution
Tom Lane

2005-04-27, 9:23 am

Guillaume Cottenceau <gc@mnc.ch> writes:
> My reasoning was that if the first byte of this two-byte
> sequence is 0x00 then the rule that 0xEF is first byte of a
> three-byte sequence doesn't apply, since 0xEF is second byte in
> the sequence.


Looking at the source code, it's clear that it's reporting just the
first byte of the sequence; the 00 is redundant and probably shouldn't
be in the message.
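As an illustration of why a lone 0xEF reads as the start of a three-byte sequence, here is a minimal sketch that derives the expected sequence length from the first byte. The name utf8_seq_len is hypothetical and is not taken from the PostgreSQL source; the function looks only at the lead-byte bit pattern and does not reject overlong or out-of-range lead bytes.

```c
/*
 * Sketch only: expected length of a UTF-8 sequence, judged from its
 * first byte alone. Returns 0 for bytes that cannot start a sequence
 * by bit pattern (continuation bytes 0x80..0xBF and 0xF8..0xFF).
 * utf8_seq_len is a hypothetical name, not a PostgreSQL function.
 */
static int
utf8_seq_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: plain ASCII */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx, e.g. 0xEF or 0xE2 */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;                   /* not a valid lead byte */
}
```

With this, utf8_seq_len(0xEF) yields 3, which matches the reading that the "00" in the message is just padding from the 0x%04x format rather than a byte of the input.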

There seem to be two possibilities: either there is a valid 3-byte
UTF8 character, which cannot be converted to LATIN1; or the alleged
UTF8 data isn't really UTF8 at all.

regards, tom lane

Anders Hermansen

2005-04-28, 3:24 am

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Looking at the source code, it's clear that it's reporting just the
> first byte of the sequence; the 00 is redundant and probably shouldn't
> be in the message.


Yes, the error message can be a bit confusing. I investigated an error I
got when using psql. I did a select and got the message:
"ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1"

When looking at the database dump the byte sequence is 0xE2 0x80 0x93, which
is valid UTF-8 (U+2013 EN DASH), but cannot be converted because the
character does not exist in ISO-8859-1.
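The decoding can be worked through in a few lines of C. This is a rough sketch, not the backend's conversion code: the enum and the name classify_utf8_char are made up for illustration. It decodes a single sequence, rejects malformed input, and reports whether the resulting code point fits in ISO-8859-1 (that is, is at most 0xFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
typedef enum { CONV_OK, CONV_NOT_LATIN1, CONV_INVALID_UTF8 } conv_result;

/*
 * Decode one UTF-8 sequence starting at s (len bytes available) and
 * store the code point in *cp. classify_utf8_char is a hypothetical
 * helper, not part of PostgreSQL.
 */
static conv_result
classify_utf8_char(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* smallest code point that legitimately needs n bytes */
    static const uint32_t min_cp[5] = {0, 0x00, 0x80, 0x800, 0x10000};
    size_t      n, i;
    uint32_t    c;

    if (len == 0)
        return CONV_INVALID_UTF8;
    if (s[0] < 0x80)                { n = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else
        return CONV_INVALID_UTF8;   /* stray continuation or bad lead byte */

    if (len < n)
        return CONV_INVALID_UTF8;   /* truncated sequence */
    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return CONV_INVALID_UTF8;   /* not a 10xxxxxx continuation */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, surrogates, and values past U+10FFFF */
    if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return CONV_INVALID_UTF8;

    *cp = c;
    return (c <= 0xFF) ? CONV_OK : CONV_NOT_LATIN1;
}
```

For 0xE2 0x80 0x93 the payload bits are 0010, 000000 and 010011, which combine to 0x2013, so the sketch returns CONV_NOT_LATIN1. The sequence 0xC3 0xA9 decodes to 0xE9 and returns CONV_OK.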

If I start up a UTF-8 xterm and psql with UNICODE encoding, then everything
works as expected.

> There seem to be two possibilities: either there is a valid 3-byte
> UTF8 character, which cannot be converted to LATIN1; or the alleged
> UTF8 data isn't really UTF8 at all.


Yes. Maybe the error messages can be changed so that what actually went
wrong is more clear? And possibly printing the whole 3-byte sequence?


Anders Hermansen


Tom Lane

2005-04-28, 9:23 am

Anders Hermansen <anders@yoyo.no> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Yes. Maybe the error messages can be changed so that what actually went
> wrong is more clear? And possibly printing the whole 3-byte sequence?


Any volunteers for that? The specific message in question is in
src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c

    else if ((c & 0xe0) == 0xe0)
        elog(ERROR, "could not convert UTF8 character 0x%04x to ISO8859-1",
             c);

Aside from being unhelpful as to the exact input data, this is wrong in
another way: it ought to be an ereport() not elog(), because it's
certainly not a can't-happen kind of error.

A little bit of grepping turns up a number of similarly deficient
elog and ereport calls in the src/backend/utils/mb/ tree.

There is more useful code for constructing a character description in
pg_verifymbstr() in src/backend/utils/mb/wchar.c. Probably what ought
to happen is to split out a small subroutine along the lines of
    char *describe_mb_char(const unsigned char *mbstr, int len)
(returning a palloc'd string "0x....") and then make all the places
that complain about bad multibyte input use it.
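A standalone sketch of what such a subroutine might look like follows. It is only a guess at the shape: the signature is the one suggested above, but the body is not taken from pg_verifymbstr(), and malloc() stands in for palloc() purely so the example compiles outside the backend (the caller must free the result).

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of the proposed helper: format the bytes of one multibyte
 * character as "0x" followed by two hex digits per byte.
 * Assumption: malloc() replaces palloc() here; a backend version would
 * allocate in the current memory context instead.
 * Returns NULL if the allocation fails.
 */
static char *
describe_mb_char(const unsigned char *mbstr, int len)
{
    char   *buf;
    char   *p;
    int     i;

    if (len < 0)
        len = 0;
    /* "0x" + 2 hex digits per byte + terminating NUL */
    buf = malloc(2 + 2 * (size_t) len + 1);
    if (buf == NULL)
        return NULL;

    p = buf;
    p += sprintf(p, "0x");
    for (i = 0; i < len; i++)
        p += sprintf(p, "%02x", mbstr[i]);
    return buf;
}
```

Called on the three bytes 0xE2 0x80 0x93 this produces the string "0xe28093", so a message built with it would show the whole sequence rather than only the first byte.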

Don't have time to deal with it myself, but it seems like a pretty easy
project for anyone wanting to dip their toes in the backend.

regards, tom lane