Home > Archive > SQL Server JDBC > September 2005 > How to write UTF-8 encoded XML data to XML column?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author How to write UTF-8 encoded XML data to XML column?
Nigel Charman

2005-07-19, 7:23 am

I am having trouble writing XML data to an XML column using a
PreparedStatement, when the XML data contains the XML declaration
<?xml version="1.0" encoding="UTF-8"?>.

As an example, the following java code succeeds:

Statement stmt = con.createStatement();
stmt.execute("Insert into xmltab VALUES('<?xml version=\"1.0\"
encoding=\"UTF-8\"?><root><a/></root>')");

while this version fails:

PreparedStatement ps = con.prepareStatement("Insert into xmltab
VALUES(?)");
ps.setString(1, "<?xml version=\"1.0\"
encoding=\"UTF-8\"?><root><a/></root>");
ps.execute();

with the exception:
com.microsoft.sqlserver.jdbc.SQLServerException: XML parsing: line 1,
character 38, unable to switch the encoding


The PreparedStatement succeeds if either:
i) the encoding in the XML declaration is set to "UTF-16", or
ii) the encoding attribute in the XML declaration is omitted (in my
understanding, the XML document encoding should then default to UTF-8.
It seems strange that I am seeing different results between having no
encoding attribute and having the encoding set to UTF-8).


In the bigger picture, I am trying to read the XML data from file, or
other source, and would like to be able to pass the XML data as a stream
to the PreparedStatement. I am getting the same "unable to switch the
encoding" exception when using PreparedStatement. setCharacterStream()

method, and am getting the exception "Operand type clash: image is
incompatible with xml" when using PreparedStatement.setBinaryStream(
).

Can anyone shed any light on how I can write UTF-8 encoded XML data to
SQL Server when using a PreparedStatement?


I am using the SQL Server Express June CTP with default setup, and the
JDBC driver with the URL
"jdbc:sqlserver://serverName/ trustedAuthenticatio
n=true".

Regards,
Nigel Charman
Nigel Charman

2005-07-22, 7:23 am

In the absence of any response, how can I file this as a potential fault
with the JDBC driver?

It appears that there may be 2 faults:

1) The exception "XML parsing: line 1, character 38, unable to switch
the encoding" when writing UTF-8 data to the database using prepared
statements.

2) XML documents with no BOM and no explicit encoding is being treated
differently than documents with an explicit encoding of UTF-8. (This is
inferred since data with an explicit encoding fails and data with no
encoding succeeds).
Michael Rhys covers a similar issue in his posting "Heads up on upcoming
change to passing UTF-16 encoded XML via SNCLI and ADO.Net" on
microsoft.private.sqlserver2005.xml.

Regards
Nigel

Nigel Charman wrote:
> I am having trouble writing XML data to an XML column using a
> PreparedStatement, when the XML data contains the XML declaration
> <?xml version="1.0" encoding="UTF-8"?>.
>

Oliver Geisser

2005-07-25, 9:24 am

Hi.

Nigel Charman wrote:
> I am having trouble writing XML data to an XML column using a
> PreparedStatement, when the XML data contains the XML declaration
> <?xml version="1.0" encoding="UTF-8"?>.
>
> As an example, the following java code succeeds:
>
> Statement stmt = con.createStatement();
> stmt.execute("Insert into xmltab VALUES('<?xml version=\"1.0\"
> encoding=\"UTF-8\"?><root><a/></root>')");
>
> while this version fails:
>
> PreparedStatement ps = con.prepareStatement("Insert into xmltab
> VALUES(?)");
> ps.setString(1, "<?xml version=\"1.0\"
> encoding=\"UTF-8\"?><root><a/></root>");
> ps.execute();
>
> with the exception:
> com.microsoft.sqlserver.jdbc.SQLServerException: XML parsing: line 1,
> character 38, unable to switch the encoding

[...]

I can't give you any help with your specific JDBC driver problem.
But maybe I can give you a hint regarding XML encoding.

IMHO you are mixing two layers - character encoding and strings.

If you have bytes (byte[] or InputStream) you will need to decode
the bytes to get text, e.g. a String.

If you have a String then there is nothing like an "encoding".
OK, under the hoods the JVM encodes every String in memory as UTF-16
but this is an implementation detail.

So something like:

String foo = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

is a contradiction. You allready have String (= sequence of unicode
characters) and do not need an encoding any more.

The XML Specification defines the encoding as a help for XML parsers
which parses _binary_ data. There a a lot of XML parser APIs which allow
to parse Strings, but in most cases the behaviour is undefined or
simply broken.

So to be sure I would allways recommend to not parse XML Strings.

Back to the JDBC problem. It seems that the driver is parsing the string
value and it stumbles over the exact problem - a mismatch between the
XML declaration and the String (or the internal String encoding).

Have you tried to use a byte[] with the text UTF-8 encoded instead of
the String?

Ciao, Olli
Nigel Charman

2005-09-05, 8:23 pm

Oliver Geisser wrote:
> Hi.


Thanks for your response. Sorry I've been off the air for a while.

> Have you tried to use a byte[] with the text UTF-8 encoded instead of
> the String?


This would be my preferred option, but the driver fails with the
exception "Operand type clash: image is incompatible with xml" when
using PreparedStatement.setBinaryStream().

Cheers
Nigel
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2009 droptable.com