MS SQL XML archive, October 2005
Preservation of namespace prefixes in XML datatype
| Carl |
| I read in the MSDN Library article "XML Options in Microsoft SQL Server 2005"
that namespace prefixes are NOT preserved when storing XML in an XML datatype
column. That seems like a pretty severe "limitation" to me... Why would I, or
how would I, use the XML datatype for storing XML if it will cause loss of
something as important as namespace prefixes?!?
| |
| Kent Tegels 2005-10-27, 9:25 am |
| Hello Carl,
That's not entirely correct. Here's an example of what it actually does; try it
for yourself and see.
use scratch
go
-- drop the temp table only if a previous run left it behind
if object_id('tempdb..#x') is not null drop table #x
go
create table #x(example xml)
go
insert into #x values ('<?xml version="1.0" encoding="UTF-8"?>
<vehicle xmlns:dot="http://dot.state.ne.us" xmlns:man="http://www.honda.com">
<dot:type>SUV</dot:type>
<man:model>CRV</man:model>
</vehicle>')
select example from #x
go
select example.query('/vehicle')
from #x
go
select example.query('
declare namespace m = "http://www.honda.com";
/vehicle/m:model')
from #x
go
Thanks!
Kent Tegels
DevelopMentor
Blogging @ http://staff.develop.com/ktegels/
| |
| Michael Rys [MSFT] 2005-10-27, 9:25 am |
| We do not make guarantees, but we try to preserve them in many cases. Note
that, according to the XML specifications, prefixes are not
semantics-bearing. So why is it important? You cannot (and should not) depend on a
prefix having a particular name. That's what you have the URIs for!
Best regards
Michael
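To make the "URIs, not prefixes" point concrete outside SQL Server, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. This is an assumption for illustration only; it says nothing about how SQL Server stores XML internally. ElementTree names every element by namespace URI plus local name. Kent's vehicle document written with two different sets of prefixes therefore compares equal, while keeping a prefix but changing its URI does not. The d and h prefixes and the example.com URI are made up.

```python
import xml.etree.ElementTree as ET

# Kent's vehicle document, written twice with different (made-up) prefixes
# bound to the same namespace URIs.
DOC_A = ('<vehicle xmlns:dot="http://dot.state.ne.us" xmlns:man="http://www.honda.com">'
         '<dot:type>SUV</dot:type><man:model>CRV</man:model></vehicle>')
DOC_B = ('<vehicle xmlns:d="http://dot.state.ne.us" xmlns:h="http://www.honda.com">'
         '<d:type>SUV</d:type><h:model>CRV</h:model></vehicle>')
# Same prefixes as DOC_A, but "man" now points at a different (made-up) URI.
DOC_C = DOC_A.replace("http://www.honda.com", "http://example.com/other")


def expanded_names(xml_text):
    """Return (expanded name, text) for every element, in document order.

    ElementTree spells names as {namespace-uri}local-name, so the prefix
    used in the source text is simply not part of the name."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter()]


print(expanded_names(DOC_A))
print(expanded_names(DOC_A) == expanded_names(DOC_B))  # True: only prefixes differ
print(expanded_names(DOC_A) == expanded_names(DOC_C))  # False: a URI differs
```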
| |
| Carl |
| Well then, perhaps this is a philosophy or technology debate, because I
disagree!!!
To me, it is irrelevant whether a namespace prefix is "semantics-bearing" or not,
any more than it matters whether the content of an element says "this, that, or the
other". What is bothersome to me is the thought that (with the exception of
insignificant white space) the XML doc that I put into an XML datatype may
not be the XML doc that I get out of the XML datatype. My position is as
follows:
1) element contents should not change
2) element tag names should not change
3) element attribute names should not change
4) element attribute values should not change
5) namespace prefixes should not change
unless of course I as the programmer/developer execute some process that
specifically requests a change. Again, my argument is based on my position
that my doc coming out of the XML datatype should be the same as my doc going
into the XML datatype unless a specific change is requested. To allow any
part of the XML doc stored in an XML datatype to be subject to change because
there are NO guarantees continues to be very worrisome to me. The exceptions
that do make sense to me include insignificant whitespace and order of
attributes within an element tag, but NOT names of identifiers such as
namespace prefixes. Sorry, but I will continue to disagree...
| |
| Michael Rys [MSFT] 2005-10-27, 9:25 am |
| Hi Carl,
If you really need complete fidelity at the byte level, you will need to use
varbinary(max).
I can understand why you disagree, and that's why we try to preserve the
namespace prefixes. But the W3C XML specifications make it pretty clear that
you cannot depend on it in all cases. The name of an element or attribute
contains the URI and local-name. The prefix is not guaranteed.
However, as I said, we are trying to preserve the namespace prefix:
select cast(N'<foobar:test xmlns:foobar="aaa"/>' as xml)
select cast(N'<foobar:test xmlns:foobar="aaa"/>' as xml).query('/')
select cast(N'<foobar:test xmlns:foobar="aaa"/>' as xml).query('declare
namespace x="aaa"; /x:test')
select cast(N'<foobar:test xmlns:foobar="aaa"/>' as xml).query('declare
namespace x="aaa"; <a>{/x:test}</a>')
All of these preserve it. But sometimes, when we have conflicts or we need to
generate a prefix on the fly, we do not want to pay the price of preserving
an existing prefix...
Best regards
Michael
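For contrast, here is a sketch of a processor that always takes the cheap route Michael mentions and generates prefixes on the fly: Python's xml.etree.ElementTree. It is used here only to illustrate the general behavior, not SQL Server's. Because the parsed tree keeps only expanded names, re-serializing Michael's foobar document produces a generated prefix instead of foobar, while the expanded name {aaa}test survives the round trip. ElementTree's register_namespace can pin a preferred prefix, but that registry is global to the process.

```python
import xml.etree.ElementTree as ET

SOURCE = '<foobar:test xmlns:foobar="aaa"/>'  # Michael's example document


def round_trip(xml_text):
    """Parse and re-serialize with ElementTree (an illustration, not SQL Server).

    The parsed tree keeps only {uri}local names, so on output the serializer
    must choose a prefix for each namespace; unless one was registered with
    ET.register_namespace, it generates one (ns0, ns1, ...)."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


out = round_trip(SOURCE)
print(out)                                                  # a generated prefix replaces foobar
print(ET.fromstring(out).tag)                               # {aaa}test
print(ET.fromstring(out).tag == ET.fromstring(SOURCE).tag)  # True
```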
| |
| Carl |
| Michael,
Thanks for continuing the discussion. This issue is clearly a debate about
the underlying philosophy of how to build systems. So let me be as simple as I
can to emphasize the differences.
* I am aware that the W3C XML spec does not guarantee this or that about
namespace prefixes. But guess what? The W3C XML spec says absolutely nothing
about databases of any kind or how XML docs should or should not be stored in
databases.
* So here's what I have to say about databases: If I put something into a
database record field, I should be able to get the exact same thing back out
of the database record field (and this has absolutely nothing to do with
XML). Now if for some strange, bizarre reason, the database system is forced
to alter the contents of the database record field, then at the very least
the system should notify the user or developer that a NON-requested change
(i.e., a change not requested by the user or developer) was performed by the
system, and give the user/developer the opportunity to accept/decline the
change, roll back the change, or otherwise intervene in some way.
* So here's my question about SQL Server 2005: If/when the database system
makes changes in namespace prefixes, will I, the user/developer, have the
opportunity to roll back or otherwise prevent those changes from happening?
Alternatively, the simplest design mechanism to prevent this problem from
occurring in the first place is for SQL Server 2005 to NOT allow storage of
any XML doc for which there might be a change (or else to make this
behavior a developer configuration option).
* Essentially, for any database to be changing something without alerting
the user/developer of the change (at least for me) fundamentally changes the
meaning of what a database is or should be.
I hope that now explains where I'm coming from and what my expectations are.
So does this mean I will have to use varbinary as you suggested?!? Obviously
I would prefer to benefit from the advantages of the XML datatype but not if
there are going to be unpredictable and unknowable changes made without
alerts or notices.
Carl
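Nothing in this thread shows a server-side option for such an alert. A client could roughly approximate what Carl asks for by recording the namespace prefix declarations in the document it sends and in the document it reads back, then raising if they differ. The sketch below does this with the start-ns events of Python's xml.etree.ElementTree.XMLPullParser. The names prefix_bindings and check_prefixes_preserved are hypothetical. The check compares only prefix declarations in document order and deliberately ignores other lexical differences.

```python
import xml.etree.ElementTree as ET


def prefix_bindings(xml_text):
    """Return the (prefix, uri) declarations of a document, in document order."""
    parser = ET.XMLPullParser(events=("start-ns",))
    parser.feed(xml_text)
    parser.close()
    # For "start-ns" events, the event data is a (prefix, uri) tuple.
    return [binding for _event, binding in parser.read_events()]


def check_prefixes_preserved(sent, returned):
    """Hypothetical client-side guard: raise if the prefix declarations changed.

    Whitespace, attribute order and empty-element syntax are not compared."""
    before, after = prefix_bindings(sent), prefix_bindings(returned)
    if before != after:
        raise ValueError(f"namespace prefixes changed: {before} -> {after}")


# Michael's document, and a made-up copy of it with a regenerated prefix.
sent = '<foobar:test xmlns:foobar="aaa"/>'
try:
    check_prefixes_preserved(sent, '<ns0:test xmlns:ns0="aaa"/>')
except ValueError as err:
    print("alert:", err)
```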
| |
| Michael Rys [MSFT] 2005-10-27, 9:25 am |
| Carl, first, as I pointed out, if you store your XML and then retrieve it, we
preserve the prefix. However, for a variety of reasons, we do not want to
guarantee that (although I don't see this changing anytime soon).
So this is somewhat of a theoretical discussion for SQL Server 2005.
Now, the ANSI/ISO SQL:2003 standard makes it pretty clear that the XML
document is based on the XML information set abstraction and not a textual
abstraction. This means that although you load a textual form of the
value, the only guarantee the database gives you is that it preserves the
logical model of the value, not its lexical representation. If you need
to preserve that, you need to use varbinary(max) (even the string datatypes may
not be good enough, given their code-page dependency).
This is the same for many other datatypes as well. For example, if you store
a decimal and provide the value 00012.12, you will only get back 12.12. Do
you say that the decimal type should preserve the leading zeros and raise
warnings or provide some other mechanism?
Best regards
Michael
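Michael's decimal example and his varbinary(max) advice can be sketched with Python's decimal.Decimal. The typed value drops the leading zeros, so the only way to return the exact input is to store the raw bytes alongside the value. The StoredDecimal class is a hypothetical illustration of that "store both" pattern, not a SQL Server feature. The same idea would apply to keeping an XML document's original bytes next to its xml-typed value.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class StoredDecimal:
    """Hypothetical record keeping both forms of one input string."""
    raw: bytes      # exact bytes supplied (the role varbinary(max) plays)
    value: Decimal  # typed value (the role a decimal column plays)

    @classmethod
    def from_text(cls, text: str) -> "StoredDecimal":
        return cls(raw=text.encode("utf-8"), value=Decimal(text))


stored = StoredDecimal.from_text("00012.12")
print(stored.value)                      # 12.12 (leading zeros are not part of the value)
print(stored.value == Decimal("12.12"))  # True
print(stored.raw.decode("utf-8"))        # 00012.12 (the lexical form survives only in raw)
```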
| |
| Carl |
| It's a "brave new world", isn't it? Here are some responses:
* The numerical example with "if you store a decimal and provide the value
00012.12, you will only get back 12.12. Do you say that the decimal type
should preserve the leading zeros and raise warnings or provide some other
mechanism?" is NOT an analogous or fair example for comparison. I doubt there
are any computational scientists who would seek what I will call more than
one use out of decimal numbers. They would want to preserve as much
numerical accuracy as possible in any database in which the numbers were
stored. In contrast, namespace identifiers have more than one simple
"logical" use for the programmer/developer. And a developer may have very
good reasons for wanting to preserve a particular choice of a lexical
representation well beyond merely preserving the logical meaning. A better
analogy than decimal numbers would be variable names in a program. Sure it is
possible to substitute any arbitrary lexical representation for a
programmer's choice of a variable name, but then it would lose a lot of value
and meaning to the developer that chose that particular variable name within
the context of his other programs and libraries.
* So yes I do believe that it is important to provide an alert whenever an
identifier is changed because that identifier has more than one use with
meaningful lexical representation in addition to logical meaning. A better
analogy (for me) would be the following: I do not want Visual Studio or
SourceSafe or any other major tool to change my choice of variable names or
identifiers in my source code just because one could make the argument that
it's OK because the logical meaning is preserved!!!!
* And if I disagree with the OFFICIAL committees out there who are building
the specs and what not, then I guess I disagree!
CT
| |
| Kent Tegels 2005-10-27, 9:26 am |
| Hello Carl,
> A better analogy
> than decimal numbers would be variable names in a program. Sure it is
> possible to substitute any arbitrary lexical representation for a
> programmer's choice of a variable name, but then it would lose a lot
> of value and meaning to the developer that chose that particular
> variable name within the context of his other programs and libraries.
I'm having a hard time following the analogy, actually. We're talking about
data, right? If I follow your argument out, whenever I commit data to a store,
not only should I preserve that data, but I should also preserve the variable
name you called in your code so that anybody that looks at the data can say
"oh, well that Name part of the data tells us... well... that the string
represents a name. Of what, we don't know what it's the name of, but we know
it's a name."
The namespacing issue with XML is parallel: thank goodness that I don't have
to preserve aliases when transforming XML from one source to another. I have
clients that have multiple namespaces referenced throughout different instances
of XML that use the same aliases. Both "Clients" and "Configurations" are
aliased to "c" in different documents that I sometimes need to merge. How
preservable is an alias then? It's the namespaces -- not the aliases -- that
provide semantic reference. As long as those aren't lost, the aliases themselves
can get just as much in the way as they help.
Think about what happens to source code compiled down to machine code. There
are no variables at all at the machine level: just instructions, memory locations
and CPU registers. True, the machine code "loses lots of value" to the
developer, but then, why isn't this an issue at that level? If your answer
is "well, because nobody maintains the machine code," then I'd argue that
nobody really maintains the stored XML either, since what SQL Server stores
isn't the XML text at all; it's a binary form of it, just like compiled code.
They are both representations of data for a processor. Nothing more and nothing
less. If your response then becomes "well, SQL Server XML is a crappy 'source code'
vault," compare it to the output of any other decompiler that doesn't have
the debug information. By that comparison, the SQL XML type has amazing fidelity.
:)
Thanks!
Kent Tegels
DevelopMentor
Blogging @ http://staff.develop.com/ktegels/
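Kent's merge scenario, where "c" is bound to one namespace in the Clients documents and to another in the Configurations documents, can be sketched with Python's xml.etree.ElementTree. The urn:example URIs, element names and text values are made up for illustration. Because names are compared by namespace URI, the two "c" aliases never collide, and the serializer simply assigns fresh, distinct prefixes in the merged output.

```python
import xml.etree.ElementTree as ET

# Made-up documents: both use the prefix "c", but for different namespaces.
CLIENTS = '<c:client xmlns:c="urn:example:clients">Acme</c:client>'
CONFIGS = '<c:setting xmlns:c="urn:example:configurations">verbose</c:setting>'


def merge(*docs):
    """Parse each document and append its root element under one <merged> root."""
    merged_root = ET.Element("merged")
    for doc in docs:
        merged_root.append(ET.fromstring(doc))
    return merged_root


merged = merge(CLIENTS, CONFIGS)
print([child.tag for child in merged])
# ['{urn:example:clients}client', '{urn:example:configurations}setting']
print(ET.tostring(merged, encoding="unicode"))
# The serializer assigns one generated prefix per namespace URI, so the old
# "c" aliases cannot clash in the merged output.
```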
| |
| Carl |
| Way cool discussion! Thanks for your contribution here.
I'll try to respond when I have some more time, and see if I can come up
with a better way to explain my position and ideas within the context you have
raised.
Carl