|
Home > Archive > MS SQL XML > December 2005 > Merge XML data
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
|
|
|
| I'm trying to merge two or more database records. For example:
id | doc (type xml)
1 <s><t id="2"><p n="1">Something</p></t></s>
2 <s><t id="3"><p n="1">Something</p></t></s>
3 <s><t id="3"><p n="2">Something</p></t></s>
what I need to get is this:
<s>
<t id="2">
<p n="1">Something</p>
</t>
<t id="3">
<p n="1">Something</p>
<p n="2">Something</p>
</t>
</s>
Could this be done on SQL Server 2005 using xquery!? Any suggestions?!
| |
| Michael Rys [MSFT] 2005-12-22, 3:24 am |
| Nice puzzle.
In SQL Server 2005, you will need to use XQuery and FOR XML. The following
gives you the result assuming that you will only ever have one t element per
row:
create table T (id int, doc xml)
go
insert into T
select 1, N'<s><t id="2"><p n="1">Something</p></t></s>'
union
select 2, N'<s><t id="3"><p n="1">Something</p></t></s>'
union
select 3, N'<s><t id="3"><p n="2">Something</p></t></s>'
go
create function groupingid(@x xml)
returns int
as
begin
return @x.value('(/s/t/@id)[1]','int')
end
go
select
dbo.groupingid(T.doc) as "@id",
(select T2.doc.query('/s/t/*') from T T2
where dbo.groupingid(T2.doc) = dbo.groupingid(T.doc)
for xml path(''),type)
from T
group by dbo.groupingid(T.doc)
for xml path('t'), root('s')
Note that you need the user defined function because of a restriction of not
allowing subqueries (including XQuery) in group by expressions directly.
Now, if we would have top-level XQuery, we could have written this as:
<s>{
for $id in distinct- values(sql:xmlcol('T
.doc')/s/t/@id)
return
<t id="{$id}">{
sql:xmlcol('T.doc')/s/t[@id = $id]/*
}</t>
}</s>
You can model this (and thus get an alternative to the above) with the
following expression:
select
(select doc as "*" from T for xml path(''), type).query('
<s>{
for $id in distinct-values(/s/t/@id)
return
<t id="{$id}">{/s/t[@id = $id]/*}</t>
}</s>
')
I recommend that you time the expressions with your specific data and let us
know which is faster.
Best regards
Michael
"VV" <VV@zg.htnet.hr> wrote in message news:dodips$8sm$1@ss
405.t-com.hr...
> I'm trying to merge two or more database records. For example:
>
> id | doc (type xml)
> 1 <s><t id="2"><p n="1">Something</p></t></s>
> 2 <s><t id="3"><p n="1">Something</p></t></s>
> 3 <s><t id="3"><p n="2">Something</p></t></s>
>
> what I need to get is this:
>
> <s>
> <t id="2">
> <p n="1">Something</p>
> </t>
> <t id="3">
> <p n="1">Something</p>
> <p n="2">Something</p>
> </t>
> </s>
>
> Could this be done on SQL Server 2005 using xquery!? Any suggestions?!
>
>
>
| |
|
| "Michael Rys [MSFT]" <mrys@online.microsoft.com> wrote in message
news:OKe550sBGHA.1676@TK2MSFTNGP09.phx.gbl...
> Nice puzzle.
>
> In SQL Server 2005, you will need to use XQuery and FOR XML. The following
> gives you the result assuming that you will only ever have one t element
> per row:
Thanks, this is a great help! Acctualy I have more than one t elements :(,
and my data is more complex than this example.
I looked up some examples like
http://www.xml.com/pub/a/2003/11/26/q-and-a.html,
http://www.informatik.hu-berlin.de/~obecker/XSLT/#merge and I was wondering
could this be done on SQL 2005 instead manipulating data in .net.
For example:
First Xml row
<s>
<t ="1">
<p n="1" a="1" d="0" >Something1</p>
<p n="2" a="1" d="0" >Something2</p>
</t>
<t ="2">
<p n="1" a="1" d="0" >Something1</p>
<p n="2" a="1" d="0" >Something2</p>
</t>
</s>
Second Xml row
<s>
<t ="1">
<p n="1" a="0" d="1" >Something1</p>
<p n="3" a="0" d="1" >Something3</p>
</t>
<t ="3">
<p n="1" a="1" d="0" >Something1</p>
</t>
</s>
Result should be something like this!
<s>
<t ="1">
<p n="1" a="0" d="1" >Something1</p>
<p n="2" a="1" d="0" >Something2</p>
<p n="3" a="0" d="1" >Something3</p>
</t>
<t ="2">
<p n="1" a="1" d="0" >Something1</p>
<p n="2" a="1" d="0" >Something2</p>
</t>
<t ="3">
<p n="1" a="1" d="0" >Something1</p>
</t>
</s>
Notice that d (deny) overiddes a (allow).
Thanks, Viktor
>
> create table T (id int, doc xml)
>
> go
>
> insert into T
>
> select 1, N'<s><t id="2"><p n="1">Something</p></t></s>'
>
> union
>
> select 2, N'<s><t id="3"><p n="1">Something</p></t></s>'
>
> union
>
> select 3, N'<s><t id="3"><p n="2">Something</p></t></s>'
>
> go
>
> create function groupingid(@x xml)
>
> returns int
>
> as
>
> begin
>
> return @x.value('(/s/t/@id)[1]','int')
>
> end
>
> go
>
> select
> dbo.groupingid(T.doc) as "@id",
> (select T2.doc.query('/s/t/*') from T T2
> where dbo.groupingid(T2.doc) = dbo.groupingid(T.doc)
> for xml path(''),type)
> from T
> group by dbo.groupingid(T.doc)
> for xml path('t'), root('s')
>
> Note that you need the user defined function because of a restriction of
> not allowing subqueries (including XQuery) in group by expressions
> directly.
>
> Now, if we would have top-level XQuery, we could have written this as:
>
> <s>{
> for $id in distinct- values(sql:xmlcol('T
.doc')/s/t/@id)
> return
> <t id="{$id}">{
> sql:xmlcol('T.doc')/s/t[@id = $id]/*
> }</t>
> }</s>
>
> You can model this (and thus get an alternative to the above) with the
> following expression:
>
> select
> (select doc as "*" from T for xml path(''), type).query('
> <s>{
> for $id in distinct-values(/s/t/@id)
> return
> <t id="{$id}">{/s/t[@id = $id]/*}</t>
> }</s>
> ')
>
> I recommend that you time the expressions with your specific data and let
> us know which is faster.
>
> Best regards
>
> Michael
>
> "VV" <VV@zg.htnet.hr> wrote in message news:dodips$8sm$1@ss
405.t-com.hr...
>
>
| |
| Michael Rys [MSFT] 2005-12-23, 3:24 am |
| I like the challenge :-)
You have three options to solve it:
1. You decompose all the data into relational forms using the nodes() method
and do the pruning of the disallowed elements that are now represented as
rowsets in SQL and then use FOR XML PATH to completely recompose the tree.
Ugly to write and maintain. You may also want to move predicates into where
clauses from the XML or vice versa, depending on XML indices...
select
ID.id as "@id",
( select
( select
( select top 1 R3.pnode.query('.')
from T T2
cross apply T2.doc.nodes('/s/t') R2(tnode)
cross apply R2.tnode.nodes('p[@n=sql:column("P.n") and @d="1"]')
R3(pnode)
where 1=R2.tnode.exist('@id[.=sql:column("ID.id")]'
) union all
( select top 1 R3.pnode.query('.')
from T T2
cross apply T2.doc.nodes('/s/t') R2(tnode)
cross apply R2.tnode.nodes('p[@n=sql:column("P.n") and @d="0"]')
R3(pnode)
where NOT EXISTS
( select 1
from T T2
cross apply T2.doc.nodes('/s/t') R2(tnode)
cross apply R2.tnode.nodes('p[@n=sql:column("P.n") and @d="1"]')
R3(pnode)
where 1=R2.tnode.exist('@id[.=sql:column("ID.id")]')
)
)
)
for xml path(''),type
)
from
( select distinct Rx.pnode.value('@n','int') as n
from T
cross apply T.doc.nodes('/s/t[@id=sql:column("ID.id")]/p') Rx(pnode)) P
for xml path(''),type)
from
( select distinct R.tnode.value('@id','int') as id
from T cross apply T.doc.nodes('/s/t') R(tnode)) ID
for xml path('t'), root('s')
2. You write everything in XQuery as in (assuming the same table structure
with the data as you give below). Easier but does a complete table
aggregation beforehand which may or may not be faster. Also, a let would
make this more maintainable...
select (select doc as "*" from T for xml path(''), type).query('
<s>{
for $id in distinct-values(/s/t/@id)
return
<t id="{$id}">{
for $n in distinct-values(/s/t[@id = $id]/p/@n)
(:order by $n (: Add if you need them ordered :):)
return
( (/s/t[@id = $id]/p[@n=$n and @d="1"])[1],
if (not(/s/t[@id = $id]/p[@n=$n and @d="1"])) then
(/s/t[@id = $id]/p[@n=$n and @d="0"])[1]
else ()
)
}</t>
}</s>
')
Note that the inner return returns the first one of the @d being 1 for a
given p with @n and only returns the first one with @d="0" if none has @d
set to 1. This can deal with an arbitrary numbers of p with a given @n in
side a specific t. If you need to distinguish between those that have
different a's for a given @d, you will need to add more logic [see my note
below]. This is the same logic that I used in the relational expression
under 1.
3. The third approach is that you continue to use the nodes() method and FOR
XML to deal with the grouping under the t's, but use the above XQuery for
doing the p-merge logic:
select
ID.id as "@id",
( select R2.tnode.query('p')
from T T2
cross apply T2.doc.nodes('/s/t') R2(tnode)
where 1=R2.tnode.exist('@id[.=sql:column("ID.id")]')
for xml path(''),type
).query('
for $n in distinct-values(/p/@n)
(:order by $n (: Add if you need them ordered :):)
return
( (/p[@n=$n and @d="1"])[1],
if (not(/p[@n=$n and @d="1"])) then
(/p[@n=$n and @d="0"])[1]
else ()
)
')
from
( select distinct R.tnode.value('@id','int') as id
from T
cross apply T.doc.nodes('/s/t') R(tnode)) ID
for xml path('t'), root('s')
Let me add a note about your allowance modelling: You have several
information modelled redundantly:
You express the allow and disallow separately, which can lead to 4 states
(allow/disallow being contradictory). It would be better to express it with
a single property with 3 states (if you want to express: neither allowed or
disallowed, allowed, disallowed) or with a single property with 2 states
(allowed, disallowed). Also in the later case, you could just express it
with the presence or absence of an attribute/subelement without caring about
the value.
Best regards
Michael
"VV" <VV@zg.htnet.hr> wrote in message news:doe62k$p0p$1@ss
405.t-com.hr...
> "Michael Rys [MSFT]" <mrys@online.microsoft.com> wrote in message
> news:OKe550sBGHA.1676@TK2MSFTNGP09.phx.gbl...
>
> Thanks, this is a great help! Acctualy I have more than one t elements :(,
> and my data is more complex than this example.
>
> I looked up some examples like
> http://www.xml.com/pub/a/2003/11/26/q-and-a.html,
> http://www.informatik.hu-berlin.de/~obecker/XSLT/#merge and I was
> wondering could this be done on SQL 2005 instead manipulating data in
> .net.
>
> For example:
>
> First Xml row
>
> <s>
> <t ="1">
> <p n="1" a="1" d="0" >Something1</p>
> <p n="2" a="1" d="0" >Something2</p>
> </t>
> <t ="2">
> <p n="1" a="1" d="0" >Something1</p>
> <p n="2" a="1" d="0" >Something2</p>
> </t>
> </s>
>
> Second Xml row
>
> <s>
> <t ="1">
> <p n="1" a="0" d="1" >Something1</p>
> <p n="3" a="0" d="1" >Something3</p>
> </t>
> <t ="3">
> <p n="1" a="1" d="0" >Something1</p>
> </t>
> </s>
>
> Result should be something like this!
>
> <s>
> <t ="1">
> <p n="1" a="0" d="1" >Something1</p>
> <p n="2" a="1" d="0" >Something2</p>
> <p n="3" a="0" d="1" >Something3</p>
> </t>
> <t ="2">
> <p n="1" a="1" d="0" >Something1</p>
> <p n="2" a="1" d="0" >Something2</p>
> </t>
> <t ="3">
> <p n="1" a="1" d="0" >Something1</p>
> </t>
> </s>
>
> Notice that d (deny) overiddes a (allow).
>
> Thanks, Viktor
>
>
>
| |
|
| Uauuu, thank you so much! This is what I was looking for... I'm
speechless...
Viktor
"Michael Rys [MSFT]" <mrys@online.microsoft.com> wrote in message
news:OIgAnp3BGHA.3580@TK2MSFTNGP11.phx.gbl...
>I like the challenge :-)
>
> You have three options to solve it:
>
> 1. You decompose all the data into relational forms using the nodes()
> method and do the pruning of the disallowed elements that are now
> represented as rowsets in SQL and then use FOR XML PATH to completely
> recompose the tree. Ugly to write and maintain. You may also want to move
> predicates into where clauses from the XML or vice versa, depending on XML
> indices...
> select
> ID.id as "@id",
> ( select
> ( select
> ( select top 1 R3.pnode.query('.')
> from T T2
> cross apply T2.doc.nodes('/s/t') R2(tnode)
> cross apply R2.tnode.nodes('p[@n=sql:column("P.n") and @d="1"]')
> R3(pnode)
> where 1=R2.tnode.exist('@id[.=sql:column("ID.id")]'
> ) union all
> ( select top 1 R3.pnode.query('.')
> from T T2
> cross apply T2.doc.nodes('/s/t') R2(tnode)
> cross apply R2.tnode.nodes('p[@n=sql:column("P.n") and @d="0"]')
> R3(pnode)
> where NOT EXISTS
> ( select 1
> from T T2
> cross apply T2.doc.nodes('/s/t') R2(tnode)
> cross apply R2.tnode.nodes('p[@n=sql:column("P.n") and
> @d="1"]') R3(pnode)
> where 1=R2.tnode.exist('@id[.=sql:column("ID.id")]')
> )
> )
> )
> for xml path(''),type
> )
> from
> ( select distinct Rx.pnode.value('@n','int') as n
> from T
> cross apply T.doc.nodes('/s/t[@id=sql:column("ID.id")]/p') Rx(pnode)) P
> for xml path(''),type)
> from
> ( select distinct R.tnode.value('@id','int') as id
> from T cross apply T.doc.nodes('/s/t') R(tnode)) ID
> for xml path('t'), root('s')
>
> 2. You write everything in XQuery as in (assuming the same table structure
> with the data as you give below). Easier but does a complete table
> aggregation beforehand which may or may not be faster. Also, a let would
> make this more maintainable...
> select (select doc as "*" from T for xml path(''), type).query('
> <s>{
> for $id in distinct-values(/s/t/@id)
> return
> <t id="{$id}">{
> for $n in distinct-values(/s/t[@id = $id]/p/@n)
> (:order by $n (: Add if you need them ordered :):)
> return
> ( (/s/t[@id = $id]/p[@n=$n and @d="1"])[1],
> if (not(/s/t[@id = $id]/p[@n=$n and @d="1"])) then
> (/s/t[@id = $id]/p[@n=$n and @d="0"])[1]
> else ()
> )
> }</t>
> }</s>
> ')
>
> Note that the inner return returns the first one of the @d being 1 for a
> given p with @n and only returns the first one with @d="0" if none has @d
> set to 1. This can deal with an arbitrary numbers of p with a given @n in
> side a specific t. If you need to distinguish between those that have
> different a's for a given @d, you will need to add more logic [see my note
> below]. This is the same logic that I used in the relational expression
> under 1.
>
> 3. The third approach is that you continue to use the nodes() method and
> FOR XML to deal with the grouping under the t's, but use the above XQuery
> for doing the p-merge logic:
>
> select
> ID.id as "@id",
> ( select R2.tnode.query('p')
> from T T2
> cross apply T2.doc.nodes('/s/t') R2(tnode)
> where 1=R2.tnode.exist('@id[.=sql:column("ID.id")]')
> for xml path(''),type
> ).query('
> for $n in distinct-values(/p/@n)
> (:order by $n (: Add if you need them ordered :):)
> return
> ( (/p[@n=$n and @d="1"])[1],
> if (not(/p[@n=$n and @d="1"])) then
> (/p[@n=$n and @d="0"])[1]
> else ()
> )
> ')
> from
> ( select distinct R.tnode.value('@id','int') as id
> from T
> cross apply T.doc.nodes('/s/t') R(tnode)) ID
> for xml path('t'), root('s')
>
> Let me add a note about your allowance modelling: You have several
> information modelled redundantly:
>
> You express the allow and disallow separately, which can lead to 4 states
> (allow/disallow being contradictory). It would be better to express it
> with a single property with 3 states (if you want to express: neither
> allowed or disallowed, allowed, disallowed) or with a single property with
> 2 states (allowed, disallowed). Also in the later case, you could just
> express it with the presence or absence of an attribute/subelement without
> caring about the value.
>
> Best regards
> Michael
>
> "VV" <VV@zg.htnet.hr> wrote in message news:doe62k$p0p$1@ss
405.t-com.hr...
>
>
|
|
|
|
|