|
Home > Archive > MS SQL XML > October 2005 > Store and query a lot of small pieces of xml
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Store and query a lot of small pieces of xml
|
|
| fhtino 2005-10-03, 7:24 am |
| Hello,
I need to store informations about a lot of documents (files of file
system). Every document has related informations (metadata): Author,
CreationDate, etc.
But every document has different related informations: Invoices has
invoice_number, date and customerID, Fax has fromNumber and ToNumber,
etc.
My idea is to store a small piece of XML for each document in a xml
field:
Table Docs:
ID int (PK)
Name nvarchar
MetaInfo xml
Xml example:
<MAIN>
<ID>1993</ID>
<Edit>NewYork984</Edit>
<Fatt>984</Fatt>
<DE>30/09/2005 17.34.58</DE>
<recipe>
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
<ingredient name="beef cube steak"
amount="1.5" unit="pound" />
<ingredient name="onion, sliced into thin rings"
amount="1" />
</recipe>
</MAIN>
Another examples:
<MAIN>
<invoice_number>456123</invoice_number>
<customerid>GH6798</customerid>
</MAIN>
In general all the xml fragment are different from each other.
Now, if I want all the document with Fatt > 980, I can get them using:
SELECT ID, MetaInfo FROM Docs
WHERE MetaInfo.exist('/MAIN/Fatt[.>980]') = 1
It works fine but the performance are very bad: about 4 second to query
a 50000 record table.
How can I improve performance? Or is there another approach to the
problem? In the future I need to manage a lot of documents... 1 milion
or more. Am I on the wrong way?
Thank you for help
Best regards
| |
| fhtino 2005-10-27, 9:25 am |
|
....I have forgotten to say that my sql environment is SQL 2005 CTP.
| |
| Michael Rys [MSFT] 2005-10-27, 9:25 am |
| Have you defined any primary and secondary XML indices on your MetaInfo
column?
For some optimization techniques, please check out the following whitepaper:
http://msdn.microsoft.com/library/e...l/sqloptxml.asp
Best regards
Michael
"fhtino" <fhtino@gmail.com> wrote in message
news:1128341983.260807.227730@g49g2000cwa.googlegroups.com...
> Hello,
>
> I need to store informations about a lot of documents (files of file
> system). Every document has related informations (metadata): Author,
> CreationDate, etc.
> But every document has different related informations: Invoices has
> invoice_number, date and customerID, Fax has fromNumber and ToNumber,
> etc.
>
> My idea is to store a small piece of XML for each document in a xml
> field:
>
> Table Docs:
> ID int (PK)
> Name nvarchar
> MetaInfo xml
>
>
> Xml example:
>
> <MAIN>
> <ID>1993</ID>
> <Edit>NewYork984</Edit>
> <Fatt>984</Fatt>
> <DE>30/09/2005 17.34.58</DE>
> <recipe>
> <title>Beef Parmesan with Garlic Angel Hair Pasta</title>
> <ingredient name="beef cube steak"
> amount="1.5" unit="pound" />
> <ingredient name="onion, sliced into thin rings"
> amount="1" />
> </recipe>
> </MAIN>
>
>
> Another examples:
>
> <MAIN>
> <invoice_number>456123</invoice_number>
> <customerid>GH6798</customerid>
> </MAIN>
>
>
> In general all the xml fragment are different from each other.
>
>
> Now, if I want all the document with Fatt > 980, I can get them using:
> SELECT ID, MetaInfo FROM Docs
> WHERE MetaInfo.exist('/MAIN/Fatt[.>980]') = 1
>
>
> It works fine but the performance are very bad: about 4 second to query
> a 50000 record table.
>
> How can I improve performance? Or is there another approach to the
> problem? In the future I need to manage a lot of documents... 1 milion
> or more. Am I on the wrong way?
>
>
> Thank you for help
>
>
> Best regards
>
| |
| fhtino 2005-10-27, 9:25 am |
| Michael Rys [MSFT] wrote:
>
> Have you defined any primary and secondary XML indices on your MetaInfo
> column?
Yes. I've tested all the combinations: no index, only primary, primary
+ secondary_value, primary + secondary_path, etc.
In general adding indexes increases performances but the maxiumum I got
is not so good... :-(
Other ideas?
Regards
Fabrizio
| |
| Michael Rys [MSFT] 2005-10-27, 9:25 am |
| What CTP version are you using?
Also, is the data typed, so that it knows that the Fatt element is a number?
Best regards
Michael
"fhtino" <fhtino@gmail.com> wrote in message
news:1128407312.065063.212800@g14g2000cwa.googlegroups.com...
> Michael Rys [MSFT] wrote:
>
> Yes. I've tested all the combinations: no index, only primary, primary
> + secondary_value, primary + secondary_path, etc.
> In general adding indexes increases performances but the maxiumum I got
> is not so good... :-(
>
> Other ideas?
>
>
> Regards
>
>
> Fabrizio
>
| |
| fhtino 2005-10-27, 9:25 am |
| Michael Rys [MSFT] wrote:
>
> What CTP version are you using?
I think it's the June CTP... at the moment I can't verify.
> Also, is the data typed, so that it knows that the Fatt element is a number?
No. XML is not typed and SQL does not know that Fatt is a number. But I
can't "type" the Xml: every document has its own set of informations
(i.e. xml tags).
Regards
| |
| Michael Rys [MSFT] 2005-10-27, 9:25 am |
| Thanks. One of the issues may be that since you have untyped data the
implied cast is making the optimizations not pick up the index.
Would it be possible to send me a ZIP file with a small repro (data and TSQL
script)? This way, we can take a look to see what may be the cause.
Thanks
Michael
"fhtino" <fhtino@gmail.com> wrote in message
news:1128518139.799015.229690@g43g2000cwa.googlegroups.com...
> Michael Rys [MSFT] wrote:
>
> I think it's the June CTP... at the moment I can't verify.
>
>
>
> No. XML is not typed and SQL does not know that Fatt is a number. But I
> can't "type" the Xml: every document has its own set of informations
> (i.e. xml tags).
>
>
> Regards
>
|
|
|
|
|