|
Home > Archive > SQL Anywhere Feedback > July 2005 > new PICK FROM ROW operator?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
new PICK FROM ROW operator?
|
|
| Erik Anderson 2005-07-14, 11:23 am |
| I'm hoping that someone will suprise me and say that something like this
already exists in the database or that there is some strange technique that
I hadn't heard of before. @@version = 9.0.2.3044, client(both senses) =
9.0.2.2551
I have been noticing that subqueries have crappy performance and have been
trying fairly hard to rewrite them out of the picture. For instance, the
following query:
SELECT parentObject.*,
(select sum(cost) from childObject where childObject.object =
parentObject.object)
from parentObject;
.... can be several hours slower than the following equivalent query ...
WITH costData AS (
select object, sum(cost) as cost
from childObject
group by object
)
select parentObject.*, costData.cost
from parentObject
left outer join costData on parentObject.object=costData.object;
I have recently been running into a number of queries of the following form:
select parentObject.*,
(select first ChangedBy from auditTrail where
auditTrail.object=parentObject.object order by DateChanged desc) as
LastChanged
from parentObject;
.... or even worse ...
select parentObject.*,
(select first ChangedBy from auditTrail where
auditTrail.object=parentObject.object order by DateChanged desc) as
LastChangedBy,
(select first Status from auditTrail where
auditTrail.object=parentObject.object order by DateChanged desc) as
LastStatus,
(select min(DateChanged) from auditTrail where
auditTrail.object=parentObject.object) as LastChangedDate
from parentObject;
Barring some unforseen way of rewriting this to eliminate the subqueries
(and no, caching the data on the parent object won't help, there might be
additional WHERE clauses on the audit trail restricting it to a specific
date range for instance), my dream rewrite would look something like this:
WITH auditData AS (
SELECT PICK FROM ROW auditTrail.* OVER (PARTITION BY object ORDER BY
DateChanged DESC)
)
select parentObject.*,
auditData.ChangedBy as LastChangedBy, auditData.Status as LastStatus,
auditData.DateChanged as LastChangedDate
from parentObject
left outer join auditData on parentObject.object = auditData.object
So (ahem) any takers?
| |
| Glenn Paulley 2005-07-14, 1:23 pm |
| "Erik Anderson" < erikba@teamworkgroup
.com> wrote in
news:42d69852@forums
-2-dub:
> WITH auditData AS (
> SELECT PICK FROM ROW auditTrail.* OVER (PARTITION BY object ORDER BY
> DateChanged DESC)
> )
> select parentObject.*,
> auditData.ChangedBy as LastChangedBy, auditData.Status as
> LastStatus,
> auditData.DateChanged as LastChangedDate
> from parentObject
> left outer join auditData on parentObject.object = auditData.object
9.0.2 does not support FIRST() and LAST() (we hope to in the Jasper
release), which will be real aggregate functions and therefore can make
this type of query quite efficient.
However, I believe the existing OLAP support in 9.0.2 can still permit
you to eliminate the subquery. What you can do is use the ROW_NUMBER
function to number each of the rows from auditTrail, sorted by
DateChanged, and then select out the row(s) with a row number of 1 (this
is obviously worse than if FIRST() was supported, as it would return a
single value). If you require both the first AND the last value in a
partition you'll have no choice but to use two WINDOWs (and hence incur
two sorts), but this may still be faster than using correlated
subqueries.
As an example of using row_number, here a query over the asademo CUSTOMER
table:
select *
from (select
fname, lname,
row_number() over( partition by state order by city) as first_row
from customer ) DT
where DT.first_row = 1
Glenn
> I'm hoping that someone will suprise me and say that something like
> this already exists in the database or that there is some strange
> technique that I hadn't heard of before. @@version = 9.0.2.3044,
> client(both senses) = 9.0.2.2551
>
> I have been noticing that subqueries have crappy performance and have
> been trying fairly hard to rewrite them out of the picture. For
> instance, the following query:
>
> SELECT parentObject.*,
> (select sum(cost) from childObject where childObject.object =
> parentObject.object)
> from parentObject;
>
> ... can be several hours slower than the following equivalent query
> ...
>
> WITH costData AS (
> select object, sum(cost) as cost
> from childObject
> group by object
> )
> select parentObject.*, costData.cost
> from parentObject
> left outer join costData on parentObject.object=costData.object;
>
> I have recently been running into a number of queries of the following
> form:
>
> select parentObject.*,
> (select first ChangedBy from auditTrail where
> auditTrail.object=parentObject.object order by DateChanged desc) as
> LastChanged
> from parentObject;
>
> ... or even worse ...
>
> select parentObject.*,
> (select first ChangedBy from auditTrail where
> auditTrail.object=parentObject.object order by DateChanged desc) as
> LastChangedBy,
> (select first Status from auditTrail where
> auditTrail.object=parentObject.object order by DateChanged desc) as
> LastStatus,
> (select min(DateChanged) from auditTrail where
> auditTrail.object=parentObject.object) as LastChangedDate
> from parentObject;
>
> Barring some unforseen way of rewriting this to eliminate the
> subqueries (and no, caching the data on the parent object won't help,
> there might be additional WHERE clauses on the audit trail restricting
> it to a specific date range for instance), my dream rewrite would look
> something like this:
>
> WITH auditData AS (
> SELECT PICK FROM ROW auditTrail.* OVER (PARTITION BY object ORDER BY
> DateChanged DESC)
> )
> select parentObject.*,
> auditData.ChangedBy as LastChangedBy, auditData.Status as
> LastStatus,
> auditData.DateChanged as LastChangedDate
> from parentObject
> left outer join auditData on parentObject.object = auditData.object
>
> So (ahem) any takers?
>
>
>
--
Glenn Paulley
Research and Development Manager, Query Processing
iAnywhere Solutions Engineering
EBF's and Patches: http://downloads.sybase.com
choose SQL Anywhere Studio >> change 'time frame' to all
To Submit Bug Reports: http://casexpress.sybase.com/cx/cx.stm
SQL Anywhere Studio Supported Platforms and Support Status
http://my.sybase.com/detail?id=1002288
| |
| David Kerber 2005-07-14, 1:23 pm |
| In article < Xns9693931FDA8AEpaul
leyianywherecom@10.22.241.106>,
paulley@ianywhere.com says...
> "Erik Anderson" < erikba@teamworkgroup
.com> wrote in
> news:42d69852@forums
-2-dub:
>
>
....
> select *
> from (select
> fname, lname,
> row_number() over( partition by state order by city) as first_row
> from customer ) DT
> where DT.first_row = 1
Oh, boy; I haven't tried them yet, but I think you just made some of my
date-based queries soooo much easier!
.....
--
Remove the ns_ from if replying by e-mail (but keep posts in the
newsgroups if possible).
|
|
|
|
|