Home > Archive > SQL Anywhere ultralite > October 2005 > General DISTINCT question









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author General DISTINCT question
rr12

2005-10-27, 7:41 am

How does DISTINCT decide which items to omit and is there a way to control
it?


Breck Carter [TeamSybase]

2005-10-27, 7:41 am

DISTINCT looks at the entire SELECT list when looking for duplicate
rows to eliminate.

So, when you say "SELECT DISTINCT a, b FROM t" you are not saying
"give me all the different values of a", although that's what it looks
like :)

In fact, you are saying "give me all the different values of a *and*
b"... eliminate all the duplicate rows with the same values of a and
b.

I don't know about UltraLite, but AFAIK in ASA the DISTINCT clause
often forces the entire result set to be sorted, on all columns in the
SELECT list, so it can find and remove duplicates.

That means if you have a bug in your SELECT that generates unwanted
duplicate rows, adding DISTINCT will make the problem go away but
performance may suffer... fixing the bug may be the better way.

Does that help?

Breck

On 18 Oct 2005 07:54:42 -0700, "rr12" <abc@123.com> wrote:

>How does DISTINCT decide which items to omit and is there a way to control
>it?
>


--
SQL Anywhere Studio 9 Developer's Guide
Buy the book: http://www.amazon.com/exec/obidos/A...7/risingroad-20
bcarter@risingroad.com
RisingRoad SQL Anywhere and MobiLink Professional Services
www.risingroad.com
rr12

2005-10-27, 7:41 am

Are you saying that once the sort is done it keeps the first record of 'n'
duplicate records and then omits the rest. Is ORDER BY involved or just the
select list?

And is this something I can depend upon; or would I be just guessing and
maybe getting lucky?

"Breck Carter [TeamSybase]" < NOSPAM__bcarter@risi
ngroad.com> wrote in
message news:v94al1l38r0a30s
jo41fk98qh9388822lf@
4ax.com...
> DISTINCT looks at the entire SELECT list when looking for duplicate
> rows to eliminate.
>
> So, when you say "SELECT DISTINCT a, b FROM t" you are not saying
> "give me all the different values of a", although that's what it looks
> like :)
>
> In fact, you are saying "give me all the different values of a *and*
> b"... eliminate all the duplicate rows with the same values of a and
> b.
>
> I don't know about UltraLite, but AFAIK in ASA the DISTINCT clause
> often forces the entire result set to be sorted, on all columns in the
> SELECT list, so it can find and remove duplicates.
>
> That means if you have a bug in your SELECT that generates unwanted
> duplicate rows, adding DISTINCT will make the problem go away but
> performance may suffer... fixing the bug may be the better way.
>
> Does that help?
>
> Breck
>
> On 18 Oct 2005 07:54:42 -0700, "rr12" <abc@123.com> wrote:
>
>
> --
> SQL Anywhere Studio 9 Developer's Guide
> Buy the book:
> http://www.amazon.com/exec/obidos/A...7/risingroad-20
> bcarter@risingroad.com
> RisingRoad SQL Anywhere and MobiLink Professional Services
> www.risingroad.com



Breck Carter [TeamSybase]

2005-10-27, 7:41 am

On 18 Oct 2005 09:10:18 -0700, "rr12" <abc@123.com> wrote:

>Are you saying that once the sort is done


The sort is something that may or may not be done, now or in the
future; the engine is free to use whatever technique it wants to. All
it *must* do is eliminate duplicate rows. Today, I think it uses a
sort if it has to.

But if you are thinking about the "logical execution of a SELECT",
forget I ever mentioned "sort".

>it keeps the first record of 'n'
>duplicate records and then omits the rest.


That part is right... although "first" and "the rest" implies some
kind of ordering, which is not applicable to a group of rows that are
identical.

So, it keeps one row.

>Is ORDER BY involved or just the
>select list?


It has nothing to do with ORDER BY. Understand that you can ORDER BY a
column that isn't even in the SELECT list. DISTINCT eliminates
duplicate rows in the result set.

>And is this something I can depend upon;


Yes, DISTINCT works.

>or would I be just guessing and
>maybe getting lucky?


No "maybe" about it, take it to Vegas :)

Breck


>
>"Breck Carter [TeamSybase]" < NOSPAM__bcarter@risi
ngroad.com> wrote in
>message news:v94al1l38r0a30s
jo41fk98qh9388822lf@
4ax.com...
>


--
SQL Anywhere Studio 9 Developer's Guide
Buy the book: http://www.amazon.com/exec/obidos/A...7/risingroad-20
bcarter@risingroad.com
RisingRoad SQL Anywhere and MobiLink Professional Services
www.risingroad.com
Greg Fenton

2005-10-27, 7:41 am

rr12 wrote:
> Are you saying that once the sort is done it keeps the first record of 'n'
> duplicate records and then omits the rest. Is ORDER BY involved or just the
> select list?
>
> And is this something I can depend upon; or would I be just guessing and
> maybe getting lucky?
>


I may be misreading this, but it seems that you are asking about *which*
row is returned and which are dropped from the result set.

The fact is, as Breck points out, *which* row is returned is irrelevant
as they are all identical. Only one gets returned. There is *no* way
to distinguish them, otherwise they would not be DISTINCT.

Is there something you are after? What is it you are trying to "depend
upon"?

The only contract that DISTINCT offers is that each row in the result
set will be entirely different from all other rows in the result set.
If you include some unique constraint in your result set (such as the
primary keys of all tables involved in the SELECT), then DISTINCT is no
longer necessary and might be causing UL to do more work than it has to
(e.g. sorting and checking for duplicates that could not possibly exist).

greg.fenton
--
Greg Fenton
Consultant, Solution Services, iAnywhere Solutions
--------
Visit the iAnywhere Solutions Developer Community
Whitepapers, TechDocs, Downloads
http://www.ianywhere.com/developer/
rr12

2005-10-27, 7:41 am

Thanks. I get it now. The problem I have is that UL does not work
correctly when using DISTINCT + LEFT OUTER JOIN; which was causing me to
make some assumptions about DISTINCT which were incorrect.

Can anyone duplicate the problem I'm having? Try using DISTINCT with a
basic LEFT OUTER JOIN and change the order of the columns in the select
statement. What you will see (I think) is that only the first column listed
is made DISTINCT instead of the entire result set. This ends up removing
rows which actually are different. I've tested identical statements on
dbisql and ulisql and the dbisql works fine and ulisql does not.

rr12

"Greg Fenton" <greg. fenton_NOSPAM_@ianyw
here.com> wrote in message
news:43553f4f$1@foru
ms-1-dub...
> rr12 wrote:
>
> I may be misreading this, but it seems that you are asking about *which*
> row is returned and which are dropped from the result set.
>
> The fact is, as Breck points out, *which* row is returned is irrelevant as
> they are all identical. Only one gets returned. There is *no* way to
> distinguish them, otherwise they would not be DISTINCT.
>
> Is there something you are after? What is it you are trying to "depend
> upon"?
>
> The only contract that DISTINCT offers is that each row in the result set
> will be entirely different from all other rows in the result set. If you
> include some unique constraint in your result set (such as the primary
> keys of all tables involved in the SELECT), then DISTINCT is no longer
> necessary and might be causing UL to do more work than it has to (e.g.
> sorting and checking for duplicates that could not possibly exist).
>
> greg.fenton
> --
> Greg Fenton
> Consultant, Solution Services, iAnywhere Solutions
> --------
> Visit the iAnywhere Solutions Developer Community
> Whitepapers, TechDocs, Downloads
> http://www.ianywhere.com/developer/



David Fishburn

2005-10-27, 7:41 am

"rr12" <abc@123.com> wrote in news:4356b738$1@foru
ms-2-dub of
sybase.public.sqlanywhere.ultralite:

rr12> Thanks. I get it now. The problem I have is that UL does not work
rr12> correctly when using DISTINCT + LEFT OUTER JOIN; which was causing
me
rr12> to make some assumptions about DISTINCT which were incorrect.
rr12>
rr12> Can anyone duplicate the problem I'm having? Try using DISTINCT
with
rr12> a basic LEFT OUTER JOIN and change the order of the columns in the
rr12> select statement. What you will see (I think) is that only the
first
rr12> column listed is made DISTINCT instead of the entire result set.
This
rr12> ends up removing rows which actually are different. I've tested
rr12> identical statements on dbisql and ulisql and the dbisql works fine
rr12> and ulisql does not.

Can you craft up a sample using the tables and data from the CustDB or
ASADemo database. Then we can look into it.

--
David Fishburn
Certified ASA Developer Version 8
iAnywhere Solutions - Sybase
Professional Services
Please only post to the newsgroup
Please ALWAYS include version and MORE importantly BUILD number with
EACH post (dbeng9 -v).

EBFs and Maintenance Releases
http://downloads.sybase.com/swx/sdmain.stm

Developer Community / Whitepapers
http://www.ianywhere.com/developer

CaseXpress - to report bugs
http://casexpress.sybase.com

CodeXchange - Free samples
[url]http://ianywhere.codexchange.sybase.com/servlets/ ProjectDocumentList[
/url]

Breck Carter [TeamSybase]

2005-10-27, 7:41 am

If you don't get a response in 24 hours, try this:

- show a complete reproducible, with a CREATE TABLE, INSERT commands,
plus two SELECTs, one that works and one that does not work.

- post it as a new message, not a reply... some busy people do not
read messages with replies because they think it has been dealt with,
and you have a new question (which I would love to help with, but I am
UltraLiteChallenged :)

Breck

On 19 Oct 2005 14:18:24 -0700, "rr12" <abc@123.com> wrote:

>Thanks. I get it now. The problem I have is that UL does not work
>correctly when using DISTINCT + LEFT OUTER JOIN; which was causing me to
>make some assumptions about DISTINCT which were incorrect.
>
>Can anyone duplicate the problem I'm having? Try using DISTINCT with a
>basic LEFT OUTER JOIN and change the order of the columns in the select
>statement. What you will see (I think) is that only the first column listed
>is made DISTINCT instead of the entire result set. This ends up removing
>rows which actually are different. I've tested identical statements on
>dbisql and ulisql and the dbisql works fine and ulisql does not.
>
>rr12
>
>"Greg Fenton" <greg. fenton_NOSPAM_@ianyw
here.com> wrote in message
> news:43553f4f$1@foru
ms-1-dub...
>


--
SQL Anywhere Studio 9 Developer's Guide
Buy the book: http://www.amazon.com/exec/obidos/A...7/risingroad-20
bcarter@risingroad.com
RisingRoad SQL Anywhere and MobiLink Professional Services
www.risingroad.com
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com