Home > Archive > Sybase Database > March 2006 > Odd optimizer behavior









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Odd optimizer behavior
pokerdragon

2006-03-23, 8:23 pm

Hi all, hoping that someone with some optimizer knowledge can explain
this scenario.

ASE12.5.3 ESD #1, Solaris9, 64-bit. 7 Cpu, 24Gb Memory

Basic schema:

tableA - ~8 million rows. Relevant columns:
source_id (numeric(9,0) indexed)
key (varchar(15) indexed)

tableB - ~100,000 rows. Relevant columns:
ref_id (numeric(9,0) NOT indexed)

First let me clarify that we've solved the performance issue at hand by
adding an index to tableB.ref_id. I'm simply curious to see if anyone
can explain the optimizer's behavior under this scenario when
tableB.ref_id *isn't* indexed:

Query1:
select * from tableA, tableB where tableA.source_id = tableB.ref_id
and tableA.key like "002%"

Query2:
select * from tableA, tableB where tableA.source_id = tableB.ref_id
and tableA.key like "0029%"

Identical queries, except for the "like" clause.

My logic says that Query 2 will be faster because the like clause will
bring back fewer rows that need to be looked up on tableB. However,
Query2 takes about 8 minutes, where Query 1 takes only 30 seconds.

Key details of the showplans:
Query1 (... like "002%"):
-Spawns 3 worker processes
-Step1: Table scan on tableB into a worktable
-Step2: Index scan on tableA, then index scan from worktable
"parallel network buffer merge"
est. I/O: 893890

Query2 (... like "0029%"):
-No parallel exec / worker processes
-One Step: Index scan on tableA, table scan on tableB
est. I/O: 389762

"002%" brings back about 3100 rows from TableA, where "0029%" brings
back about 300.

If I had to guess, I'd suspect that, with Query1, it knows it's
bringing back more rows, and decides it's more efficient to build a
work table out of TableB instead of table scanning it 3000 times. But
why does it run so much faster than Query2? Why doesn't Query2 use
worker processes? I know the optimizer isn't perfect, but this is a
pretty simple query. I'm especially confused that Query 2's est. I/O
is 1/3 that of Query1, yet it runs much slower.

If anyone has any ideas on these I'd love to hear them. As I
mentioned, adding the missing index solved the performance problem we
experienced; I'd just like to learn more for my own reference about why
the optimizer chose the routes it did before we fixed things.

Thanks in advance for any thoughts!

-Mike

Mark A. Parsons

2006-03-23, 8:23 pm

I don't recall ... is 'est I/O' the estimated *physical* or *logical* io's?

Estimated I/O's are ok, but they're based on the stats available to the
optimizer ... what happens if your statistics are not up to date? or if
you have some skew with the stats on tableA?

- answer: the optimizer's *estimates* could be waaaay off from reality.

----------------------


Right off the bat I'd say your guess about the optimzer's actions are
correct ... a threshold is crossed whereby the optimizer flips between
REFORMATTING tableB and multiple table scans of tableB.

----------------------

Anyhoo, I'd normally look at a few more details ...

- dbcc traceon(3604,302,310
) - gives ya a better idea of why the optimizer
is doing what it's doing; also gives some insight as to the estimates it's
coming up with

- set statisics io on - gives you the actual number of times a table is
scanned plus the actual number of IO's


pokerdragon wrote:

> Hi all, hoping that someone with some optimizer knowledge can explain
> this scenario.
>
> ASE12.5.3 ESD #1, Solaris9, 64-bit. 7 Cpu, 24Gb Memory
>
> Basic schema:
>
> tableA - ~8 million rows. Relevant columns:
> source_id (numeric(9,0) indexed)
> key (varchar(15) indexed)
>
> tableB - ~100,000 rows. Relevant columns:
> ref_id (numeric(9,0) NOT indexed)
>
> First let me clarify that we've solved the performance issue at hand by
> adding an index to tableB.ref_id. I'm simply curious to see if anyone
> can explain the optimizer's behavior under this scenario when
> tableB.ref_id *isn't* indexed:
>
> Query1:
> select * from tableA, tableB where tableA.source_id = tableB.ref_id
> and tableA.key like "002%"
>
> Query2:
> select * from tableA, tableB where tableA.source_id = tableB.ref_id
> and tableA.key like "0029%"
>
> Identical queries, except for the "like" clause.
>
> My logic says that Query 2 will be faster because the like clause will
> bring back fewer rows that need to be looked up on tableB. However,
> Query2 takes about 8 minutes, where Query 1 takes only 30 seconds.
>
> Key details of the showplans:
> Query1 (... like "002%"):
> -Spawns 3 worker processes
> -Step1: Table scan on tableB into a worktable
> -Step2: Index scan on tableA, then index scan from worktable
> "parallel network buffer merge"
> est. I/O: 893890
>
> Query2 (... like "0029%"):
> -No parallel exec / worker processes
> -One Step: Index scan on tableA, table scan on tableB
> est. I/O: 389762
>
> "002%" brings back about 3100 rows from TableA, where "0029%" brings
> back about 300.
>
> If I had to guess, I'd suspect that, with Query1, it knows it's
> bringing back more rows, and decides it's more efficient to build a
> work table out of TableB instead of table scanning it 3000 times. But
> why does it run so much faster than Query2? Why doesn't Query2 use
> worker processes? I know the optimizer isn't perfect, but this is a
> pretty simple query. I'm especially confused that Query 2's est. I/O
> is 1/3 that of Query1, yet it runs much slower.
>
> If anyone has any ideas on these I'd love to hear them. As I
> mentioned, adding the missing index solved the performance problem we
> experienced; I'd just like to learn more for my own reference about why
> the optimizer chose the routes it did before we fixed things.
>
> Thanks in advance for any thoughts!
>
> -Mike
>

jan rosinowski

2006-03-24, 3:23 am

> Query1:
> select * from tableA, tableB where tableA.source_id = tableB.ref_id
>and tableA.key like "002%"
>
>Query2:
> select * from tableA, tableB where tableA.source_id = tableB.ref_id
>and tableA.key like "0029%"
>
>Identical queries, except for the "like" clause.
>
>My logic says that Query 2 will be faster because the like clause will
>bring back fewer rows that need to be looked up on tableB. However,
>Query2 takes about 8 minutes, where Query 1 takes only 30 seconds.
>
>Key details of the showplans:
>Query1 (... like "002%"):
> -Spawns 3 worker processes
> -Step1: Table scan on tableB into a worktable
> -Step2: Index scan on tableA, then index scan from worktable
> "parallel network buffer merge"
> est. I/O: 893890
>
>Query2 (... like "0029%"):
> -No parallel exec / worker processes
> -One Step: Index scan on tableA, table scan on tableB
> est. I/O: 389762
>


looks as if q2 would involve a table scan for every hit in a whereas
q1 does one scan and a couple of index lookups. after the first scan b
is probably in ram

jacques.dhondt@free.fr

2006-03-27, 7:23 am

What about the density of the key?

In one query you have got 3 chars while the other 4 chars.

As far as I remember, starting with Sybase 12.0, Thre is some
optimisation in the index, Id est the string stored in the non leaf
pages do not contain 'terminal' string.
You might have more pages to scan to determine if your key is present
in the index, and therefore a different QP.
No?

Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com