Drop Table

Support Forum for database administrators and web based access to important newsgroups related to databases
Register on Database Support Forum Edit your profileCalendarFind other Database Support forum membersFrequently Asked QuestionsSearch this forum -> 
For Database admins: Free Database-related Magazines Now Free shipping to Texas


Post New Thread










Thread
Author

Cursor looping versus set-based queries
I know this question has been asked. And the usual answer is don't use
cursors or any other looping method. Instead, try to find a solution
that uses set-based queries.

But this brings up several questions / senarios:

* I created several stored procedures that take parameters and inserts
the data into the appropriate tables. This was done for easy access/use
from client side apps (i.e. web-based).

Proper development tactics says to try and do "code reuse". So, if I
already have stored procs that do my logic, should I be writing a
second way of handling the data? If I ever need to change the way the
data is handled, I now have to make the same change in two (or more)
places.

* Different data from the same row needs to be inserted into multiple
tables. "Common sense" (maybe "gut instinct" is better) says to handle
each row as a "unit". Seems weird to process the entire set for one
table, then to process the entire set AGAIN for another table, and then
YET AGAIN for a third table, and so on.

* Exception handling. Set based processing means that if one row fails
the entire set fails. Looping through allows you to fail a row but
allow everything else to be processed properly. It also allows you to
gather statistics. (How many failed, how many worked, how many were
skipped, etc.)

?? Good idea ?? The alternative is to create a temporary table (sandbox
or workspace type thing), copy the data to there along with "status" or
"valdation" columns, run through the set many times over looking for
any rows that may fail, marking them as such, and then at the end only
dealing with those rows which "passed" the testing. Of course, in order
for this to work you must know (and duplicate) all constraints so you
know what to look for in your testing.


Report this thread to moderator Post Follow-up to this message
Old Post
JayCallas@hotmail.com
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
Disclaimer: difficult to answer without more specific info.

JayCallas@hotmail.com  wrote:
> I know this question has been asked. And the usual answer is don't use
> cursors or any other looping method. Instead, try to find a solution
> that uses set-based queries.
>
> But this brings up several questions / senarios:
>
> * I created several stored procedures that take parameters and inserts
> the data into the appropriate tables. This was done for easy access/use
> from client side apps (i.e. web-based).
>
> Proper development tactics says to try and do "code reuse". So, if I
> already have stored procs that do my logic, should I be writing a
> second way of handling the data? If I ever need to change the way the
> data is handled, I now have to make the same change in two (or more)
> places.

If a SP encapsulates a plain INSERT I'd throw it out.  If there is more
complex logic involved I'd probable leave it in.

> * Different data from the same row needs to be inserted into multiple
> tables. "Common sense" (maybe "gut instinct" is better) says to handle
> each row as a "unit". Seems weird to process the entire set for one
> table, then to process the entire set AGAIN for another table, and then
> YET AGAIN for a third table, and so on.

This smells like a design issue.  If you have to insert the exact same
data into multiple tables chances are that your table layout is flawed.

> * Exception handling. Set based processing means that if one row fails
> the entire set fails. Looping through allows you to fail a row but
> allow everything else to be processed properly. It also allows you to
> gather statistics. (How many failed, how many worked, how many were
> skipped, etc.)

Normally you want all or nothing.  This one sounds as if you placed
business logic into to database.  This might or might not be a good idea
but there seems to be a chance that this is better done in a middle tier
(for example because it eases porting to another RDBMS).  But it depends
on the error handling and operations you do in SP's.

> ?? Good idea ?? The alternative is to create a temporary table (sandbox
> or workspace type thing), copy the data to there along with "status" or
> "valdation" columns, run through the set many times over looking for
> any rows that may fail, marking them as such, and then at the end only
> dealing with those rows which "passed" the testing. Of course, in order
> for this to work you must know (and duplicate) all constraints so you
> know what to look for in your testing.

I wouldn't recommend one or the other direction with so few info.

Kind regards

robert



Report this thread to moderator Post Follow-up to this message
Old Post
Robert Klemme
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
On 28 Mar 2006 12:06:31 -0800, JayCallas@hotmail.com wrote:

 (snip)
>Proper development tactics says to try and do "code reuse".

Hi Jay,

This is not always true when dealing with databases. You have to weigh
the benefit of code reuse (cheaper maintenance) against the benefit of
code duplication (better performance).

In databases where performance matters, I won't hesitate a single second
to duplicate a stored procedure and change it slightly to optimize for
specific situations.

>* Different data from the same row needs to be inserted into multiple
>tables. "Common sense" (maybe "gut instinct" is better) says to handle
>each row as a "unit". Seems weird to process the entire set for one
>table, then to process the entire set AGAIN for another table, and then
>YET AGAIN for a third table, and so on.

I agree with Robert's reaction about the validitiy of such a design. But
if we assume that this really is valid, then you'll probablly find the
cost of the second, third, etc. access to the data faster than the first
access, because the data can all be fetched from cache and no physical
disk I/O is needed.

Of course, you can always test both versions against each other - I
think that you'll be hard-pressed to find a scenario where using a
cursor outperforms using the same SELECT statement two or three times in
a row.

>* Exception handling. Set based processing means that if one row fails
>the entire set fails. Looping through allows you to fail a row but
>allow everything else to be processed properly. It also allows you to
>gather statistics. (How many failed, how many worked, how many were
>skipped, etc.)

In an RDBMS, transactions should be atomic (the A in the ACID properties
of transactions) - they either succeed as a whole, or they fail as a
whole.

If you need to exclude rows that would violate a constraint, check the
constraint in the WHERE clause.

> The alternative is to create a temporary table (sandbox
>or workspace type thing), copy the data to there along with "status" or
>"valdation" columns, run through the set many times over looking for
>any rows that may fail, marking them as such,

No need to loop over the data many times. In most cases, you only need a
single UPDATE with a CASE to check the variuous constraints and set the
"status" column accordingly.

> and then at the end only
>dealing with those rows which "passed" the testing. Of course, in order
>for this to work you must know (and duplicate) all constraints so you
>know what to look for in your testing.

Yes. You need to know your constraints.

Isn't that a normal part of your job?

--
Hugo Kornelis, SQL Server MVP

Report this thread to moderator Post Follow-up to this message
Old Post
Hugo Kornelis
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
JayCallas@hotmail.com  wrote:
> I know this question has been asked. And the usual answer is don't use
> cursors or any other looping method. Instead, try to find a solution
> that uses set-based queries.
>
> But this brings up several questions / senarios:
>
> * I created several stored procedures that take parameters and inserts
> the data into the appropriate tables. This was done for easy access/use
> from client side apps (i.e. web-based).
>
> Proper development tactics says to try and do "code reuse". So, if I
> already have stored procs that do my logic, should I be writing a
> second way of handling the data? If I ever need to change the way the
> data is handled, I now have to make the same change in two (or more)
> places.
>
> * Different data from the same row needs to be inserted into multiple
> tables. "Common sense" (maybe "gut instinct" is better) says to handle
> each row as a "unit". Seems weird to process the entire set for one
> table, then to process the entire set AGAIN for another table, and then
> YET AGAIN for a third table, and so on.
>
> * Exception handling. Set based processing means that if one row fails
> the entire set fails. Looping through allows you to fail a row but
> allow everything else to be processed properly. It also allows you to
> gather statistics. (How many failed, how many worked, how many were
> skipped, etc.)
>
> ?? Good idea ?? The alternative is to create a temporary table (sandbox
> or workspace type thing), copy the data to there along with "status" or
> "valdation" columns, run through the set many times over looking for
> any rows that may fail, marking them as such, and then at the end only
> dealing with those rows which "passed" the testing. Of course, in order
> for this to work you must know (and duplicate) all constraints so you
> know what to look for in your testing.


Another reason why code re-use is less of an issue in SQL compared to
general programming languages is that it's so easy to generate scripts
automatically from your database's metadata. That's particularly the
case for CRUD scripts, transformation scripts, archiving, auditing,
etc. If you have good constraints and naming conventions then you can
automate the production of thousands of lines of code in seconds. That
maybe not as important as Robert and Hugo's other sensible comments but
it is a distinct advantage.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--


Report this thread to moderator Post Follow-up to this message
Old Post
David Portas
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
Thanks for replying. As you probably guessed, there are specific
reasons why I am asking these questions. Maybe a bit more info would be
helpful.

* I am not talking about the same data being written to multiple
tables. I referring to situations where different columns from the same
row are written to different tables.

A concrete example would be security (stocks) data. There are different
types of securities. (Equities and Options are just two of them). Both
types of securities share common information like symbol, name, and
security type. But options have additional data (strike price, class,
and expiration).

My current schema has a table named SecurityMaster where common data
goes with the primary key being Symbol. I also have a
 SecurityMasterDeriva
tive table where I put Option specific data which
has a foreign key reference to the SecurityMaster table.

So when I get a new security master file, the set-based approach says
to insert all the new common security data into the SecurityMaster
table and then go back and and reprocess the new data to insert the
option specific data. My dilemma is that I feel it is more "natural" to
treat each row as a single entity.

* The stored procedure approach I mentioned in the first point is
related to the example I just gave. When inserting a new security into
the "Security Master database" (I do not mean the single table), there
is a at least one table involved but possibly up to 4 tables depending
on the type of security that we are dealing with.

* As far as exception handling goes, why is it considered "normal" to
want all the data or just some of it? What about the idea of getting
"most" of the data in and then dealing with those "exceptions"? Is it
considered "business logic" to say "I rather have the majority of the
data as opposed to NONE of the data"?

I understand that I can make one large INSERT query which also performs
ALL the required checks in order to filter out "bad" rows. But I would
think that the query could get very complex very quickly. I regularly
deal with situations where the data can not only NOT be trusted to be
complete but where column names and formats are VERY different even for
"similar" information.

I find that I need to "backfill" a lot of data or perform a good number
of value conversions.

One example that comes to mind of using a single query resulted in code
that took 25+ minutes ro run. In this situation I was dealing with
position data for securities.

For Equities, I trusted the CUSIP value they gave me but not the
symbol. So I decided to "verify" the symbol against what I knew it as
(ie. backfill the column). But it was not enough to just do a left
join. Because it was possible to have the CUSIP repeated in the lookup
table (the PK was CUSIP and Symbol). So my Symbol lookup logic
(remember this was a single query), was to see if there was only one
row with the given cusip, then check to see if there was only one
active security row with the given cusip, THEN check to see if there
was only one north american active security with the given cusip,
otherwise use what the source gave us.

And that was only for equities... For options the logic is 100%
opposite... Talk about complex queries that SUCK in performance...

The reason why I suggested the multiple pass validation approach was to
be able to record or mark not only those rows that failed but WHY they
failed. (Assume that there is a [ValidationCode] and [ValidationReas
on]
column in the "sandbox" table.) I need to be able to create an
exception report so somebody can go back and "fix" the data.

-----

I guess the bottom line is that I am probably trying to apply my many
years of application development approaches and opinions to database
development and the two do not mesh.


Report this thread to moderator Post Follow-up to this message
Old Post
JayCallas@hotmail.com
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
(JayCallas@hotmail.com)  writes:
> * I created several stored procedures that take parameters and inserts
> the data into the appropriate tables. This was done for easy access/use
> from client side apps (i.e. web-based).
>
> Proper development tactics says to try and do "code reuse". So, if I
> already have stored procs that do my logic, should I be writing a
> second way of handling the data? If I ever need to change the way the
> data is handled, I now have to make the same change in two (or more)
> places.

Undeniably, this is surely a common reason why people end up with
cursors. They already have a procedure that performs an operation on
a scalar set of data, and now they need to do it multiple times. We
have plenty of such examples in our database.

When you face this situation you have a tradeoff: speed up development
and run a cursor and take the performance cost. Or rewrite that
stored procedure so that it deals with data in a table. Typically
when we do this, we use a temp table for input or more often a process-
keyed table. (See http://www.sommarskog.se/share_data.html#usingtable for
a discussion on this.) You keep the old scalar procedure, but to not
duplicate logic, you make it a wrapper on the set-based procedure.

Rewriting a scalar procedure into set-based is not a trivial task,
not the least when you code with performance in mind. And not the least
that at least during a transitional period there will be loops that
call the set-based procedure for one row at a time. There is certainly
an overhead of getting data from a table rather than from parameters.

So there is definitely a trade-off here. But if a loop today is the
best bet, it may not be tomorrow, because as the data grows in size,
performance becomes an issue.

I should add that I'm here talking of procedure where logic is really
complex. As the others have said, code reuse is not equally much a
virtue in SQL as it is application code.

> * Different data from the same row needs to be inserted into multiple
> tables. "Common sense" (maybe "gut instinct" is better) says to handle
> each row as a "unit". Seems weird to process the entire set for one
> table, then to process the entire set AGAIN for another table, and then
> YET AGAIN for a third table, and so on.

I'm not sure that I understand this point. As I mentioned, I usually
have my input to my set-based procedures in a table, so I don't really
see the problem.

> * Exception handling. Set based processing means that if one row fails
> the entire set fails. Looping through allows you to fail a row but
> allow everything else to be processed properly. It also allows you to
> gather statistics. (How many failed, how many worked, how many were
> skipped, etc.)

Yes, this is a point that is often over-looked. Judging from some of
the other replies, some people appears to prefer ignore this issue,
but depending on your business requirement and the likelyhood for
errors, this may be a decisive reason to run a cursor.

For instance, I recently rewrote a stored procedure in our system
which makes a contract note definitive. A stock broker have thousands
of contract notes every day. Typically customer notes stay open all
day, but at the end of the day they should all be made definitive. In
conjunction with this there are a number of updates to perform. With
the old procedure, the notes were handled one by one, and if an occasional
note failed, that was no disaster. It could be sorted out the next day.

For the new procedure, I do perform some initial validations, and notes
that fails these validations will not be updated whereas the rest
will be. But from the point that I've started to perform updates, there
is not really an easy way out if there is some problem with a single note.
That means that the entire operation will fail. In our case, this should
only happen exceptionally, but if it happens the customer be in dire
straits, not the least that this will typically happen after hours.
(Please don't suggest that I should validate everything in advance.
We're not only talking constraints, but assertions in sub-procedures,
whereof some are very complex.)

A possible fall back in this case, that I have implemented, but maybe
I should would be that if there is an error, I would then start to take
the notes one-by-one.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Report this thread to moderator Post Follow-up to this message
Old Post
Erland Sommarskog
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
(JayCallas@hotmail.com)  writes:
> So when I get a new security master file, the set-based approach says
> to insert all the new common security data into the SecurityMaster
> table and then go back and and reprocess the new data to insert the
> option specific data. My dilemma is that I feel it is more "natural" to
> treat each row as a single entity.

That is a feeling that you should overcome. It's perfectly normal to
read the same row all over again. (Maybe one day we will get a multi-table
INSERT, but I am not holding my breath.)

> * As far as exception handling goes, why is it considered "normal" to
> want all the data or just some of it? What about the idea of getting
> "most" of the data in and then dealing with those "exceptions"? Is it
> considered "business logic" to say "I rather have the majority of the
> data as opposed to NONE of the data"?

Of course, sometimes that is the business rule: all or nothing. But
there are also lots of processes where it's OK that some data slips
through the crack, as long what is a unit is a unit. (You don't want
an option to be imported into SecurityMaster, but then not getting the
option-specific data in place. Then you rather lose it.)

In the case where you import data, validation errors may be anticipated,
and you could try to check for the most likely in advance. In the end
what matters is of course, if the performance for the solution you
have is acceptable or not.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Report this thread to moderator Post Follow-up to this message
Old Post
Erland Sommarskog
03-29-06 01:26 AM


Re: Cursor looping versus set-based queries
Erland Sommarskog  wrote: 
>
> I'm not sure that I understand this point. As I mentioned, I usually
> have my input to my set-based procedures in a table, so I don't really
> see the problem.

I think what Jay is up to here is mapping of an OO model to a relational
model.  There is one "master" table which represents the root class of
his hierarchy and several tables for specific types.  I've never worked
with user defined types but maybe these are an option here as well.

Kind regards

robert

Report this thread to moderator Post Follow-up to this message
Old Post
Robert Klemme
03-29-06 08:27 AM


Re: Cursor looping versus set-based queries
Robert Klemme (bob.news@gmx.net)  writes:
> Erland Sommarskog wrote: 
>
> I think what Jay is up to here is mapping of an OO model to a relational
> model.  There is one "master" table which represents the root class of
> his hierarchy and several tables for specific types.  I've never worked
> with user defined types but maybe these are an option here as well.

I suppose you are thinking of the CLR UDTs added to SQL 2005. No, they are
not an option here. CLR UDTs are for small values that always go together,
and where you rarely would search for the individual values. Personally, I
see CLR UDTs as a fairly marginal benefit.

As for the structure of Jay's tables, I have no problem to understand it,
as I work with securities myself, and different classes of securities
have different properties.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Report this thread to moderator Post Follow-up to this message
Old Post
Erland Sommarskog
03-29-06 08:27 AM


Re: Cursor looping versus set-based queries
Erland Sommarskog  wrote:
> Robert Klemme (bob.news@gmx.net) writes: 
>
> I suppose you are thinking of the CLR UDTs added to SQL 2005. No, they are
> not an option here. CLR UDTs are for small values that always go together,
> and where you rarely would search for the individual values. Personally, I
> see CLR UDTs as a fairly marginal benefit.

Ah, good to know.

> As for the structure of Jay's tables, I have no problem to understand it,
> as I work with securities myself, and different classes of securities
> have different properties.

Then I probably misunderstood your posting.  Sorry for the noise.

Kind regards

robert

Report this thread to moderator Post Follow-up to this message
Old Post
Robert Klemme
03-29-06 08:27 AM


Sponsored Links





Last Thread Next Thread
Pages (2): [1] 2 »
Post New Thread

Microsoft SQL Server forum archive

Show a Printable Version Email This Page to Someone! Receive updates to this thread
Microsoft SQL Server
Access database support
PostgreSQL Replication
SQL Server ODBC
FoxPro Support
PostgreSQL pgAdmin
SQL Server Clustering
MySQL ODBC
Web Applications with dBASE
SQL Server CE
MySQL++
Sybase Database Support
MS SQL Full Text Search
PostgreSQL Administration
SQL Anywhere support
DB2 UDB Database
Paradox Database Support
Filemaker Database
Berkley DB
SQL 2000/2000i database
ASE Database
Forum Jump:
All times are GMT. The time now is 08:23 PM.

 
Mobile devices forum | Database support forum archive




Copyrights DropTable.com Database Support Forum 2004 - 2006