Drop Table
Support Forum for database administrators and web based access to important newsgroups related to databasesI know this question has been asked. And the usual answer is don't use cursors or any other looping method. Instead, try to find a solution that uses set-based queries. But this brings up several questions / senarios: * I created several stored procedures that take parameters and inserts the data into the appropriate tables. This was done for easy access/use from client side apps (i.e. web-based). Proper development tactics says to try and do "code reuse". So, if I already have stored procs that do my logic, should I be writing a second way of handling the data? If I ever need to change the way the data is handled, I now have to make the same change in two (or more) places. * Different data from the same row needs to be inserted into multiple tables. "Common sense" (maybe "gut instinct" is better) says to handle each row as a "unit". Seems weird to process the entire set for one table, then to process the entire set AGAIN for another table, and then YET AGAIN for a third table, and so on. * Exception handling. Set based processing means that if one row fails the entire set fails. Looping through allows you to fail a row but allow everything else to be processed properly. It also allows you to gather statistics. (How many failed, how many worked, how many were skipped, etc.) ?? Good idea ?? The alternative is to create a temporary table (sandbox or workspace type thing), copy the data to there along with "status" or "valdation" columns, run through the set many times over looking for any rows that may fail, marking them as such, and then at the end only dealing with those rows which "passed" the testing. Of course, in order for this to work you must know (and duplicate) all constraints so you know what to look for in your testing.
Post Follow-up to this messageDisclaimer: difficult to answer without more specific info. JayCallas@hotmail.com wrote: > I know this question has been asked. And the usual answer is don't use > cursors or any other looping method. Instead, try to find a solution > that uses set-based queries. > > But this brings up several questions / senarios: > > * I created several stored procedures that take parameters and inserts > the data into the appropriate tables. This was done for easy access/use > from client side apps (i.e. web-based). > > Proper development tactics says to try and do "code reuse". So, if I > already have stored procs that do my logic, should I be writing a > second way of handling the data? If I ever need to change the way the > data is handled, I now have to make the same change in two (or more) > places. If a SP encapsulates a plain INSERT I'd throw it out. If there is more complex logic involved I'd probable leave it in. > * Different data from the same row needs to be inserted into multiple > tables. "Common sense" (maybe "gut instinct" is better) says to handle > each row as a "unit". Seems weird to process the entire set for one > table, then to process the entire set AGAIN for another table, and then > YET AGAIN for a third table, and so on. This smells like a design issue. If you have to insert the exact same data into multiple tables chances are that your table layout is flawed. > * Exception handling. Set based processing means that if one row fails > the entire set fails. Looping through allows you to fail a row but > allow everything else to be processed properly. It also allows you to > gather statistics. (How many failed, how many worked, how many were > skipped, etc.) Normally you want all or nothing. This one sounds as if you placed business logic into to database. This might or might not be a good idea but there seems to be a chance that this is better done in a middle tier (for example because it eases porting to another RDBMS). But it depends on the error handling and operations you do in SP's. > ?? Good idea ?? The alternative is to create a temporary table (sandbox > or workspace type thing), copy the data to there along with "status" or > "valdation" columns, run through the set many times over looking for > any rows that may fail, marking them as such, and then at the end only > dealing with those rows which "passed" the testing. Of course, in order > for this to work you must know (and duplicate) all constraints so you > know what to look for in your testing. I wouldn't recommend one or the other direction with so few info. Kind regards robert
Post Follow-up to this messageOn 28 Mar 2006 12:06:31 -0800, JayCallas@hotmail.com wrote: (snip) >Proper development tactics says to try and do "code reuse". Hi Jay, This is not always true when dealing with databases. You have to weigh the benefit of code reuse (cheaper maintenance) against the benefit of code duplication (better performance). In databases where performance matters, I won't hesitate a single second to duplicate a stored procedure and change it slightly to optimize for specific situations. >* Different data from the same row needs to be inserted into multiple >tables. "Common sense" (maybe "gut instinct" is better) says to handle >each row as a "unit". Seems weird to process the entire set for one >table, then to process the entire set AGAIN for another table, and then >YET AGAIN for a third table, and so on. I agree with Robert's reaction about the validitiy of such a design. But if we assume that this really is valid, then you'll probablly find the cost of the second, third, etc. access to the data faster than the first access, because the data can all be fetched from cache and no physical disk I/O is needed. Of course, you can always test both versions against each other - I think that you'll be hard-pressed to find a scenario where using a cursor outperforms using the same SELECT statement two or three times in a row. >* Exception handling. Set based processing means that if one row fails >the entire set fails. Looping through allows you to fail a row but >allow everything else to be processed properly. It also allows you to >gather statistics. (How many failed, how many worked, how many were >skipped, etc.) In an RDBMS, transactions should be atomic (the A in the ACID properties of transactions) - they either succeed as a whole, or they fail as a whole. If you need to exclude rows that would violate a constraint, check the constraint in the WHERE clause. > The alternative is to create a temporary table (sandbox >or workspace type thing), copy the data to there along with "status" or >"valdation" columns, run through the set many times over looking for >any rows that may fail, marking them as such, No need to loop over the data many times. In most cases, you only need a single UPDATE with a CASE to check the variuous constraints and set the "status" column accordingly. > and then at the end only >dealing with those rows which "passed" the testing. Of course, in order >for this to work you must know (and duplicate) all constraints so you >know what to look for in your testing. Yes. You need to know your constraints. Isn't that a normal part of your job? -- Hugo Kornelis, SQL Server MVP
Post Follow-up to this messageJayCallas@hotmail.com wrote: > I know this question has been asked. And the usual answer is don't use > cursors or any other looping method. Instead, try to find a solution > that uses set-based queries. > > But this brings up several questions / senarios: > > * I created several stored procedures that take parameters and inserts > the data into the appropriate tables. This was done for easy access/use > from client side apps (i.e. web-based). > > Proper development tactics says to try and do "code reuse". So, if I > already have stored procs that do my logic, should I be writing a > second way of handling the data? If I ever need to change the way the > data is handled, I now have to make the same change in two (or more) > places. > > * Different data from the same row needs to be inserted into multiple > tables. "Common sense" (maybe "gut instinct" is better) says to handle > each row as a "unit". Seems weird to process the entire set for one > table, then to process the entire set AGAIN for another table, and then > YET AGAIN for a third table, and so on. > > * Exception handling. Set based processing means that if one row fails > the entire set fails. Looping through allows you to fail a row but > allow everything else to be processed properly. It also allows you to > gather statistics. (How many failed, how many worked, how many were > skipped, etc.) > > ?? Good idea ?? The alternative is to create a temporary table (sandbox > or workspace type thing), copy the data to there along with "status" or > "valdation" columns, run through the set many times over looking for > any rows that may fail, marking them as such, and then at the end only > dealing with those rows which "passed" the testing. Of course, in order > for this to work you must know (and duplicate) all constraints so you > know what to look for in your testing. Another reason why code re-use is less of an issue in SQL compared to general programming languages is that it's so easy to generate scripts automatically from your database's metadata. That's particularly the case for CRUD scripts, transformation scripts, archiving, auditing, etc. If you have good constraints and naming conventions then you can automate the production of thousands of lines of code in seconds. That maybe not as important as Robert and Hugo's other sensible comments but it is a distinct advantage. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
Post Follow-up to this messageThanks for replying. As you probably guessed, there are specific reasons why I am asking these questions. Maybe a bit more info would be helpful. * I am not talking about the same data being written to multiple tables. I referring to situations where different columns from the same row are written to different tables. A concrete example would be security (stocks) data. There are different types of securities. (Equities and Options are just two of them). Both types of securities share common information like symbol, name, and security type. But options have additional data (strike price, class, and expiration). My current schema has a table named SecurityMaster where common data goes with the primary key being Symbol. I also have a SecurityMasterDeriva tive table where I put Option specific data which has a foreign key reference to the SecurityMaster table. So when I get a new security master file, the set-based approach says to insert all the new common security data into the SecurityMaster table and then go back and and reprocess the new data to insert the option specific data. My dilemma is that I feel it is more "natural" to treat each row as a single entity. * The stored procedure approach I mentioned in the first point is related to the example I just gave. When inserting a new security into the "Security Master database" (I do not mean the single table), there is a at least one table involved but possibly up to 4 tables depending on the type of security that we are dealing with. * As far as exception handling goes, why is it considered "normal" to want all the data or just some of it? What about the idea of getting "most" of the data in and then dealing with those "exceptions"? Is it considered "business logic" to say "I rather have the majority of the data as opposed to NONE of the data"? I understand that I can make one large INSERT query which also performs ALL the required checks in order to filter out "bad" rows. But I would think that the query could get very complex very quickly. I regularly deal with situations where the data can not only NOT be trusted to be complete but where column names and formats are VERY different even for "similar" information. I find that I need to "backfill" a lot of data or perform a good number of value conversions. One example that comes to mind of using a single query resulted in code that took 25+ minutes ro run. In this situation I was dealing with position data for securities. For Equities, I trusted the CUSIP value they gave me but not the symbol. So I decided to "verify" the symbol against what I knew it as (ie. backfill the column). But it was not enough to just do a left join. Because it was possible to have the CUSIP repeated in the lookup table (the PK was CUSIP and Symbol). So my Symbol lookup logic (remember this was a single query), was to see if there was only one row with the given cusip, then check to see if there was only one active security row with the given cusip, THEN check to see if there was only one north american active security with the given cusip, otherwise use what the source gave us. And that was only for equities... For options the logic is 100% opposite... Talk about complex queries that SUCK in performance... The reason why I suggested the multiple pass validation approach was to be able to record or mark not only those rows that failed but WHY they failed. (Assume that there is a [ValidationCode] and [ValidationReas on] column in the "sandbox" table.) I need to be able to create an exception report so somebody can go back and "fix" the data. ----- I guess the bottom line is that I am probably trying to apply my many years of application development approaches and opinions to database development and the two do not mesh.
Post Follow-up to this message(JayCallas@hotmail.com) writes: > * I created several stored procedures that take parameters and inserts > the data into the appropriate tables. This was done for easy access/use > from client side apps (i.e. web-based). > > Proper development tactics says to try and do "code reuse". So, if I > already have stored procs that do my logic, should I be writing a > second way of handling the data? If I ever need to change the way the > data is handled, I now have to make the same change in two (or more) > places. Undeniably, this is surely a common reason why people end up with cursors. They already have a procedure that performs an operation on a scalar set of data, and now they need to do it multiple times. We have plenty of such examples in our database. When you face this situation you have a tradeoff: speed up development and run a cursor and take the performance cost. Or rewrite that stored procedure so that it deals with data in a table. Typically when we do this, we use a temp table for input or more often a process- keyed table. (See http://www.sommarskog.se/share_data.html#usingtable for a discussion on this.) You keep the old scalar procedure, but to not duplicate logic, you make it a wrapper on the set-based procedure. Rewriting a scalar procedure into set-based is not a trivial task, not the least when you code with performance in mind. And not the least that at least during a transitional period there will be loops that call the set-based procedure for one row at a time. There is certainly an overhead of getting data from a table rather than from parameters. So there is definitely a trade-off here. But if a loop today is the best bet, it may not be tomorrow, because as the data grows in size, performance becomes an issue. I should add that I'm here talking of procedure where logic is really complex. As the others have said, code reuse is not equally much a virtue in SQL as it is application code. > * Different data from the same row needs to be inserted into multiple > tables. "Common sense" (maybe "gut instinct" is better) says to handle > each row as a "unit". Seems weird to process the entire set for one > table, then to process the entire set AGAIN for another table, and then > YET AGAIN for a third table, and so on. I'm not sure that I understand this point. As I mentioned, I usually have my input to my set-based procedures in a table, so I don't really see the problem. > * Exception handling. Set based processing means that if one row fails > the entire set fails. Looping through allows you to fail a row but > allow everything else to be processed properly. It also allows you to > gather statistics. (How many failed, how many worked, how many were > skipped, etc.) Yes, this is a point that is often over-looked. Judging from some of the other replies, some people appears to prefer ignore this issue, but depending on your business requirement and the likelyhood for errors, this may be a decisive reason to run a cursor. For instance, I recently rewrote a stored procedure in our system which makes a contract note definitive. A stock broker have thousands of contract notes every day. Typically customer notes stay open all day, but at the end of the day they should all be made definitive. In conjunction with this there are a number of updates to perform. With the old procedure, the notes were handled one by one, and if an occasional note failed, that was no disaster. It could be sorted out the next day. For the new procedure, I do perform some initial validations, and notes that fails these validations will not be updated whereas the rest will be. But from the point that I've started to perform updates, there is not really an easy way out if there is some problem with a single note. That means that the entire operation will fail. In our case, this should only happen exceptionally, but if it happens the customer be in dire straits, not the least that this will typically happen after hours. (Please don't suggest that I should validate everything in advance. We're not only talking constraints, but assertions in sub-procedures, whereof some are very complex.) A possible fall back in this case, that I have implemented, but maybe I should would be that if there is an error, I would then start to take the notes one-by-one. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pr...oads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodin...ions/books.mspx
Post Follow-up to this message(JayCallas@hotmail.com) writes: > So when I get a new security master file, the set-based approach says > to insert all the new common security data into the SecurityMaster > table and then go back and and reprocess the new data to insert the > option specific data. My dilemma is that I feel it is more "natural" to > treat each row as a single entity. That is a feeling that you should overcome. It's perfectly normal to read the same row all over again. (Maybe one day we will get a multi-table INSERT, but I am not holding my breath.) > * As far as exception handling goes, why is it considered "normal" to > want all the data or just some of it? What about the idea of getting > "most" of the data in and then dealing with those "exceptions"? Is it > considered "business logic" to say "I rather have the majority of the > data as opposed to NONE of the data"? Of course, sometimes that is the business rule: all or nothing. But there are also lots of processes where it's OK that some data slips through the crack, as long what is a unit is a unit. (You don't want an option to be imported into SecurityMaster, but then not getting the option-specific data in place. Then you rather lose it.) In the case where you import data, validation errors may be anticipated, and you could try to check for the most likely in advance. In the end what matters is of course, if the performance for the solution you have is acceptable or not. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pr...oads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodin...ions/books.mspx
Post Follow-up to this messageErland Sommarskog wrote: > > I'm not sure that I understand this point. As I mentioned, I usually > have my input to my set-based procedures in a table, so I don't really > see the problem. I think what Jay is up to here is mapping of an OO model to a relational model. There is one "master" table which represents the root class of his hierarchy and several tables for specific types. I've never worked with user defined types but maybe these are an option here as well. Kind regards robert
Post Follow-up to this messageRobert Klemme (bob.news@gmx.net) writes: > Erland Sommarskog wrote: > > I think what Jay is up to here is mapping of an OO model to a relational > model. There is one "master" table which represents the root class of > his hierarchy and several tables for specific types. I've never worked > with user defined types but maybe these are an option here as well. I suppose you are thinking of the CLR UDTs added to SQL 2005. No, they are not an option here. CLR UDTs are for small values that always go together, and where you rarely would search for the individual values. Personally, I see CLR UDTs as a fairly marginal benefit. As for the structure of Jay's tables, I have no problem to understand it, as I work with securities myself, and different classes of securities have different properties. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pr...oads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodin...ions/books.mspx
Post Follow-up to this messageErland Sommarskog wrote: > Robert Klemme (bob.news@gmx.net) writes: > > I suppose you are thinking of the CLR UDTs added to SQL 2005. No, they are > not an option here. CLR UDTs are for small values that always go together, > and where you rarely would search for the individual values. Personally, I > see CLR UDTs as a fairly marginal benefit. Ah, good to know. > As for the structure of Jay's tables, I have no problem to understand it, > as I work with securities myself, and different classes of securities > have different properties. Then I probably misunderstood your posting. Sorry for the noise. Kind regards robert
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread