Speeding up multiple updates
|
|
Thread rating:  |
brian.lukoff@gmail.com - 10 Oct 2006 15:15 GMT I have a table with three columns, ID, Field, and Data. There are about 1000 rows with a given ID (and given an ID each row has a unique Field value). At some point in a script, I need to update the Data column of a subset of these rows (somewhere between 1 and maybe 30 of them), but each updated Data value needs to be able to contain a different value. Right now I'm using a sequence of UPDATE statements, one for each row that I need to update, (e.g., UPDATE [...] SET [Data] = ... WHERE [Field] = ... AND [ID] = ...) but this is very slow and is causing users to receive timeouts and SQL deadlock errors. Is there a better way for me to do this?
Brian
Anthony Jones - 10 Oct 2006 15:39 GMT > I have a table with three columns, ID, Field, and Data. There are > about 1000 rows with a given ID (and given an ID each row has a unique [quoted text clipped - 8 lines] > > Brian You didn't specify what DB. How are you executing these updates? Are they in a transaction?
I can't see updating 30 records in a 1000 record DB taking any measurable time at all. (Unless your Data field just happens to be a Large Binary type field in which you are storing a lot of data).
Do you have triggers on the table that might be taking a long time? Is the table actively being read as part of other transaction protected operations?
brian - 10 Oct 2006 18:04 GMT Hi Anthony,
The table has many records, but for any particular ID there are about 1000 records each. There are probably about 300 IDs, so about 300,000 records in the database.
The UPDATEs are being sent through the Execute statement of a connection object. There aren't any triggers. The Data field is a large varchar column (size 5000, I think).
Brian
> > I have a table with three columns, ID, Field, and Data. There are > > about 1000 rows with a given ID (and given an ID each row has a unique [quoted text clipped - 19 lines] > Is the table actively being read as part of other transaction protected > operations? Bob Barrows [MVP] - 10 Oct 2006 18:45 GMT > I have a table with three columns, ID, Field, and Data. There are > about 1000 rows with a given ID (and given an ID each row has a unique [quoted text clipped - 6 lines] > causing users to receive timeouts and SQL deadlock errors. Is there a > better way for me to do this? By "SQL", I'm assuming you mean SQL Server ... Check out Erland's article about handling arrays in T-SQL*: http://www.sommarskog.se/arrays-in-sql.html
You may be able to improve performance by passing all the updates as a delimited string to a stored procedure which will parse the string, inserting the data into a temp table and then using a join to perform the actual updates.
Bob Barrows * make sure you check out the other articles on his site. I have learned from all of them.
 Signature Microsoft MVP -- ASP/ASP.NET Please reply to the newsgroup. The email account listed in my From header is my spam trap, so I don't check it very often. You will get a quicker response by posting to the newsgroup.
brian - 10 Oct 2006 18:56 GMT Would it be better to redesign the database table so that each user has only one row and the Data column is just a comma-delimited version of *all* of their data: e.g.,
4,5,2,a text response,another text response,2,3
and then just parse into and out of this format in code? The Data column would have to be a text column then, because it would end up being very long. But it would mean that there would only be one row per user (and one UPDATE statement) instead of many.
Brian
> > I have a table with three columns, ID, Field, and Data. There are > > about 1000 rows with a given ID (and given an ID each row has a unique [quoted text clipped - 24 lines] > header is my spam trap, so I don't check it very often. You will get a > quicker response by posting to the newsgroup. Bob Barrows [MVP] - 10 Oct 2006 19:05 GMT > Would it be better to redesign the database table so that each user > has only one row and the Data column is just a comma-delimited > version of *all* of their data: e.g., Absolutely not. It's not the worst thing you could do, but it's close. Almost worse than creating multiple columns to hold the individual data values.
> 4,5,2,a text response,another text response,2,3 > > and then just parse into and out of this format in code? The Data > column would have to be a text column then, because it would end up > being very long. But it would mean that there would only be one row > per user (and one UPDATE statement) instead of many. My solution involves a single UPDATE statement as well. And data integrity is not compromised.
 Signature Microsoft MVP -- ASP/ASP.NET Please reply to the newsgroup. The email account listed in my From header is my spam trap, so I don't check it very often. You will get a quicker response by posting to the newsgroup.
brian - 10 Oct 2006 20:24 GMT > Absolutely not. It's not the worst thing you could do, but it's close. > Almost worse than creating multiple columns to hold the individual data > values.
> My solution involves a single UPDATE statement as well. And data > integrity is not compromised. Why is this so bad? There is no validation done on the values (the user enters the data), so I don't think data integrity would be a problem. (Unless I'm not understanding what you mean.) Also, having one column with delimited values seems simpler and would seem to take less time on the database end since the ASP code would put together (and take apart) the delimited list of values. It's not an elegant solution, but it seems like it should work and alleviate the problems I have been experiencing with timeouts, locking, and cascading blocking.
Brian
Bob Barrows [MVP] - 10 Oct 2006 20:35 GMT >> Absolutely not. It's not the worst thing you could do, but it's >> close. Almost worse than creating multiple columns to hold the [quoted text clipped - 4 lines] > > Why is this so bad? I'm not going to attempt to answer this here because: 1. A newsgroup is not the proper forum to teach relational database design. 2. I am not an expert - I just know enough to realize what a bad idea it is to put delimited data values in a database table.
I suggest you get a book or do a google search.
It may seem like a good solution for your immediate problem, but I can almost guarantee it will lead to problems down the road.
Again, my proposed solution involved a single update to your table and avoided the problems inherent in storing delimited data.
 Signature Microsoft MVP -- ASP/ASP.NET Please reply to the newsgroup. The email account listed in my From header is my spam trap, so I don't check it very often. You will get a quicker response by posting to the newsgroup.
brian - 10 Oct 2006 20:56 GMT OK, I agree that what I'm proposing is a kludgy solution.
I recently found this (suggesting using the CASE statement to make a single UPDATE clause)...it sounds like it might help to speed up the UPDATEs while maintaining the proper database structure: http://www.sql-server-performance.com/nb_case_statement.asp
Do you think this will work to improve the performance of the script to the point of being usable (and remove the problems with timeouts, locking, cascading blocking)? My understanding is that 30 UPDATE statements will require 30 passes through the table (which takes forever with hundreds of thousands or millions of records) while using the CASE statement will only require 1 pass through the table.
Brian
> >> Absolutely not. It's not the worst thing you could do, but it's > >> close. Almost worse than creating multiple columns to hold the [quoted text clipped - 24 lines] > header is my spam trap, so I don't check it very often. You will get a > quicker response by posting to the newsgroup. Bob Barrows [MVP] - 10 Oct 2006 23:25 GMT > OK, I agree that what I'm proposing is a kludgy solution. > [quoted text clipped - 9 lines] > forever with hundreds of thousands or millions of records) while using > the CASE statement will only require 1 pass through the table. That might help ... and it might not. There are too many factors involved about which we know nothing: indexes, extent of fragmentation, etc. The only way to answer the question is to try both solutions and see which one.
My issue with this solution is that it cannot be done without dynamic sql. I would personally prefer a solution that does not involve concatenating a potentially large sql statement, especially since this will leave you vulnerable to sql injection, no matter how stringently you validate the user inputs.
 Signature Microsoft MVP - ASP/ASP.NET Please reply to the newsgroup. This email account is my spam trap so I don't check it very often. If you must reply off-line, then remove the "NO SPAM"
Mike Brind - 10 Oct 2006 20:55 GMT >> Absolutely not. It's not the worst thing you could do, but it's close. >> Almost worse than creating multiple columns to hold the individual data [quoted text clipped - 13 lines] > > Brian Here's one reason: If at some stage in the future you need to do something with these values, and the number of values changes over time (more questions get added), you will spend huge amounts of time producing server-side code to iterate through variable length delimited lists to get to the value that you want - if it's there.
Here's another: If the requirement changes such that a stakeholder in the project wants to know how many people answered "2" to question 3, how will you go about getting a count easily?
-- Mike Brind
|
|
|