Tuesday, May 15, 2012

Another Way To Insert Many Rows Very Fast From Your .NET Application

There was a great deal of feedback to my previous post One Way To Insert Many Rows Very Fast From Your .NET Application.

The post focused on the performance benefit of using a single SQL statement with an XML-based parameter to do inserts of many rows to a table versus issuing an insert "per row".

Several commenters on the original post mentioned that with SQL Server 2008 and higher, it is possible to pass a Table-Valued Parameter (TVP) to SQL Server and that this would save SQL Server the overhead of parsing the XML before saving the data.  I had used TVPs before inside databases, but never as a parameter to an externally-facing DB object.  Here's how I added a third insert method to my test application:

Creating the Type

When creating a table-valued type in your database, you have two main options: specific or generic.  The specific approach would be to create a type called something like ProductIDList; the generic approach would be a type called something like UniqueIntegerList.  I like using generic table-valued types when I can, because it helps to avoid "bunny types" - duplicate types that seem to multiply on their own.  Your mileage may vary here, but I find this easier than maintaining "EmployeeIDList", "ProductIDList", "WidgetIDList", and the like, when they might all have the same definition.
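For contrast, the "specific" flavor would look nearly identical, just with a narrower name (hypothetical - I didn't actually create this one):

CREATE TYPE dbo.ProductIDList AS TABLE
(
       ProductID INT NOT NULL
    PRIMARY KEY (ProductID)
);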

So I created this "generic" table-valued type:

CREATE TYPE dbo.UniqueIntegerList AS TABLE
(
       TheInteger INT NOT NULL
    PRIMARY KEY (TheInteger)
);

Creating the Save Stored Procedure

Next, I created a new stored procedure which would accept my new Table-Valued Type as a parameter.

CREATE PROC DoTableValuedParameterInsert(@ProductIDs dbo.UniqueIntegerList READONLY)
AS BEGIN

       INSERT INTO ProductsAccess(ProductID)
       SELECT TheInteger AS [ProductID]
       FROM @ProductIDs;

END

 

In this procedure, I am passing in a parameter called @ProductIDs.  This is of type "dbo.UniqueIntegerList" which I just created in the previous step.  SQL Server looks at this and says "oh I know what this is - this type is actually a table". Since it knows that the UniqueIntegerList type is a table, I can select from it just like I could select from any other table-valued variable.  You have to mark the parameter as READONLY because SQL 2008 doesn't support updating and returning a passed table-valued parameter.
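Before wiring this up to .NET, you can sanity-check the type and procedure directly from T-SQL.  This is just an illustrative smoke test, not part of the application code:

-- Declare a variable of the new type, load a few rows, and pass it in
DECLARE @IDs dbo.UniqueIntegerList;
INSERT INTO @IDs (TheInteger) VALUES (1), (2), (3);
EXEC dbo.DoTableValuedParameterInsert @ProductIDs = @IDs;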

Creating the Save Routine

Then I had to create a new save routine on my business object that would call the new stored procedure.  The way you prepare the Table-Valued parameter is to create a DataTable object with the same column signature as the Table-Valued type, populate it, and then pass it inside a SqlParameter object as SqlDbType.Structured.

Public Sub SaveViaTableValuedParameter()

  'Prepare the Table-valued Parameter
  Dim objUniqueIntegerList As New DataTable
  Dim objColumn As DataColumn = objUniqueIntegerList.Columns.Add("TheInteger", _
             System.Type.GetType("System.Int32"))
  objColumn.Unique = True

  'Populate the Table-valued Parameter with the data to save
  For Each Item As Product In Me.Values
    objUniqueIntegerList.Rows.Add(Item.ProductID)
  Next

  'Connect to the DB and save it.
  Using objConn As New SqlConnection(DBConnectionString())
    objConn.Open()
    Using objCmd As New SqlCommand("dbo.DoTableValuedParameterInsert")
      objCmd.CommandType = CommandType.StoredProcedure
      objCmd.Connection = objConn
      objCmd.Parameters.Add("@ProductIDs", SqlDbType.Structured)
      objCmd.Parameters(0).Value = objUniqueIntegerList
      objCmd.ExecuteNonQuery()
    End Using
    objConn.Close()
  End Using
End Sub

 

So was it faster using a Table-Valued Parameter?

My old test harness used batches of up to 504 rows in up to 50 test runs.  With a small number of rows and 50 trials, the XML-based parameter and the Table-valued parameter didn't show a detectable or significant difference: the two methods kept trading places, and the gap was indistinguishable from noise.

 

When I upped the test runs to 1,000 batches at a time, the pattern became clearer.

 

[Chart: average milliseconds per test batch - XML parameter insert vs. TVP insert]

Lower is faster.  Test runs 1-3 done with the XML insert first; tests 4-6 done with the TVP insert first.  Testing was done with SQL Server 2008 Developer Edition SP2 (10.0.4311.0).

So the answer is that the Table-valued parameter is clearly faster once you get into several hundred inserts per call, but below 100 rows a TVP does not appear to hold an advantage over an XML parameter, at least with the way I've implemented this.  I would hesitate to say that the XML is "faster" below 100 rows - the difference is so slight (under 2 ms per call) that it could be due to implementation details in my test program rather than anything SQL Server-related.

Pros and Cons

One of the advantages of using a TVP is that you can create a primary key and zero or more other unique indexes on it - this might help with performance when implementing "Upsert" or MERGE patterns.  XML data doesn't come in with indexes applied, so it could definitely be slower for scenarios that involve a join (not tested in this blog post).  It seems that if the count of rows you need to insert per call is fairly large and performance is your only consideration, using a TVP could be a good choice.
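To illustrate that point, here is a hypothetical upsert variant over the same type (not benchmarked in this post) - note that the MERGE can also absorb the separate DELETE call:

CREATE PROC DoTableValuedParameterUpsert(@ProductIDs dbo.UniqueIntegerList READONLY)
AS BEGIN

       -- Sketch only: insert missing rows, remove rows no longer in the list
       MERGE ProductsAccess AS Target
       USING @ProductIDs AS Source
          ON Target.ProductID = Source.TheInteger
       WHEN NOT MATCHED BY TARGET THEN
              INSERT (ProductID) VALUES (Source.TheInteger)
       WHEN NOT MATCHED BY SOURCE THEN
              DELETE;

END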

In my opinion, using XML is more flexible than using a TVP approach because you aren't required to maintain types inside the database along with the stored procedures and other code.  Using an XML parameter also requires somewhat less code in the persistence layer to implement (and practically none if the XML serialization logic lives in the business layer).  Lastly, XML serialization code is reusable in other scenarios (serializing an object as XML to disk, to a log, or in unit tests, for example), whereas the persistence code for a TVP is only useful for passing whole objects to SQL Server.

Wrap-up

Even in the largest scenario tested, the insert work averaged 24 ms to insert 500 rows of data via the XML parameter versus 14.2 ms for the TVP.  This type of difference isn't perceptible to a user, though clearly it could add up for a large application with many users calling this code frequently.  However, either approach compares very favorably to the average of 719 ms to complete a 500-row insert measured in my previous post using the non-optimized 1x1 insert method implemented by many ORMs.

As always, the best choice depends on the requirements for your application.  Both of the methods tested, XML and a TVP, are strong candidates for inserting many rows very fast from your .NET application, and each holds some advantages over the other.

I greatly enjoyed hearing the feedback to the prior blog post; I'd love to hear your thoughts on this additional article!

Friday, May 11, 2012

One Way To Insert Many Rows Very Fast From Your .NET Application

This is my attempt to explain something that I have wanted to figure out for a long time.  I have read many articles on this very subject, but they always seem to fall short for various reasons - generally by including far more detail or complexity than is necessary to answer the question or get someone going.  Thanks to the work of others who have written similar articles on this subject in the past (particularly MVP Michael Coles), I eventually figured this out.  This is my attempt to present the material in an end-to-end yet bare-bones and hopefully easily understandable format for any .NET or SQL developer who needs to do this.

Note: I've written a follow-up to this article based on the excellent feedback: Another Way To Insert Many Rows Very Fast From Your .NET Application

The issue:

Recently, I was asked to troubleshoot the startup performance of a VB.NET application with a SQL 2008 back-end.  The startup routine was taking up to a minute or so to complete in locations with a fast DB connection, and up to 15 minutes in locations with a slow DB connection.

By stepping through the application and inserting Debug.Print() statements liberally, I narrowed down the "slow" part of the routine to a loop that was inserting product IDs into a table.  Basically, the way the app worked was that when it was launched, it would figure out which products were relevant to the active user and then insert those into a table that was used with INNER JOINs in all the rest of the screens and reports.

There was a table in the SQL database that looked a bit like this (pretend there is only ever one user for this example):

CREATE TABLE ProductsAccess(
ProductID INT NOT NULL
PRIMARY KEY (ProductID));

 

The VB.NET code that initialized the product list for the user in the DB was something like this (using typical ORM stuff):

 

'Delete all records in the table
ProductsAccess.DeleteAll()
'now populate all products that the user can choose in a loop
For i = 0 To objProductList.GetUpperBound(0)
       ProductsAccess.Insert(objProductList.Item(i).ProductID)
Next

 

Behind the scenes, this meant at least 1 call to the database per row to be inserted, plus the first call to clear out any existing rows.  Since it was typical for our users to have several thousand products available to them, this could take a very long time, particularly with remote users on high-latency connections.

 

It wasn't the amount of data that was the problem - it was the count of the calls that was the issue.  Instead of saying "Insert product 1", "Insert Product 2", "Insert Product 3", etc., I needed to say "Insert Products: 1, 2, 3," etc.

 

What I did:

Here is the simple version of how I changed the insert routine to cut the number of database calls to insert "n" rows from "n+1" to "always 2":

 

I created this stored procedure:

CREATE PROC ProductsAccessXMLInsert(@ValidXMLInput XML)
AS BEGIN

       INSERT INTO ProductsAccess (ProductID)
       SELECT Col.value('@ID','INT')
       FROM @ValidXMLInput.nodes('//Products/Product') Tab(Col)

END

 

Then I modified my .NET code to look something like this (note that I left the DeleteAll method untouched):

 

'Delete all records in the table
ProductsAccess.DeleteAll()

'Build a valid XML string (you should use System.Xml in real code - this is just a demo)
Dim strBuffer As New Text.StringBuilder
strBuffer.Append("<Products>")
For i = 0 To objProductList.GetUpperBound(0)
    strBuffer.Append(" <Product ID=""" & objProductList.Item(i).ProductID.ToString & """ />")
Next
strBuffer.Append("</Products>")

'now populate all products that the user can choose by passing a single XML document
ProductsAccess.InsertXML(strBuffer.ToString)

 

How it works:

The XML generated in the loop would look something like this:

<Products>
<Product ID="1" />
<Product ID="2" />
<Product ID="3" />
</Products>

 

The new InsertXML() method called the new ProductsAccessXMLInsert stored procedure, passing all of the generated XML in only a single call to the DB.

 

Public Sub InsertXML(TheXML As String)
    Using objConn As New SqlConnection(DBConnectionString())
        objConn.Open()
        Using objCmd As New SqlCommand("dbo.ProductsAccessXMLInsert")
            objCmd.CommandType = CommandType.StoredProcedure
            objCmd.Connection = objConn
            objCmd.Parameters.Add("@ValidXMLInput", SqlDbType.Xml)
            objCmd.Parameters(0).Value = TheXML
            objCmd.ExecuteNonQuery()
        End Using
        objConn.Close()
    End Using
End Sub

 

Inside the ProductsAccessXMLInsert stored procedure, SQL Server does a SELECT from the XML parameter's built-in nodes() method - a table-valued function - using the passed XPath query ("//Products/Product").  The XML "columns" are then accessible via the Col alias (which comes from the Tab(Col) alias after the nodes() call).  In the SELECT clause, I reference the ID attribute of each XML node and instruct SQL Server to cast it as an Integer (these are the two arguments passed to Col.value() in the SELECT clause).  XML attribute names must be prefixed with an @ symbol; for element names, you leave the @ off.  Then I'm simply doing an insert into the ProductsAccess table from the results of my SELECT.
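As an aside, if the same data arrived as element values instead of attributes, the value() call would drop the @ and wrap the element name to guarantee a single node.  A quick hypothetical illustration:

DECLARE @ElementXML XML =
    '<Products><Product><ID>1</ID></Product><Product><ID>2</ID></Product></Products>';

-- (ID)[1] selects the first ID element under each Product node
SELECT Col.value('(ID)[1]', 'INT') AS [ProductID]
FROM @ElementXML.nodes('/Products/Product') Tab(Col);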

 

Performance Difference:

I did four sets of tests using insert sets of 10 rows, 50 rows, 100 rows, and 504 rows (the number of products in the AdventureWorks Production.Product table, which was the test data source).  For each row count, I ran three series of tests doing the one-by-one inserts first and three series doing the XML inserts first.  All runs were measured using the .NET Stopwatch class.
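The timing pattern for a single trial was nothing fancy; here is a minimal sketch (strGeneratedXML and the surrounding setup are illustrative, not the exact harness code):

'Requires Imports System.Diagnostics
Dim objWatch As Stopwatch = Stopwatch.StartNew()
ProductsAccess.InsertXML(strGeneratedXML)  'or the row-by-row insert loop
objWatch.Stop()
Debug.Print("Completed in " & objWatch.ElapsedMilliseconds.ToString & " ms")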

 

So how much better was this single call using XML versus using the standard row-by-row INSERT calls?  Here are the results:

 

(lower milliseconds to complete is better)

 

[Chart: milliseconds to complete by row count - XML insert vs. row-by-row inserts]

 

Conclusion:

The XML insert method for our scenario of needing to insert many rows into our SQL database in one shot averaged more than 97% faster than looped inserts when processing 504 rows of data, and it was still more than 81% faster with a mere 10 rows to insert.  This was averaged over 6 runs of 50 tries of each method, and to my eye the results were fairly consistent.

 

I hope you found this a useful introduction on one way to efficiently handle multi-insert scenarios with SQL Server using an XML parameter.  I would love to hear any comments or questions you might have.

 

Notes and Disclaimers:

  1. This is merely one way to insert many rows very fast from your .NET application.
  2. The term "many" may mean different things for you or your application.
  3. To keep the performance testing fair, I first refactored the existing insert method to use a stored procedure that took the product ID as a parameter, so it wasn't using wonky ORM-generated code:
    CREATE PROC DoSingleInsert(@ProductID INT)
    AS BEGIN
           INSERT INTO ProductsAccess (ProductID) VALUES (@ProductID);
    END
    Also, the inserts were processed in a tight loop using a single SqlCommand object with a strongly-typed parameter - just updating the parameter value and calling ExecuteNonQuery() for each row.  This is about as efficient as ADO.NET typically gets, though I would love to hear if there is a way to drive a SqlCommand object harder.  (A reconstruction of this loop appears after this list.)

  4. Testing was done with the application and SQL Server running off two ports of the same gigabit switch to minimize the factor of network latency.

  5. To keep this post as simple as possible, I am pretending that there is only ever one user for this application - my real table also has a UserID field, among others.  Additionally, in my real solution I do a MERGE against my XML parameter which allowed me to also eliminate the DELETE call - perhaps this can be a future blog post.

  6. Dear fellow developer: please don't do what I did above and use a StringBuilder to generate XML.  The moment you hit a funny character such as <, >, or &, you will be in a world of pain.  You should use the features of the System.Xml namespace to generate XML; that was just more complicated than I wanted to get into for this blog post.  (A sketch follows after this list.)

  7. The slow part of the original code in this application was basically the last four lines of code in a massive subroutine.  I had seen dozens of things earlier that I thought might be the culprit, but actually were OK.  In doing this work, I was reminded of the lesson to not do premature optimization of code; I could have spent an hour or two making all the "kind-of fast" code "super fast" and the end result would have been completely unnoticeable due to the four lines of boat-anchor DB access code which were the main problem.
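To expand on note 3: here is roughly what the refactored row-by-row test loop looked like.  This is a reconstruction of the pattern, not the exact harness code:

Using objConn As New SqlConnection(DBConnectionString())
    objConn.Open()
    'Create the command and the strongly-typed parameter once...
    Using objCmd As New SqlCommand("dbo.DoSingleInsert", objConn)
        objCmd.CommandType = CommandType.StoredProcedure
        objCmd.Parameters.Add("@ProductID", SqlDbType.Int)
        '...then just update the value and execute, once per row
        For i = 0 To objProductList.GetUpperBound(0)
            objCmd.Parameters(0).Value = objProductList.Item(i).ProductID
            objCmd.ExecuteNonQuery()
        Next
    End Using
End Using

And to expand on note 6: a minimal sketch of building the same XML with System.Xml's XmlWriter, which handles the escaping of characters like <, >, and & for you (assuming the same objProductList as above):

'Requires Imports System.Xml at the top of the file
Dim objBuffer As New Text.StringBuilder
Dim objSettings As New XmlWriterSettings With {.OmitXmlDeclaration = True}
Using objWriter As XmlWriter = XmlWriter.Create(objBuffer, objSettings)
    objWriter.WriteStartElement("Products")
    For i = 0 To objProductList.GetUpperBound(0)
        objWriter.WriteStartElement("Product")
        objWriter.WriteAttributeString("ID", objProductList.Item(i).ProductID.ToString)
        objWriter.WriteEndElement()  'Product
    Next
    objWriter.WriteEndElement()  'Products
End Using
ProductsAccess.InsertXML(objBuffer.ToString)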

See Also:

Creating a deep hierarchy with FOR XML

Using the MERGE Statement

Thursday, May 10, 2012

One Way to Work Around the .NET Setup Project’s Picky Version Number Behavior

If you don’t change the version number of your .NET project output every time you create a setup package, you may have found that it’s difficult to have the Setup program reliably uninstall the existing version of your software before installing your update.

Here’s how to add a batch file to your setup project to accommodate this.

First, left-click to select your Setup project in the Solution Explorer, and copy the “ProductCode” field’s value (it will be a GUID) from the properties window (if you don’t see it, hit F4 to bring up the properties listing).

[Screenshot: the Setup project's Properties window, showing the ProductCode GUID]

Next, right-click your Setup project and choose Open Folder in Windows Explorer.

Create a file there called RunMeToSetup.BAT with the following content (substituting the GUID below with the one you copied):

MsiExec.exe /uninstall {00000000-0000-0000-0000-000000000000} /qb!-
"%~dp0setup.exe"

Save the batch file.  Then go back into Visual Studio.

If your Setup project is under source control you may have to right-click it and do “Check Out for Edit…” prior to completing the next step.  (If it is not, dude WTF?)

Right-click your Setup project, choose Add… File…, and add the .BAT file.

Then select the .BAT file in the list and hit F4 to show the properties window for it.  You need to change the “PackageAs” property to “vsdpaLoose”, which is the non-intuitive way of saying that the file should be copied into the installer directory, but not zipped up into the MSI.  (Stay classy, MSFT!)

Then when you right-click and build your Setup project, the output will be the setup.exe, the MSI file, and your new batch file.  When you double-click the new batch file, it will first run a “bare” uninstaller for your application’s GUID (with a basic, non-interactive UI), and then run the setup program.  The crazy %~dp0 token expands to the folder containing the batch file itself; it works around the command-line’s inability to use a UNC path as a working directory (if you double-click the batch file starting in \\myserver\myshare, “%~dp0setup.exe” will resolve to “\\myserver\myshare\setup.exe”).

If you wish to customize the uninstaller behavior, you can find the switches that MSIEXEC accepts here:  http://msdn.microsoft.com/en-us/library/windows/desktop/aa367988(v=vs.85).aspx

Once you test this successfully on a VM or however else you test your install software, check your setup program and the new batch file back into source control.