Tuesday, May 15, 2012

Another Way To Insert Many Rows Very Fast From Your .NET Application

There was a great deal of feedback to my previous post One Way To Insert Many Rows Very Fast From Your .NET Application.

The post focused on the performance benefit of using a single SQL statement with an XML-based parameter to do inserts of many rows to a table versus issuing an insert "per row".

Several commenters on the original post mentioned that with SQL Server 2008 and higher, it is possible to pass a Table-Valued Parameter (TVP) to SQL Server and that this would save SQL Server the overhead of parsing the XML before saving the data.  I had used TVPs before inside databases, but never as a parameter to an externally-facing DB object.  Here's how I added a third insert method to my test application:

Creating the Type

When creating a table-valued type in your database, you have two main options: specific or generic.  The specific case would be to create a type called something like ProductIDList and the generic case would be to create a type called something like UniqueIntegerList.  I like using generic Table-valued types when I can, because it helps to avoid "bunny types" - basically needing to have many duplicate types that seem to multiply on their own.  Your mileage may vary here - I find that this is easier for me than maintaining "EmployeeIDList", "ProductIDList", "WidgetIDList", and the like, when these might be all the same definition.

So I created this "generic" table-valued type:

CREATE TYPE dbo.UniqueIntegerList AS TABLE

(

       TheInteger INT NOT NULL

    PRIMARY KEY (TheInteger)

);

Creating the Save Stored Procedure

Next, I created a new stored procedure which would accept my new Table-Valued Type as a parameter.

CREATE PROC DoTableValuedParameterInsert(@ProductIDs dbo.UniqueIntegerList READONLY)

AS BEGIN

      

       INSERT INTO ProductsAccess(ProductID)

       SELECT TheInteger AS [ProductID]

       FROM @ProductIDs;

 

END

 

In this procedure, I am passing in a parameter called @ProductIDs.  This is of type "dbo.UniqueIntegerList" which I just created in the previous step.  SQL Server looks at this and says "oh I know what this is - this type is actually a table". Since it knows that the UniqueIntegerList type is a table, I can select from it just like I could select from any other table-valued variable.  You have to mark the parameter as READONLY because SQL 2008 doesn't support updating and returning a passed table-valued parameter.

Creating the Save Routine

Then I had to create a new save routine on my business object that would call the new stored procedure.  The way you prepare the Table-Valued parameter is to create a DataTable object with the same column signature as the Table-Valued type, populate it, and then pass it inside a SqlParameter object as SqlDbType.Structured.

Public Sub SaveViaTableValuedParameter()

 

  'Prepare the Table-valued Parameter

  Dim objUniqueIntegerList As New DataTable

  Dim objColumn As DataColumn = objUniqueIntegerList.Columns.Add("TheInteger", _

             System.Type.GetType("System.Int32"))

  objColumn.Unique = True

 

  'Populate the Table-valued Parameter with the data to save

  For Each Item As Product In Me.Values

    objUniqueIntegerList.Rows.Add(Item.ProductID)

  Next

 

  'Connect to the DB and save it.

  Using objConn As New SqlConnection(DBConnectionString())

    objConn.Open()

    Using objCmd As New SqlCommand("dbo.DoTableValuedParameterInsert")

      objCmd.CommandType = CommandType.StoredProcedure

      objCmd.Connection = objConn

      objCmd.Parameters.Add("ProductIDs", SqlDbType.Structured)

 

      objCmd.Parameters(0).Value = objUniqueIntegerList

 

      objCmd.ExecuteNonQuery()

    End Using

    objConn.Close()

  End Using

End Sub

 

So was it faster using a Table-Valued Parameter?

My old test harness used runs of up to 504 rows in up to 50 test runs.  When doing a small number of rows and 50 trials, using the XML-based parameter versus the Table-valued parameter didn't seem to make a detectable or significant difference.   The two methods kept trading places and the difference was indistinguishable from noise.

 

When I upped the test runs to 1,000 batches at a time, the pattern became more clear.

 

image

Lower is faster.  Test runs 1-3 done with the XML insert first; tests 4-6 done with the TVP insert first.  Testing was done with SQL Server 2008 Developer Edition SP2 (10.0.4311.0).

So the answer is that the Table-valued parameter is clearly faster when you start getting into several hundred inserts in one call, but using a TVP does not appear to provide an advantage to using an XML parameter below 100 rows, at least with the way that I've implemented this.  I would hesitate to say that the XML is "faster" below 100 rows to insert - the difference is so slight (under 2 ms per call) that it could be due to the implementation details in my test program and not SQL Server-related.

Pros and Cons

One of the advantages of using a TVP is that you can create a primary key and zero or more other unique indexes on it - this might help with performance when implementing "Upsert" or MERGE patterns.  XML data doesn't come in with indexes applied, so it could definitely be slower for scenarios that involve a join (not tested in this blog post).  It seems that if the count of rows you need to insert per call is fairly large and performance is your only consideration, using a TVP could be a good choice.

In my opinion, using XML is more flexible than using a TVP approach because you aren't required to maintain types inside the database along with the stored procedures and other code.  Using an XML parameter also requires somewhat less code in the persistence layer to implement (and practically none if the XML serialization logic lives in the business layer).  Lastly, XML serialization code is reusable in other scenarios (serializing an object as XML to disk, to a log, or in unit tests, for example), whereas the persistence code for a TVP is only useful for passing whole objects to SQL Server.

Wrap-up

Even in the largest scenario tested, the insert work averaged 24ms to insert 500 rows of data via the XML parameter versus 14.2ms for the TVP.  This type of difference isn't perceptible to a user, though clearly it could add up for a large application with many users calling this code frequently.  However, either approach compares very favorably to the average 719ms to complete a 500 row insert measured in my previous post using the non-optimized 1x1 insert method implemented by many ORMs.

As always, the best choice depends on the requirements for your application.  Either of the two methods tested XML or a TVP are strong candidates for inserting many rows very fast from your .NET application, and each holds some advantages over the other.

I greatly enjoyed hearing the feedback to the prior blog post; I'd love to hear your thoughts on this additional article!

3 comments:

  1. I can show you how I made a single xml proc for all types. It's completely generic :-)
    Is there a way to have that with TVPs?

    ReplyDelete
    Replies
    1. I've had pretty good luck with hybrid TVPs that define the table type like this:

      CREATE TYPE dbo.IntWithPayload AS TABLE
      (
      ID INT NOT NULL,
      Payload XML NULL,
      PRIMARY KEY (ID)
      );

      The nice thing is that this allows a flexible payload while still allowing SQL Server to do fast joins since the ID column must be unique and already sorted. Plus, this doesn't require SQL Server to shred an XML document representing your entire data structure in one go - it can shred each individual row separately as it's about to do the insert/update/delete operation. This seems to be a lot less stressful for SQL Server versus having it shred a potentially very large document as you go beyond a few hundred rows. This is probably worth another blog post.

      Delete