Monday, January 17, 2011

Using the MERGE statement

I love MERGE.  You know all those awful IF BEGIN END ELSE BEGIN END blocks you need to write for SQL Server?  A well-written MERGE statement can eliminate a lot of them!  Let's do a quick intro.
--Prepare test tables.
      ProductID INT PRIMARY KEY,
      Name VARCHAR(40) NOT NULL,
      Price MONEY NOT NULL,
      [Description] VARCHAR(100) NOT NULL,
      [LastModified] DateTime NOT NULL

      TheProductID INT,
      TheName VARCHAR(40),
      LatestPrice MONEY,
      Descr VARCHAR(100)
--Load sample data
INSERT INTO @MyProducts (ProductID,Name,Price,[Description],[LastModified])
      VALUES (35,'WIDGET TYPE B',99.95,'Blue','12/5/1999'),
                  (44,'WIDGET TYPE R',88.95,'Red','12/8/2010'),
                  (59,'WIDGET TYPE O',112.95,'Orange','12/9/2010'),
                  (77,'WIDGET TYPE G',55.95,'Green','12/15/2010'),
                  (110,'WIDGET TYPE Y',77.95,'Yellow','12/19/2010');
INSERT INTO @ImportList (TheProductID,TheName,LatestPrice,Descr)
      VALUES (44,'Widget Type C',78.95,'Crimson'),
                  (111,'Widget Type P',94.95,'Purple'),
                  (112,'Widget Type M',75.95,'Mauve');

--display existing table
SELECT 'BEFORE' as [Label], * FROM @MyProducts;

--demonstrate the data the merge will use
SELECT 'ExampleFullJoin' as [Label],
      CASE WHEN mp.ProductID IS NULL AND il.TheProductID IS NOT NULL THEN 'Not Matched'
            WHEN mp.ProductID IS NOT NULL AND il.TheProductID IS NULL THEN 'Not Matched By Source'
            WHEN mp.ProductID IS NOT NULL AND il.TheProductID IS NOT NULL THEN 'Matched' END as [MergeAction],
      mp.*, il.* FROM @MyProducts mp
            FULL JOIN @ImportList il on mp.ProductID = il.TheProductID;

--do the merge
MERGE INTO @MyProducts as mp
      USING (SELECT TheProductID,UPPER(TheName) as [Name], LatestPrice,Descr FROM @ImportList) AS stg
            ON mp.ProductID = stg.TheProductID
            UPDATE SET
                  mp.Name = stg.Name,
                  mp.Price = stg.LatestPrice,
                  mp.[Description] = stg.Descr,
                  mp.LastModified = CURRENT_TIMESTAMP
            INSERT (ProductID,Name,Price,[Description],[LastModified])
            VALUES (stg.TheProductID, stg.Name, stg.LatestPrice, stg.Descr, CURRENT_TIMESTAMP)
--display table post-merge
SELECT 'AFTER' as [Label], * FROM @MyProducts;

Structure of the MERGE statement:
  1. The first part of the merge statement defines where you're intending to put the data.  I always use an alias for the destination to keep things unambiguous.  (@MyProducts is aliased above as "mp").
  2. The USING clause defines what you're going to use as the source for the new data.  I have a table called @ImportList that might match a CSV file you had just imported or something like that.  I'm doing SELECT here with an upper case function on my "TheName" field and aliasing it as [Name].  This is to demonstrate that I can do any normal SELECT query here that I wish including ones that join to other tables, call functions, etc.
  3. The "AS stg" clause is also an alias.  I like to alias my "USING" table as stg or staging.  It helps keep my brain straight that I'm going to be taking data from a "staging" table and putting it in my main table in some intelligent way.  Also neither "staging" or "stg" are reserved words in T-SQL.  The official T-SQL term for this table is SOURCE, which we will see later.
  4. The ON is where we do a join between the MERGE INTO table and the USING table - in this case it's where @MyProducts gets merged to what I'm calling "stg".  What I'm calling stg is similar to @ImportList, but not identical - note that I have changed the case and renamed a particular field.  Any standard operation you can do in a SELECT statement is valid and will be used as the dataset that gets merged.
  5. What type of join gets done?  Well, it's pretty much like a FULL JOIN (see "ExampleFullJoin" when you run the code).  We then are allowed to proceed to instruct SQL Server what to do based on what it finds.  This part operates much like a CASE block.
  6. The first "WHEN" clause uses the SQL keyword MATCHED which indicates that the fields in both tables for  the given row satisfy the join.  These are the rows that would come up if you did a SELECT and an INNER JOIN between the tables using the fields you specified in the ON clause.  Generally when the keys in the staging table and the main table match, you will do some sort of UPDATE to the main table, which I have done in the example.  You can do any sort of normal formatting that is possible, CASE blocks, call scalar functions, etc. - it's just like any update statement.  In the example, you can see how Product ID 44 is updated from Type R to Type C and from Red to Crimson.
  7. The second "WHEN" clause in my example says what to do when a row exists in the staging table (SOURCE) that doesn't exist in the main table.  In this example, we are choosing to do an INSERT for these rows.  (Data not in the main table, so we put it in there.)  The INSERT statement is pretty typical of the VALUES variety.  In the example, the purple and mauve rows get inserted.
  8. In our final "WHEN" clause, we tell SQL server what to do when there is a row in our main table that doesn't match a row in the staging table.  This is where the "BY SOURCE" clause on the MATCHED keyword comes in.  In our example I'm demonstrating an important thing that I don't think gets enough press in other MERGE examples on the Internet - the MATCHED logic is just essentially a boolean - and we can put additional clauses to test on the WHEN statement just like any other CASE block.  In the example, I'm saying "If a product is in the main table and not in the staging table, AND if that product hasn't been modified in more than ~10 years, do something".  I could have added similar additional logic to the above WHEN statements that did the update or insert as well if I desired different types of update or insert logic, which just would have required additional WHEN MATCHED AND statements.  If I did have these, each would be tested in sequence for each row and the first one that matches is executed.
  9. The last line of the merge instructs SQL Server to DELETE the row without a match in the staging table.  Be very careful with this clause and technique.  If you don't put additional qualifiers on the NOT MATCHED BY SOURCE, you will delete all of the data in your main table that doesn't match.
  10. Lastly, MERGE statements must end with a Semicolon, so that's on there.  Technically, all T-SQL statements could, but I would be surprised if MS ever enforced this.  Since MERGE is new functionality, they probably felt it was safe to start enforcing it for this statement only.

No comments:

Post a Comment