Thursday, January 27, 2011

Creating a deep hierarchy with FOR XML

I have two tables in an application that I support.  One is called [Report], and the other is [ReportParameter]. 

      --SAMPLE DATA
      DECLARE @Report TABLE (
            ReportID INT PRIMARY KEY,
            Title VARCHAR(30),
            TitleHelp VARCHAR(100),
            ReportName VARCHAR(50),
            ShowSearchScreen BIT,
            AdminsOnly BIT
      )
      DECLARE @ReportParameter TABLE (
            ReportID INT NOT NULL,
            ParameterName VARCHAR(30) NOT NULL,
            [Type] VARCHAR(15) NOT NULL,
            PRIMARY KEY (ReportID, ParameterName)
      )
     
      INSERT INTO @Report VALUES
            (1,'Report #1','First report...','Report1.rpt',1,0),
            (2,'Report #2','Second report (no selection screen)','Report2.rpt',0,1)

      INSERT INTO @ReportParameter VALUES
            (1,'@StartDate','datetime'),
            (1,'@EndDate','datetime'),
            (2,'@ProductCode','string')

      SELECT * FROM @Report
      SELECT * FROM @ReportParameter
My goal is to convert these two tables into an XML file.  There are a few tricky things about the destination schema:
  1. Specific Name for the root element and all child elements.
  2. Mixed levels of hierarchies.
  3. Some denormalization of the data is required because the new schema uses a “tag” format rather than allowing some custom fields like my “ShowSearchScreen” field.
The first part is pretty easy.  I can write this query to get the below formatted XML output.
SELECT ReportID as [@ID], Title, TitleHelp as [Description],
            ReportName as [FileName]
      FROM @Report r
            FOR XML PATH('Report'), ROOT('Reports');
Results in:
<Reports>
  <Report ID="1">
    <Title>Report #1</Title>
    <Description>First report...</Description>
    <FileName>Report1.rpt</FileName>
  </Report>
  <Report ID="2">
    <Title>Report #2</Title>
    <Description>Second report (no selection screen)...</Description>
    <FileName>Report2.rpt</FileName>
  </Report>
</Reports>
Note the FOR XML PATH(‘Report’) which identifies what I want each data node to be called, and the ROOT(‘Reports’) option which lets me name the root node.  So far so good.  Now for adding my parameters.  I need to create the Parameters node by doing a subselect, and I can get a column to show as an attribute by prefixing the name with an @ sign:
      SELECT ReportID as [@ID], Title, TitleHelp as [Description],
            ReportName as [FileName],
            (
                  SELECT ParameterName as [@Name],
                        [Type] as [@Type]
                        FROM @ReportParameter rp
                        WHERE rp.ReportID = r.ReportID
                        FOR XML PATH('Parameter'), TYPE
             ) as [Parameters]
      FROM @Report r
            FOR XML PATH('Report'), ROOT('Reports');
This results in the following XML:
<Reports>
  <Report ID="1">
    <Title>Report #1</Title>
    <Description>First report...</Description>
    <FileName>Report1.rpt</FileName>
    <Parameters>
      <Parameter Name="@EndDate" Type="datetime" />
      <Parameter Name="@StartDate" Type="datetime" />
    </Parameters>
  </Report>
  ...
</Reports>

The “TYPE” keyword tells the main query that the child query will be passing up XML nodes and that it shouldn’t try to escape it as text.

Almost there.  Now I just want to expose my “ShowSearchScreen” field.  However, my schema doesn’t have a field for it. Instead, the schema supports arbitrary tags with an ID=”” atrribute and node value.  This means I have to take advantage of a special function called data() which I will assign as the name of the column I want to show up as the node’s value.
SELECT ReportID as [@ID], Title, TitleHelp as [Description],
            ReportName as [FileName] ,
             (
                  SELECT 'ShowSearchScreen' as [@ID],
                              rss.ShowSearchScreen as [data()]
                        FROM @Report rss where rss.ReportID = r.ReportID
                        FOR XML PATH('Tag'), TYPE
            ) as [Tags],
            (
                  SELECT ParameterName as [@Name], [Type] as [@Type]
                        FROM @ReportParameter rp where rp.ReportID = r.ReportID
                        FOR XML PATH('Parameter'), TYPE
             ) as [Parameters]
            from @Report r
            FOR XML PATH('Report'), ROOT('Reports');

This results in our final XML document which meets our specs.

<Reports>
  <Report ID="1">
    <Title>Report #1</Title>
    <Description>First report...</Description>
    <FileName>Report1.rpt</FileName>
    <Tags>
      <Tag ID="ShowSearchScreen">1</Tag>
    </Tags>
    <Parameters>
      <Parameter Name="@EndDate" Type="datetime" />
      <Parameter Name="@StartDate" Type="datetime" />
    </Parameters>
  </Report>
  ...
</Reports>
Bonus tip – how to do a pivot: If you wanted to get the “AdminsOnly” field to show up as a second tag, you could extract it by changing the query to something like this:
SELECT ReportID as [@ID], Title, TitleHelp as [Description],
      ReportName as [FileName] ,
       (
            SELECT * FROM (SELECT 'ShowSearchScreen' as [@ID],
                        rss.ShowSearchScreen as [data()]
                  FROM @Report rss where rss.ReportID = r.ReportID
            UNION ALL
                  SELECT 'AdminsOnly' as [@ID],
                        rss.AdminsOnly as [data()]
                  FROM @Report rss where rss.ReportID = r.ReportID)
                  AS tags FOR XML PATH('Tag'), TYPE
      ) as [Tags],
      (
            SELECT ParameterName as [@Name], [Type] as [@Type]
                  FROM @ReportParameter rp where rp.ReportID = r.ReportID
                  FOR XML PATH('Parameter'), TYPE
       ) as [Parameters]
      from @Report r
      FOR XML PATH('Report'), ROOT('Reports');
This is a pretty standard  SELECT * FROM (<my subselect query>) with the FOR XML stuff on the outside.  I could add as many tags as I wished to do so by adding more UNION ALL statements.  This allows for pivoting the fields into the tag node list and will generate something like this:
    <Tags>
      <Tag ID="ShowSearchScreen">1</Tag>
      <Tag ID="AdminsOnly">0</Tag>
    </Tags>

Monday, January 17, 2011

Using the MERGE statement

I love MERGE.  You know all those awful IF BEGIN END ELSE BEGIN END blocks you need to write for SQL Server?  A well-written MERGE statement can eliminate a lot of them!  Let's do a quick intro.
--Prepare test tables.
DECLARE @MyProducts TABLE (
      ProductID INT PRIMARY KEY,
      Name VARCHAR(40) NOT NULL,
      Price MONEY NOT NULL,
      [Description] VARCHAR(100) NOT NULL,
      [LastModified] DateTime NOT NULL
      );

DECLARE @ImportList TABLE (
      TheProductID INT,
      TheName VARCHAR(40),
      LatestPrice MONEY,
      Descr VARCHAR(100)
      );
     
     
--Load sample data
INSERT INTO @MyProducts (ProductID,Name,Price,[Description],[LastModified])
      VALUES (35,'WIDGET TYPE B',99.95,'Blue','12/5/1999'),
                  (44,'WIDGET TYPE R',88.95,'Red','12/8/2010'),
                  (59,'WIDGET TYPE O',112.95,'Orange','12/9/2010'),
                  (77,'WIDGET TYPE G',55.95,'Green','12/15/2010'),
                  (110,'WIDGET TYPE Y',77.95,'Yellow','12/19/2010');
           
INSERT INTO @ImportList (TheProductID,TheName,LatestPrice,Descr)
      VALUES (44,'Widget Type C',78.95,'Crimson'),
                  (111,'Widget Type P',94.95,'Purple'),
                  (112,'Widget Type M',75.95,'Mauve');

           
--display existing table
SELECT 'BEFORE' as [Label], * FROM @MyProducts;

     
--demonstrate the data the merge will use
SELECT 'ExampleFullJoin' as [Label],
      CASE WHEN mp.ProductID IS NULL AND il.TheProductID IS NOT NULL THEN 'Not Matched'
            WHEN mp.ProductID IS NOT NULL AND il.TheProductID IS NULL THEN 'Not Matched By Source'
            WHEN mp.ProductID IS NOT NULL AND il.TheProductID IS NOT NULL THEN 'Matched' END as [MergeAction],
      mp.*, il.* FROM @MyProducts mp
            FULL JOIN @ImportList il on mp.ProductID = il.TheProductID;


--do the merge
MERGE INTO @MyProducts as mp
      USING (SELECT TheProductID,UPPER(TheName) as [Name], LatestPrice,Descr FROM @ImportList) AS stg
            ON mp.ProductID = stg.TheProductID
      WHEN MATCHED THEN
            UPDATE SET
                  mp.Name = stg.Name,
                  mp.Price = stg.LatestPrice,
                  mp.[Description] = stg.Descr,
                  mp.LastModified = CURRENT_TIMESTAMP
      WHEN NOT MATCHED THEN
            INSERT (ProductID,Name,Price,[Description],[LastModified])
            VALUES (stg.TheProductID, stg.Name, stg.LatestPrice, stg.Descr, CURRENT_TIMESTAMP)
      WHEN NOT MATCHED BY SOURCE AND (CURRENT_TIMESTAMP - mp.LastModified) > 3650 THEN
            DELETE;
           
           
--display table post-merge
SELECT 'AFTER' as [Label], * FROM @MyProducts;


 
Structure of the MERGE statement:
  1. The first part of the merge statement defines where you're intending to put the data.  I always use an alias for the destination to keep things unambiguous.  (@MyProducts is aliased above as "mp").
  2. The USING clause defines what you're going to use as the source for the new data.  I have a table called @ImportList that might match a CSV file you had just imported or something like that.  I'm doing SELECT here with an upper case function on my "TheName" field and aliasing it as [Name].  This is to demonstrate that I can do any normal SELECT query here that I wish including ones that join to other tables, call functions, etc.
  3. The "AS stg" clause is also an alias.  I like to alias my "USING" table as stg or staging.  It helps keep my brain straight that I'm going to be taking data from a "staging" table and putting it in my main table in some intelligent way.  Also neither "staging" or "stg" are reserved words in T-SQL.  The official T-SQL term for this table is SOURCE, which we will see later.
  4. The ON is where we do a join between the MERGE INTO table and the USING table - in this case it's where @MyProducts gets merged to what I'm calling "stg".  What I'm calling stg is similar to @ImportList, but not identical - note that I have changed the case and renamed a particular field.  Any standard operation you can do in a SELECT statement is valid and will be used as the dataset that gets merged.
  5. What type of join gets done?  Well, it's pretty much like a FULL JOIN (see "ExampleFullJoin" when you run the code).  We then are allowed to proceed to instruct SQL Server what to do based on what it finds.  This part operates much like a CASE block.
  6. The first "WHEN" clause uses the SQL keyword MATCHED which indicates that the fields in both tables for  the given row satisfy the join.  These are the rows that would come up if you did a SELECT and an INNER JOIN between the tables using the fields you specified in the ON clause.  Generally when the keys in the staging table and the main table match, you will do some sort of UPDATE to the main table, which I have done in the example.  You can do any sort of normal formatting that is possible, CASE blocks, call scalar functions, etc. - it's just like any update statement.  In the example, you can see how Product ID 44 is updated from Type R to Type C and from Red to Crimson.
  7. The second "WHEN" clause in my example says what to do when a row exists in the staging table (SOURCE) that doesn't exist in the main table.  In this example, we are choosing to do an INSERT for these rows.  (Data not in the main table, so we put it in there.)  The INSERT statement is pretty typical of the VALUES variety.  In the example, the purple and mauve rows get inserted.
  8. In our final "WHEN" clause, we tell SQL server what to do when there is a row in our main table that doesn't match a row in the staging table.  This is where the "BY SOURCE" clause on the MATCHED keyword comes in.  In our example I'm demonstrating an important thing that I don't think gets enough press in other MERGE examples on the Internet - the MATCHED logic is just essentially a boolean - and we can put additional clauses to test on the WHEN statement just like any other CASE block.  In the example, I'm saying "If a product is in the main table and not in the staging table, AND if that product hasn't been modified in more than ~10 years, do something".  I could have added similar additional logic to the above WHEN statements that did the update or insert as well if I desired different types of update or insert logic, which just would have required additional WHEN MATCHED AND statements.  If I did have these, each would be tested in sequence for each row and the first one that matches is executed.
  9. The last line of the merge instructs SQL Server to DELETE the row without a match in the staging table.  Be very careful with this clause and technique.  If you don't put additional qualifiers on the NOT MATCHED BY SOURCE, you will delete all of the data in your main table that doesn't match.
  10. Lastly, MERGE statements must end with a Semicolon, so that's on there.  Technically, all T-SQL statements could, but I would be surprised if MS ever enforced this.  Since MERGE is new functionality, they probably felt it was safe to start enforcing it for this statement only.
image