Use of the PIVOT operator in T-SQL has always been tantalizing yet confounding to me. It offers the promise of more concise and expressive code compared to a subselect pattern, yet I've always found it difficult to get the code just right so that I could apply it in a consistent way without my queries feeling like they are "working by miracle" or that there is way too much hardcoding going on resulting in my code becoming rigid or brittle.
I believe that I've finally figured out a good pattern for using PIVOT that is fast, concise, and reusable. The pattern hinges on these points.
- Only apply this pattern to datasets when you know how many categories you will end up with; for example, months/quarters in a year. If you don't know this at design time, it might be best to just return the raw data and let your presentation layer do the Pivot (assuming some reporting solution or app). Barring this, you're off into the land of dynamic SQL (bring a towel).
- Start with a CTE called "categoriesAndRawData". This represents your categories and the raw data to pivot. Included should be three aliased fields, [PivotOn], [PivotAggregate], and [PivotAnchor].
- The [PivotOn] field is the value that will turn into the column names or categories. You should try to keep this as simple as possible - best is a number, but a short code will also work.
- The [PivotAggregate] field represents a number that will be summed in the pivot result. If you are looking to do a count, you can hardcode a 1 for this field.
- The [PivotAnchor] field provides a point of reference for the Pivot On field. If you were PIVOTing out months into columns, this would be the year. This field is optional if you don't need an anchor such as if you were PIVOTing data into one of N categories and the categories don't need context.
- Create a minimalist subselect from the categoriesAndRawData CTE that includes only the Anchor, Aggregate, and PivotOn fields, plus any "keys to carry through" (the "keys to carry through" is the magic sauce of this pattern). Alias this as pIn - "pivot input".
- Add a PIVOT clause with a SUM of the pIn.PivotAggregate using the pIn.PivotOn field, and then list out your categories in the IN clause. Alias the pivot result as "pr".
Examples
I have setup the following example queries from the AdventureWorks2008R2 database.
I have the following query that provides the raw data needed to find all-time sales by month overall, by sales person, and by territory. (Note this only includes data where the sales person is known - there are many rows in AdventureWorks2008R2 where the sales person is NULL).
SELECT OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL;
Let's follow the pattern.
Step 1: I know that I want to break out this data into months per year so I know the number of categories (12 months) and my anchor element that describes the categories (the year). So I have confirmed that my dataset fits this pattern. Step 1 = Check.
Step 2: Let's get my query ready that will be used as the categoriesAndRawData CTE...
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL;
This is the same query as the raw data except that it has the three fields needed to do the PIVOT. The PivotOn field which will become my columns in the pivot result, the PivotAggregate which will be SUMmed to become the data in the pivot result, and the PivotAnchor which provides context for the PivotOn field. Because I am going for a "COUNT" of sales in my results, I can hardcode a 1 as the PivotAggregate. If I were interested in dollars of the sales, I could put either the SubTotal or TotalDue field here.
I can change this into a Common Table Expression (CTE) by stating it as this:
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT * FROM categoriesAndRawData;
This is the same query as before, except encapsulated as a CTE called categoriesAndRawData. If you are not familiar with Common Table Expressions, for the purpose of this discussion you can think of it like a named "temporary view". I now have my CTE and the [PivotOn], [PivotAggregate], and [PivotAnchor] fields. Step 2= Check.
Step 3: now I have to write a select statement that "includes only the Anchor, Aggregate, and PivotOn fields, plus any "keys to carry through"". For my first pivot I only want the count of sales by sales person, so here is that query:
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT PivotAnchor, PivotAggregate, PivotOn, SalesPersonID FROM categoriesAndRawData;
(We will apply the pIn alias to this when we convert it to a subselect in Step 4).
Since for this first query I don't care about the territory, I can exclude it from the query and I can just include the Anchor, Aggregate, "On", and the keys to carry through. Step 3 = Check.
Step 4: Now I have to wrap my bottom query from Step 3 in parentheses, and alias it AS pIn. Then I need to add a PIVOT operator after this, put an open parenthesis, and then put the text SUM(pIn.PivotAggregate) FOR pIn.PivotOn . Then I put a space and add an IN clause with my categories, and close out the pivot with a close parenthesis and the pr alias. For this query, since I am breaking out the data into month categories, that second part looks like this: IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr
The whole "Pivot" section including the original CTE, the minimalist query wrapped in parentheses and aliased as pIn, plus my PIVOT clause looks like this (though this query won't yet run):
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
--ToDo: Fill in this part...
(SELECT PivotAnchor, PivotAggregate, PivotOn, SalesPersonID FROM categoriesAndRawData) AS pIn
PIVOT (SUM(pIn.PivotAggregate) FOR pIn.PivotOn
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr;
Now all I have to do is fill in the part above the pIn subquery. I can insert the following text above the subquery (replacing the ToDo comment above):
SELECT pr.PivotAnchor AS [SaleYear],
pr.SalesPersonID,
[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]
Which gives me this as the final query:
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT pr.PivotAnchor AS [SaleYear],
pr.SalesPersonID,
[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]
FROM (SELECT PivotAnchor, PivotAggregate, PivotOn, SalesPersonID FROM categoriesAndRawData) AS pIn
PIVOT (SUM(pIn.PivotAggregate) FOR pIn.PivotOn
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr;
Hooray - pivoted data. I can add an ORDER BY at the bottom if I want to for example consolidate my years together and then show each sales person for that year.
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT pr.PivotAnchor AS [SaleYear],
pr.SalesPersonID,
[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]
FROM (SELECT PivotAnchor, PivotAggregate, PivotOn, SalesPersonID FROM categoriesAndRawData) AS pIn
PIVOT (SUM(pIn.PivotAggregate) FOR pIn.PivotOn
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr
ORDER BY pr.PivotAnchor, pr.SalesPersonID ASC;
From this screenshot, we can see that the sales data in AdventureWorks2008R2 starts in July 2005.
So what's so good about PIVOT and the use of this pattern? I could have achieved this by doing normal groups and aggregate functions - true, but this query is now quite flexible. For example, if I now want to change the query to group everything by territory, I can just change it to the following (different code is underlined and in red):
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT pr.PivotAnchor AS [SaleYear],
pr.TerritoryID,
[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]
FROM (SELECT PivotAnchor, PivotAggregate, PivotOn, TerritoryID FROM categoriesAndRawData) AS pIn
PIVOT (SUM(pIn.PivotAggregate) FOR pIn.PivotOn
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr
ORDER BY pr.PivotAnchor, pr.TerritoryID ASC;
If I wanted to group by territory and sales person ID, I could do that too:
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT pr.PivotAnchor AS [SaleYear],
pr.TerritoryID, pr.SalesPersonID
[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]
FROM (SELECT PivotAnchor, PivotAggregate, PivotOn, TerritoryID, SalesPersonID FROM categoriesAndRawData) AS pIn
PIVOT (SUM(pIn.PivotAggregate) FOR pIn.PivotOn
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr
ORDER BY pr.PivotAnchor, pr.TerritoryID, pr.SalesPersonID ASC;
From this screenshot we can see that sales person 275 made sales in both territory 2 and 3 in 2005.
Lastly, following this pattern makes it easy to retrieve your text descriptions. Just be sure to remember to do this after the fact - PIVOT is very fast when it's dealing with a minimalist data set, but it can get bogged down quickly when more columns are included in your pIn subselect*. Remember the guiding principle of step 3 and include "only the Anchor, Aggregate, and PivotOn fields, plus any "keys to carry through" ". So we just join in the people and territory info at the end using standard techniques for joining tables with foreign key relationships. This is done outside the pivot - meaning we join to the results of the pivot not categoriesAndRawData or pIn.
WITH categoriesAndRawData AS (
SELECT YEAR(OrderDate) AS [PivotAnchor],
MONTH(OrderDate) AS [PivotOn],
1 AS [PivotAggregate],
OrderDate, SalesPersonID, TerritoryID FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL AND TerritoryID IS NOT NULL
)
SELECT pr.PivotAnchor AS [SaleYear],
pr.TerritoryID, pr.SalesPersonID,
p.FirstName, p.LastName, st.Name AS [TerritoryName],
[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12]
FROM (SELECT PivotAnchor, PivotAggregate, PivotOn, TerritoryID, SalesPersonID FROM categoriesAndRawData) AS pIn
PIVOT (SUM(pIn.PivotAggregate) FOR pIn.PivotOn
IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) pr
INNER JOIN Person.Person p ON p.BusinessEntityID = pr.SalesPersonID
INNER JOIN Sales.SalesTerritory st ON st.TerritoryID = pr.TerritoryID
ORDER BY pr.PivotAnchor, st.Name, p.LastName, p.FirstName ASC;
I have purposefully been very prescriptive with naming in this pattern. You can feel free to name the things whatever you want, but this is what I have found works well for me to use the various layers of the PIVOT without having to rethink "what the heck does this represent again?". The only fields you actually have to use in your main query (the ones in the pivoted result set) are aliased with "pr." which is fairly quick to type/easy to remember.
*Edit July 9, 2012: fixed this to pIn - it has previously incorrectly said categoriesAndRawData.