Tuesday, December 6, 2011

Excel for the DBA, Part 1–The Basics

This will be a series of handy hints for people that know SQL and are frustrated by Excel.  I am going to be writing this for Excel 2007 (what I currently have at work) but the tips should work for most recent versions.  The SQL stuff should also work with SQL 2000 and higher, but I have only tested on 2008.

The Basics

Run this on a test DB in your SQL Server, making sure you’re set to the default of “Results to Grid” (CRTL-D):

SELECT 1 AS [ID],'Jim' AS [Name], 'Red' AS [FavoriteColor],

CAST('20110101 14:00:00' AS DATETIME2) AS [ModifedDate],

'007' AS [SpecialCode] UNION ALL

      SELECT 2, 'Sally','Blue', '20110205 12:30:00','025' UNION ALL

      SELECT 3, 'Tim', 'Orange', '20110308 04:17:00','097'  UNION ALL

      SELECT 4, 'Todd', 'Green', '20110512 18:22:00','101'  UNION ALL

      SELECT 5, 'Ann', 'Orange','20110719 19:59:00','002'

 

 

Left-click the origin of the results (the blank space left of the “ID” column header and above row number 1) to select all the results.  Then right click in the same spot and choose “Copy with Headers”.  Launch Excel and paste the data into Sheet1 in cell A1.  You should get a mess that looks like this:

image

Several things can quickly be fixed up about the presentation of this data.

  1. First of all, for some insane reason, Excel does not properly handle formatting date data pasted from SQL Management Studio – it assumes that the correct thing to do is format it as a time.  At least in this case, the data is still there and it’s just a presentation issue.
    • Highlight column D by clicking the column D header (above where it says “ModifiedD”) and then right clicking in the same location and choosing Format Cells…
    • Under the first tab, select category “Date” and then pick the closest one to what you want to see.
    • If you find one that is close, but not exact, select it, and then click “Custom” where you can see the actual formatting code under the “Type:” field for customization.  “Military” SQL time format for the Gregorian calendar is yyyy-mm-dd hh:mm:ss;@
    • Don’t worry about selecting the column header row when you do the formatting – Excel will usually ignore it (see below for some ways that this magic works).
  2. To fix the column widths, on the other fields, left-click the origin in Excel.  This should do a “Select All” on the current sheet.  Then, double click the vertical divider between cells A and B in the column header (when you are in the right place, the pointer should look like a vertical bar with an arrow pointing left and right).  If you did it correctly, all of the columns should have auto-sized to the width of the data.  You can do this for individual columns by just not doing “Select All” first.
  3. The other issue is that the leading zeros on the “SpecialCode” field have been dropped.  In this case, unfortunately, Excel has actually dropped the data and is treating the data as numeric.  There are two ways to fix this – a formula or a format.
    • To fix via formula, click on cell F2 and type =RIGHT(CONCATENATE("000",E2),3) and press <ENTER>.  Assuming you were doing a select statement in T-SQL from a table with a field named E2, this would be equivalent to RIGHT('000' + CAST(E2 AS VARCHAR(30)),3) To duplicate this formula to all successive cells in the column, you can just double-click on the “bump” on the lower right of the F2 cell border when it is selected.
    • The other way to fix it (since Excel actually does shred out the leading zeros) is to pre-format the column as text.  Select the column by clicking the column header for E, right-click and do Format Cells… and on the first tab, choose “Text”.  (You can also choose “Custom” and use the format code @ which basically means “whatever was in there”.).  Then go back to SQL Management Studio, reselect all and recopy the data with headers.  Then paste the data back in to Excel in A1 and it should no longer drop the leading 0s; the only difference is that now Excel puts a green triangle in the cell to warn you that the data is a “Number stored as text”.  Note that these format codes are doubly-useful for SQL DBAs as they are also used by SSRS as the number/date formatting codes.
      image
  4. The column headers in Excel are not differentiated from the data in any meaningful way.  Excel is a bit magic so it does make a difference to hint to it that row 1 is not data but header info.  The easiest way to do this is to click the “1” row header to select all of row 1 and hit CTRL-B to make the row bolded.  This is enough to ensure that Excel knows that these are really column names and not data.
    • Note that Excel is generally pretty smart and will realize that row 1 is a header row automatically if for example you have text in all fields in row 1, but some columns have numerics or other data types in all successive columns.
  5. One last handy thing is to do “Freeze Panes”.  Click Cell A2 (this is the upper left data field).  Then in Excel 2007, click the “View” ribbon tab and “Freeze Panes… Freeze Panes”.  In Excel 2003, it’s under Window… Freeze Panes.  Then if you scroll down, the column headers will remain, but only data will move. This is not super useful with a small data set, but it’s great for large ones.  Just remember to click cell A2 before you do it because anything above and to the left of the selected cell will be frozen when the feature is enabled.

Coming Next

In the next post, I will be showing you how to use VLOOKUP and some other functions in Excel.  VLOOKUP is an awesome function that allows Excel to do some fairly clever things to facilitate analyzing different sets of data (it is also how some users start to develop scary monster sheets, so be careful!).  I will also show you the Filter feature(awesome!!!) and also how to get a bit quicker with copying formulas around via the keyboard.

Thursday, November 3, 2011

An Attempt to Make SSIS Package Deployment More Reasonable

Full disclosure: There might be a better way to do this.  I am not aware of it.

Problems:

  1. Maintaining multiple DB environments (Local, Shared Dev, Staging, Prod) means needing to maintain multiple SSIS environments.
  2. SSIS packages are tricky to configure using the existing toolset – it’s easy to accidentally use the wrong config file, or forget that you turned config files off, etc.
  3. There’s insanity regarding different developers configuring different connection strings.
  4. There’s no good out of the box way to say “deploy package X to environment Y”, etc.
  5. There’s no good way to keep your DTSConfig files for different environments in your Source Control system.

Idea:

I want to write a script (or something) to take an SSIS package that I have built and:

  1. Have a sane way to copy that SSIS package to the correct spot on a file system for the “environment” that I have selected (Local, Shared Dev, Staging, Prod).
  2. Ensure that “Use SSIS Config File” is enabled on the package that is copied to that location regardless of if it is enabled on the package that I am editing in Visual Studio locally.
  3. Ensure that an up-to date dtsConfig file is deployed along with the SSIS package (if desired) and also to ensure that the SSIS package is hacked to use a relative path to that dtsConfig file such as .\myconfig.dtsConfig (if desired, absolute paths also supported).
  4. Allow updating the SSIS config file appropriately with connection strings appropriate for that environment.

Result:

I’ve now built such a thing.  It’s a VBScript where the top of the script looks like this (skipping Option Explicit):


 

CONST DtsPackage = "MyPackageName.dtsx"

CONST DtsConfig = ".\MyPackageName.dtsConfig"

CONST DeployTo = "\\MyProdServer\e$\SomePathGoesHere\Packages"

CONST DeployConfig = True

Const MaxDTSXFileAgeInMinutes = 10

Dim ConnectionStrings

ConnectionStrings = Array( "OutputTextFile","E:\SomePathGoeshere\Output.txt", _

   "DBConnection","Data Source=MyProdServer;Initial Catalog=MyDatabase;Integrated Security=True;")

 


That’s the production one.  The Dev one looks like this (differences highlighted):


 

CONST DtsPackage = "MyPackageName.dtsx"

CONST DtsConfig = ".\MyPackageName.dtsConfig"

CONST DeployTo = "\\MyDevServer\D$\SomePathGoesHere\DevPackages"

CONST DeployConfig = True

Const MaxDTSXFileAgeInMinutes = 10

Dim ConnectionStrings

ConnectionStrings = Array( "OutputTextFile","D:\SomePathGoeshere\Dev\Output.txt", _

    "DBConnection","Data Source=MyDevServer;Initial Catalog=MyDatabase;Integrated Security=True;")


I am able to check both of these VBScripts into source control.  This means any of my developers can use them.  The way it’s written, if DeployConfig is set to false, you don’t need to put the connection strings stuff in the script so the production info can stay secure (you’d just have to pre-stage the config file and not grant the devs access to the destination folder).

In the current VBScript incarnation, to run the deploy script, you just open the SSIS project folder, open a command prompt, type CSCRIPT<space>, drag and drop the script onto the command line and hit ENTER.  After a few seconds, your SSIS package is deployed to the correct environment.  DONE!!!  Once the script for an environment is written, this process takes about 15 seconds.  Writing the script for an environment takes 15 minutes or so just to confirm all the details, give it a quick test, and get the new script into Source Control.

Downsides:

The current version of this uses VBScript. I know, I know – 2003 called and it wants its script engine back.  I wanted to write v1 in PowerShell but I wanted to get it done more than I wanted it to be cool.  I promise that rev 2 will be in PowerShell.  This will allow the deploy scripts to really be just the config data on top and a reference to a shared ps1 file for the actual code.  The current incarnation unfortunately requires copying and pasting the main VBS body between files and that is obviously poor practice.

Next Steps:

I need some help with ideas for this and I need some help testing this out.  If you are interested and you support multiple environments/multiple servers/multiple developers and have your SSIS packages in some sort of source control system, please send me a PM on the SQLServerCentral.com forums (my handle is nycdotnet).

http://www.sqlservercentral.com/Forums/UserInfo680048.aspx

Of course this could completely crash and burn your SSIS environments so please only reach out if you’re really OK with some buggy code or issues with testing.  Thanks!

Friday, October 28, 2011

Save a step: Using Computed Columns in a Table Variable (or Temp Table)

One of the databases that I support reports all money values in both the local currency and US Dollars.  Very often we will do all of our calculations in a table variable using the local currency and then calculate USD equivalents at the very end.

With this methodology, adding a new money field to a table variable requires making four code changes – adding the local money field, adding the USD money field, writing the code to retrieve the appropriate data for the local money field, and then modifying the final update or select statement to include the USD equivalent calculation.

I recently realized that we could save a step and do our FX calculations “automatically” by just including the USD equivalent columns in our table variables as computed columns.

If you are already using temporary tables or table variables in a “Set By Agonizing Set” SBAS style (perhaps I just coined this?), using this technique may allow you to keep your code a bit cleaner.

Here is an example:

DECLARE @FXRatesToUSD TABLE (

      Currency VARCHAR(3) NOT NULL PRIMARY KEY,

      RateToUSD FLOAT);

     

--Approximate FX Rates in October 2011

INSERT INTO @FXRatesToUSD

      VALUES ('USD',1.0), ('GBP',0.62), ('NOK',5.43);

 

DECLARE @Example TABLE (

      VehicleID INT NOT NULL PRIMARY KEY,

      Currency VARCHAR(3) NULL,

      PriceLocal MONEY NULL,

      FXRate FLOAT NULL,

       --computed column in the table variable

      PriceUSD AS CAST(ROUND(PriceLocal / FXRate,-3) AS MONEY)

      );

 

--Insert sample data

INSERT INTO @Example (VehicleID, Currency, PriceLocal)

      VALUES (1, 'USD', 45000),

            (2, 'USD', 14000),

            (3, 'USD', 31000),

            (4, 'GBP', 20000),

            (5, 'NOK', 190000);

 

--take a look at the table

SELECT 'FX Rates not yet set' AS [example], * FROM @Example;

 

--merge in our FX rates

UPDATE e1

      SET e1.FXRate = fx.RateToUSD

      FROM @Example e1

      INNER JOIN @FXRatesToUSD AS fx ON fx.Currency = e1.currency;

 

--now that the FX rates are present, our USD prices are available

SELECT 'FX Rates are set' AS [example], * FROM @Example;

 

--discount one of the cars

UPDATE @Example SET PriceLocal = 18000 WHERE VehicleID = 4;

 

--You should see that both prices are updated now for vehicle 4.

SELECT 'Updated GBP and USD price for Vehicle 4' AS [example],

* FROM @Example;

 

VehiclePrices

One of the only problems with this approach is that it does not seem possible to “chain” computed columns.  If I added a maximum USD discount field such as this to the table:

MaxDiscountUSD AS PriceUSD * 0.1

I would unfortunately get this error.

Msg 1759, Level 16, State 0, Line 11

Computed column 'PriceUSD' in table '@Example' is not allowed to be used in another computed-column definition.

In this case, it would seem that this technique should only be used for very simple one-pass cases or in conjunction with deterministic scalar UDFs that modularize the underlying logic.

Thursday, June 30, 2011

Tobias Ternstrom’s Great Denali Preview

Tobias Ternstrom from Microsoft gave a great talk at Tech Ed regarding upcoming features of SQL Server Denali.  His presentation was engaging, funny, and informative!

You can check it out on the Channel 9 site here (72 minutes):

http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI318

I have to say that I am very excited about all of the improvements that he demonstrated for T-SQL programmers, and I’m also pleased to see that SQL Server has been upgraded to use the “Visual Studio 2010” UI.

Wednesday, May 18, 2011

The Recursive CTE

(Yet another boring org chart example – except this one has multiple roots)

One of my apps has a hierarchial org chart in its database.  We don’t use the hierarchyid data type for this table.  Instead, each row in the table just has a unique org ID and the org ID of the parent org which should exist in the same table.  The ParentOrgID field is NULL in the rows for the root orgs.  Notice I said root orgs – plural.  We have an odd situation with this database where there’s actually two different companies’ trees stored in this single table.  I recently had a requirement to pull out all the orgs for both companies and say what the root was, in addition to presenting how deep in the hierarchy it lived.  Here’s how I did it.

--Example source table

DECLARE @Orgs AS TABLE (

      OrgID INT PRIMARY KEY,

      ParentOrgID INT NULL,

      OrgName VARCHAR(20) NOT NULL

);

 

--Set up some sample data

INSERT INTO @Orgs (OrgID, ParentOrgID, OrgName) VALUES

      (1,NULL,'Company A (Root)'),

      (2,1,'Marketing'),

      (3,1,'Finance'),

      (4,3,'Accounting'),

      (5,2,'Sales'),

      (6,1,'Research'),

      (7,NULL,'Company B (Root)'),

      (8,7,'Production'),

      (9,7,'Development'),

      (10,7,'Processing'),

      (11,10,'Shipping'),

      (12,3,'IT'),

      (13,8,'Industrial Relations'),

      (14,2,'Branding'),

      (15,11,'Intl Shipping'),

      (16,12,'DBAs'),

      (17,12,'Server Ops'),

      (18,12,'Desktop Support');   

 

--Declare some constants pointed at my root orgs (could also use a config table)

DECLARE @ARoot INT;

DECLARE @BRoot INT;

SELECT @ARoot = OrgID FROM @Orgs WHERE OrgName = 'Company A (Root)';

SELECT @BRoot = OrgID FROM @Orgs WHERE OrgName = 'Company B (Root)';

 

WITH AllOrgs AS  --setting up the CTE

(SELECT OrgID,

      CASE WHEN OrgID = @ARoot then 'A'  --This case block is my way of dealing with

            WHEN OrgID = @BRoot then 'B'  --aliasing the company names as a code

            ELSE '' END as [RootCompany],

            1 as [OrgLevel],             --C.S. students would put a 0 here...

            OrgName, ParentOrgID FROM @Orgs

      Where OrgID = @ARoot OR OrgID = @BRoot --This is the query for the first row

            UNION ALL

      SELECT c.OrgID, p.RootCompany, p.OrgLevel + 1, c.OrgName, c.ParentOrgID

            from AllOrgs p

            INNER JOIN @Orgs c

            on c.ParentOrgID = p.OrgID  --query for remaining rows

)

SELECT OrgID, RootCompany, OrgLevel,Orgname, ParentOrgID

FROM AllOrgs Order by RootCompany, OrgLevel, OrgName; --returns the CTE data

 

Inside the CTE are two SELECT statements married by a UNION ALL.  The first one is basically for the first line of data to return (or first lines in this case, since there are two possible roots).

The second query returns the data for each successive row, recursively.  SQL Server will automatically quit recursion when a row is null so you don’t have to worry about closing the loop yourself somehow as long as you do the join correctly and there are no circular references.  In the second query, “p” aliases the CTE itself which allows us to read the parent row’s data and do some basic operations such as adding one to its OrgLevel value.  Note that as with all UNION statements in SQL Server, you only have to alias the column names in the top query – column names in later queries are ignored.

Finally, I select the data I want with my desired sorting.  CTEs self-destruct after use (note the semicolon after the select, but not after the closing parenthesis after the CTE setup), so if you need to do multiple operations on the results, insert them into a temp table first.