Thursday, January 19, 2012

Case Sensitive searches in T-SQL

I just ran into an issue with one of my applications where a SQL statement had updated a status code field to a lower-case status code such as “active”.  We were expecting “Active” in the application and as a result our application began exhibiting odd behavior.  There are a host of things that went wrong here and that could have been done better, but it certainly does illustrate an issue with a case-insensitive database (our collation sequence is the USA default SQL_Latin1_General_CP1_CS_AS) interacting with a case-sensitive application.  Here are some ideas we came up with.

How to find the incorrect records

First off, we have to find the offending records.  I was able to run this to find the records we needed.

SELECT ID, [Status] FROM MyTable WHERE [Status] = 'active' COLLATE SQL_Latin1_General_CP1_CS_AS;

Note the COLLATE keyword refers just to the string ‘active’.  It’s not a query modifier like ORDER BY would be – instead it actually directly modifies the value that precedes it.

Sidebar: This is conceptually similar to how some implicit typing works in VB.Net – In VB, coding x=“s” implicitly declares x as a string type, but x=“s”c implicitly declares x as a character type; x=1.1F declares x as a single precision float, but x=1.1D declares x as a decimal type.  The COLLATE directive in this T-SQL statement declares the text 'active' as SQL_Latin1_General_CP1_CS_AS (case sensitive Latin ascending sort) so SQL will treat it as such when evaluating the statement.

Another note is that STATUS is a reserved word in T-SQL, so I am delimiting my field name, [Status], with square brackets. Delimiting with square brackets instead of single- or double-quotes makes SQL Server very happy – particularly the IntelliSense feature of SQL Management Studio.

So this is how you do a case-sensitive search on a SQL server with a case-insensitive sort order.  (For non-USA default SQL server installations, you should specify your current server collation and change “CI” to “CS”.)

How to fix the incorrect records (only)

I was able to fix the records in my database by using code like this.  This has the advantage of only doing updates on the rows that have the incorrect case.  To do this in a different environment, you’d have to provide a line per bad code.  I am using SET XACT_ABORT ON in this case so that the entire transaction will be stopped and rolled-back automatically in case of any errors encountered.

SET XACT_ABORT ON;

BEGIN TRAN;

 UPDATE MyTable SET [Status] = 'Active'

  WHERE [Status] = 'active' COLLATE SQL_Latin1_General_CP1_CS_AS;

 UPDATE MyTable SET [Status] = 'Inactive'

  WHERE [Status] = 'inactive' COLLATE SQL_Latin1_General_CP1_CS_AS;

 UPDATE MyTable SET [Status] = 'Withdrawn'

  WHERE [Status] = 'withdrawn' COLLATE SQL_Latin1_General_CP1_CS_AS;

COMMIT TRAN;

If you wanted to fix this in a single statement, you could do something like this:

UPDATE MyTable

  SET [Status] = CASE

    WHEN [Status] = 'active' COLLATE SQL_Latin1_General_CP1_CS_AS THEN 'Active'

    WHEN [Status] = 'inactive' COLLATE SQL_Latin1_General_CP1_CS_AS THEN 'Inactive'

    WHEN [Status] = 'withdrawn' COLLATE SQL_Latin1_General_CP1_CS_AS THEN 'Withdrawn'

  END

  WHERE [Status] IN ('active' COLLATE SQL_Latin1_General_CP1_CS_AS,

    'inactive' COLLATE SQL_Latin1_General_CP1_CS_AS,

    'withdrawn' COLLATE SQL_Latin1_General_CP1_CS_AS);

 

Notice how it is required to put the COLLATE after every string.  The “Sidebar” above will explain why.

 

How to prevent this from happening in the first place

In this situation, I didn’t have a foreign key relationship on this field, so the best I could do was enforce a constraint.  This is the constraint I put into place:

 

ALTER TABLE MyTable

ADD CONSTRAINT MyTable$ValidStatus

  CHECK([Status] IS NULL OR

    [Status] IN ('Active' COLLATE SQL_Latin1_General_CP1_CS_AS,

      'Withdrawn' COLLATE SQL_Latin1_General_CP1_CS_AS,

      'Inactive' COLLATE SQL_Latin1_General_CP1_CS_AS);

With this constraint in place, an insert or update that tried to set the value of the Status field to something other than NULL, “Active”, “Withdrawn”, or “Inactive” would fail.  I can check that my check constraint is working by querying the sys.checkconstraints view.  If is_not_trusted comes back as 0, then my constraint should be working.

 

SELECT is_not_trusted, * FROM sys.check_constraints

  WHERE name = 'MyTable$ValidStatus';

No comments:

Post a Comment