Showing posts with label SQL Server. Show all posts
Showing posts with label SQL Server. Show all posts

Monday, September 8, 2014

The Easy Way to Optimize a Stored Procedure

I noticed a stored procedure in SQL Server (2008) -- let's call it p_TheProcedure -- taking about 10 seconds to run on production,  Since this is a procedure that generally should run in under a second, that was a problem that I needed to fix.

So I created a copy of called p_TheProcedure_test that I could use to fine-tune without affecting anything on the live site.  (Before you lecture me on testing things on production, let me just point out that this procedure doesn't update any tables other than temp tables).  So I ran p_TheProcedure_test and noticed that it ran in under 1 second.  That's funny, let me try p_TheProcedure again -- 10 seconds.  What's going on?  The code is exactly the same in both procedures.

So, I tried this:

drop procedure dbo.p_TheProcedure

go

create procedure dbo.p_TheProcedure
...

After than, running dbo.p_TheProcedure is back down to below 1 second.  Problem solved.

I'm guessing this has to do with a bad query execution plan.  I could research the exact technical reason, but I'm not a theoretician, and I have work to do.

Tuesday, October 8, 2013

Dangerous SQL Aggregates

Here's one for the the "seemingly innocuous change that introduces a bug" file.  The code I'm sharing is not the actual code where the incident happened, just the same pattern.  Here's the TSQL:

declare @Dates table (aDate date)
insert into @Dates values ('1/1/2000');
insert into @Dates values ('1/1/2001');
insert into @Dates values ('1/1/2002');
insert into @Dates values ('1/1/2003');

declare @AnyDateFound bit = 0;
declare @MaxDate date;

select
       @AnyDateFound = 1
from
       @Dates 
where
       aDate > '1/1/2005';

In this example, @AnyDateFound would obviously be 0.

I made an update like this:

declare @MaxDate date;

select
       @AnyDateFound = 1,
       @MaxDate = max(aDate)
from
       @Dates 
where
       aDate > '1/1/2005';

Most SQL Developers would spot the error right away.  But I assumed that if no records met the conditions in the where clause, @AnyDateFound would not be updated to 1, and would remain 0.  Not the case.

The problem is that aggregates always return records.  There is a maximum, even if the maximum is null.  It makes perfect sense when you think about.

select 0 where 1 = 2; -- no records
select max(0) where 1 = 2; -- one record

Tuesday, May 14, 2013

Arbitrary Groups in SQL Server Using Windowing

I often need to divide recordsets into groups for various reasons.  For example, if we’re testing an application, the testers might want a census of sample users they’re going to test with, and I want to assign each sample user to a test group to ensure that they’re only tested once.  This can be done using windowing in SQL Server (windowing was introduced in SQL Server 2005).

Here’s an example:

--create a sample dataset  
   
 create table #temp (MyRecord integer identity(1000, 7), [Name] nvarchar(50));  
   
 declare @x integer = 1;while @x <= 500 begin  
      insert into #temp ([Name]) values ('Sample ' + convert(nvarchar(40), getdate(), 121));  
      set @x = @x + 1;  
 end;  
   
 -- divide into groups  
   
 declare @Groups decimal(20, 4) = 30;-- how many groups we want. The overflow will go into another group  
   
 select  
   
      [Testing Group] =   
           convert(integer,   
                round(  
                     row_number() over (order by t.MyRecord)   
                     / (count(t.MyRecord) over () / (@Groups)) -- this is going to be the size of each group  
                     + 1,  
                     0, 1 -- arguments for the round function. This will cause the decimal to be truncated                 
                )  
           ),  
                                 
      t.*  
        
 from  
      #temp t;  

There are probably more elegant solutions, and the code looks a little messy, but it works.

Saturday, February 16, 2013

Dynamic Comparison Tables in ASP.NET

Comparison tables / charts are quite common on the web.  Using a custom method, you can dynamically populate a comparison table in ASP.NET, using normalized data, with the table rendered with exactly as many columns as are needed.

First I’m going to talk about a suboptimal way of attempting this.  If you’re worked with data grids in ASP.NET before, you know that you can dynamically populate a table by binding a data table to a data grid with the AutoGenerateColumns property is set to true.

So if your data looks like this:

image

You can bind it to a data table and get this:

image

(Yes, I could have made this example about NFL quarterbacks, cars, or whatever, but why pretend to be someone that I’m not?)

The problem is, how do you get a SQL table or view like that to begin with?  If your data is normalized, you may have do so something like:

 select  
[Person] = rc.Name,
[Race] = isnull(cv1.value, ''),
[Affiliation] = isnull(cv2.value, ''),
[Primary Weapon] = isnull(cv3.value, ''),
[Secondary Weapon] = isnull(cv4.value, ''),
[Residence] = isnull(cv5.value, '')
from
@RowCat rc
left join @CellValues cv1 on rc.Id = cv1.RowCatId and cv1.ColCatId = 1
left join @CellValues cv2 on rc.Id = cv2.RowCatId and cv2.ColCatId = 2
left join @CellValues cv3 on rc.Id = cv3.RowCatId and cv3.ColCatId = 3
left join @CellValues cv4 on rc.Id = cv4.RowCatId and cv4.ColCatId = 4
left join @CellValues cv5 on rc.Id = cv5.RowCatId and cv5.ColCatId = 5

Why is that a problem?



  • It’s tedious to write.
  • There’s a performance-hurting left join for each category.
  • Changing the categories requires a database object update.
  • Since the categories are hard-coded, the query is only good for fixed categories names and a fixed number of categories.

I created a solution that addresses all of those issues.  Basically, you can keep your data normalized, using a query like this rather than the one above:

 select  
RowCatId = rc.id,
RowCatName = rc.Name,
ColCatName = cc.name,
CellValue = isnull(cv.Value, '')
from
@RowCat rc
cross join @ColCat cc
left join @CellValues cv on rc.Id = cv.RowCatId and cc.Id = cv.ColCatId
order by
rc.Id,
cc.Id;

Then you can take that dataset and pass it to the custom method below.  Instead of populating a data grid, the method populates a table control.

   ''' <summary>  
''' Data table should contain the columns in this enumeration. They don't have to have the same names.
''' Sort by row category id, then column category id. Every row category should have exactly one of each column category.
''' If it doesn't this method is not going to work right.
''' </summary>
''' <remarks></remarks>
Public Enum DynamicColomnDataTableConstants As Integer
''' <summary>
'''
''' </summary>
''' <remarks>must be a positive integer</remarks>
RowCategoryId
RowCategoryName
ColomnCategoryName
CellValue
End Enum

Public Shared Sub PopulateDynamicColumnTable(ByVal TheWebTable As Table, ByVal TheDataTable As DataTable)

If TheDataTable.Rows.Count = 0 Then Return

Dim HeaderRowId As Integer = TheWebTable.Rows.Add(New TableRow)
TheWebTable.Rows(0).Cells.Add(New TableCell) 'this is the empty cell in the upper-left-hand corner

Dim LastRowCategoryId As Integer = TheDataTable.Rows(0)(DynamicColomnDataTableConstants.RowCategoryId)
Dim ThisRowCategoryId As Integer = LastRowCategoryId

Dim HeaderCellId As Integer = -1

'this isn't going to loop through all rows of the table table, only enough to create all of the
'header columns
For Each dr As DataRow In TheDataTable.Rows

ThisRowCategoryId = dr(DynamicColomnDataTableConstants.RowCategoryId)
'if row category has changed, header row has been created, and this loop can be exited
If ThisRowCategoryId <> LastRowCategoryId Then Exit For

HeaderCellId = TheWebTable.Rows(HeaderRowId).Cells.Add(New TableCell)
TheWebTable.Rows(HeaderRowId).Cells(HeaderCellId).Text = dr(DynamicColomnDataTableConstants.ColomnCategoryName)

Next

LastRowCategoryId = -1
ThisRowCategoryId = TheDataTable.Rows(0)(DynamicColomnDataTableConstants.RowCategoryId)

Dim ThisRowId As Integer = -1
Dim ThisCellId As Integer = -1
Dim TheCellValue As String = String.Empty

For Each dr As DataRow In TheDataTable.Rows

ThisRowCategoryId = dr(DynamicColomnDataTableConstants.RowCategoryId)

If ThisRowCategoryId <> LastRowCategoryId Then 'row category is different, need to create new row and header cells

ThisRowId = TheWebTable.Rows.Add(New TableRow)
ThisCellId = TheWebTable.Rows(ThisRowId).Cells.Add(New TableCell)

TheWebTable.Rows(ThisRowId).Cells(ThisCellId).Text = dr(DynamicColomnDataTableConstants.RowCategoryName)

End If

ThisCellId = TheWebTable.Rows(ThisRowId).Cells.Add(New TableCell)

TheCellValue = dr(DynamicColomnDataTableConstants.CellValue)
If TheCellValue = String.Empty Then TheCellValue = "&nbsp;"

TheWebTable.Rows(ThisRowId).Cells(ThisCellId).Text = TheCellValue
LastRowCategoryId = ThisRowCategoryId

Next

End Sub

With that method, you’ll get a table that’s exactly the same as the data grid shown earlier in this post.


I’m sure the the same thing could be accomplished with fewer lines of code, and probably has been.  But this is my solution, and it seems to work well.


A sample .aspx and .aspx.vb page that includes all of the code and a working example can be downloaded here (.zip file).

Sunday, February 10, 2013

Excel as an ETL Tool

This is probably going to seem extremely basic to some people, but I think it’s a very useful technique that many SQL developers might not know about.

You can use Excel as a quick and precise ETL tool.  Not for 10GB data files.  Not for data that’s not already in Excel.  But when you have data in Excel – especially if it’s messy – and you want to get in into SQL Server quickly, there’s a good way of doing that.

Let’s start with some messy Excel data.  Here are the current NBA standings that I copied from the NBA’s website.

image

Now, the data isn’t hideous.  It’s organized in a consistent manner.  It’s just not normalized.  The technique I’m going to describe works on data in all kinds of formats.

Create the Table

The first thing I’m going to do is define the table in SQL.  I’m going to write the create table statement in Excel, so that my entire script will be in one place.  So in cell M2, I’m going to write:

create table dbo.NbaStandings (Id integer identity,TimePeriod varchar(100), Conference varchar(50), Division varchar(50), Team varchar(50), Wins integer, Losses integer, WinPercent decimal(4, 3), GamesBack decimal(10, 2), ConfRecord varchar(10), DivRecord varchar(10), HomeRecord varchar(10), RoadRecord varchar(10), Last10 varchar(5), Streak varchar(5));

The column definitions are derived from the data, of course.  That’s the easy part.

Insert the Data, Part 1

Next I’m going to create the insert statements in Excel, referencing the data in the spreadsheet.  In cell M6, I’m going to write:

="insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values ("

But I'm not ready to write more just yet.  SQL Server string values have to be enclosed in single quotes.  Also, when you concatenate cell references in an Excel formula, you have to use double quotes.  I could write the rest of the insert statement that way, but that’s going to be one cumbersome, unreadable formula.  Some Excel macros will make it much easier to write.

Write the Macros (or just use mine)

Here are the macros I’m going to create in the Excel VBA editor.  I recommend keeping these macros in your personal macro workbook, so you don’t have to add them to every workbook you want to use them in.

This macro will escape single quotes in in a string, and enclose a string in single quotes:

Public Function SqlEncode(TheString) As String

 Public Function SqlEncode(TheString) As String  

Dim ReturnValue As String
ReturnValue = TheString

ReturnValue = Replace(ReturnValue, Chr(39), Chr(39) & Chr(39)) 'escape apostrophe
ReturnValue = Chr(39) & ReturnValue & Chr(39) 'enclose in apostrophes

SqlEncode = ReturnValue

End Function

This macro will take a single cell, and apply the macro above to its value, optionally adding a comma:

 Public Function SqlValue(TheRange As Range, Optional IncludeComma As Boolean = True) As String  

Dim ReturnValue As String

ReturnValue = ReturnValue & SqlEncode(TheRange.Text)

If IncludeComma Then ReturnValue = ReturnValue & ", "

SqlValue = ReturnValue

End Function

And this macro will take a range of cells, apply the SqlEncode macro to all of them, and concatenate them with commas:

 Public Function SqlValues(TheRange As Range) As String  

Dim TheCell As Range
Dim ReturnValue As String
Dim counter As Integer

counter = 0

For Each TheCell In TheRange.Cells
If counter > 0 Then ReturnValue = ReturnValue & ", "
ReturnValue = ReturnValue & SqlEncode(TheCell.Text)
counter = counter + 1
Next

SqlValues = ReturnValue

End Function

With those macros created, I’m ready for the next step.


Insert the Data, Part 2


The first real record on the spreadsheet is on line 6, but the values in the three rows above will also be part of the record.  This is where the data not being normalized makes things messy.  Here’s my continued formula:

="insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values (" & SqlValue(B$3) & SqlValue(B$4) & SqlValue(B$5)

Notice that I made the row reference for those three values absolute, because I don’t want them to change when I copy the formula to new rows.


The next part of the formula uses the data on the same row my formula is on, so I can use the SqlValues macro to take care of them all:

="insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values (" & SqlValue(B$3) & SqlValue(B$4) & SqlValue(B$5) & SqlValues(B6:L6) & ");"

My formula is now complete. Here's the resulting TSQL:

insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values ('2012-2013 Division Regular Season Standings', 'Eastern Conference', 'Atlantic', 'New York', '32', '16', '0.667', '0', '20-9', '41398', '19-6', '13-10', '41458', 'W 1');

Now I can go ahead and copy the formula into all of the other data rows on the spreadsheet:


image


I have to change the formula’s references to the conference and division cells several times when copying the formula, because this data set isn’t normalized, as previously pointed out.  I could have used an OFFSET formula in Excel to get around having to do that, but I wanted to keep this example relatively simple.


Execute the Script


The entire script to create and populate the table in now in column M of my spreadsheet.  I’m going to go ahead and copy that column into new query window in SQL Server:


image


Once I execute the script, the data is now in a SQL Server table:


image


Conclusion


Given how de-normalized that data was, this wouldn’t have been possible with a standard ETL engine such as SSIS, without first normalizing the data in Excel.


At my job, I get a lot of data in a format like this or worse, so I’ve found this technique very helpful over the years.  I hope others will fine it helpful as well.