Monday, September 8, 2014
The Easy Way to Optimize a Stored Procedure
So I created a copy of called p_TheProcedure_test that I could use to fine-tune without affecting anything on the live site. (Before you lecture me on testing things on production, let me just point out that this procedure doesn't update any tables other than temp tables). So I ran p_TheProcedure_test and noticed that it ran in under 1 second. That's funny, let me try p_TheProcedure again -- 10 seconds. What's going on? The code is exactly the same in both procedures.
So, I tried this:
drop procedure dbo.p_TheProcedure
go
create procedure dbo.p_TheProcedure
...
After than, running dbo.p_TheProcedure is back down to below 1 second. Problem solved.
I'm guessing this has to do with a bad query execution plan. I could research the exact technical reason, but I'm not a theoretician, and I have work to do.
Tuesday, October 8, 2013
Dangerous SQL Aggregates
declare @Dates table (aDate date) insert into @Dates values ('1/1/2000'); insert into @Dates values ('1/1/2001'); insert into @Dates values ('1/1/2002'); insert into @Dates values ('1/1/2003'); declare @AnyDateFound bit = 0; declare @MaxDate date; select @AnyDateFound = 1 from @Dates where aDate > '1/1/2005';
In this example, @AnyDateFound would obviously be 0.
I made an update like this:
declare @MaxDate date; select @AnyDateFound = 1, @MaxDate = max(aDate) from @Dates where aDate > '1/1/2005';
Most SQL Developers would spot the error right away. But I assumed that if no records met the conditions in the where clause, @AnyDateFound would not be updated to 1, and would remain 0. Not the case.
The problem is that aggregates always return records. There is a maximum, even if the maximum is null. It makes perfect sense when you think about.
select 0 where 1 = 2; -- no records select max(0) where 1 = 2; -- one record
Tuesday, May 14, 2013
Arbitrary Groups in SQL Server Using Windowing
Here’s an example:
--create a sample dataset create table #temp (MyRecord integer identity(1000, 7), [Name] nvarchar(50)); declare @x integer = 1;while @x <= 500 begin insert into #temp ([Name]) values ('Sample ' + convert(nvarchar(40), getdate(), 121)); set @x = @x + 1; end; -- divide into groups declare @Groups decimal(20, 4) = 30;-- how many groups we want. The overflow will go into another group select [Testing Group] = convert(integer, round( row_number() over (order by t.MyRecord) / (count(t.MyRecord) over () / (@Groups)) -- this is going to be the size of each group + 1, 0, 1 -- arguments for the round function. This will cause the decimal to be truncated ) ), t.* from #temp t;
There are probably more elegant solutions, and the code looks a little messy, but it works.
Saturday, February 16, 2013
Dynamic Comparison Tables in ASP.NET
Comparison tables / charts are quite common on the web. Using a custom method, you can dynamically populate a comparison table in ASP.NET, using normalized data, with the table rendered with exactly as many columns as are needed.
First I’m going to talk about a suboptimal way of attempting this. If you’re worked with data grids in ASP.NET before, you know that you can dynamically populate a table by binding a data table to a data grid with the AutoGenerateColumns property is set to true.
So if your data looks like this:
You can bind it to a data table and get this:
(Yes, I could have made this example about NFL quarterbacks, cars, or whatever, but why pretend to be someone that I’m not?)
The problem is, how do you get a SQL table or view like that to begin with? If your data is normalized, you may have do so something like:
select
[Person] = rc.Name,
[Race] = isnull(cv1.value, ''),
[Affiliation] = isnull(cv2.value, ''),
[Primary Weapon] = isnull(cv3.value, ''),
[Secondary Weapon] = isnull(cv4.value, ''),
[Residence] = isnull(cv5.value, '')
from
@RowCat rc
left join @CellValues cv1 on rc.Id = cv1.RowCatId and cv1.ColCatId = 1
left join @CellValues cv2 on rc.Id = cv2.RowCatId and cv2.ColCatId = 2
left join @CellValues cv3 on rc.Id = cv3.RowCatId and cv3.ColCatId = 3
left join @CellValues cv4 on rc.Id = cv4.RowCatId and cv4.ColCatId = 4
left join @CellValues cv5 on rc.Id = cv5.RowCatId and cv5.ColCatId = 5
Why is that a problem?
- It’s tedious to write.
- There’s a performance-hurting left join for each category.
- Changing the categories requires a database object update.
- Since the categories are hard-coded, the query is only good for fixed categories names and a fixed number of categories.
I created a solution that addresses all of those issues. Basically, you can keep your data normalized, using a query like this rather than the one above:
select
RowCatId = rc.id,
RowCatName = rc.Name,
ColCatName = cc.name,
CellValue = isnull(cv.Value, '')
from
@RowCat rc
cross join @ColCat cc
left join @CellValues cv on rc.Id = cv.RowCatId and cc.Id = cv.ColCatId
order by
rc.Id,
cc.Id;
Then you can take that dataset and pass it to the custom method below. Instead of populating a data grid, the method populates a table control.
''' <summary>
''' Data table should contain the columns in this enumeration. They don't have to have the same names.
''' Sort by row category id, then column category id. Every row category should have exactly one of each column category.
''' If it doesn't this method is not going to work right.
''' </summary>
''' <remarks></remarks>
Public Enum DynamicColomnDataTableConstants As Integer
''' <summary>
'''
''' </summary>
''' <remarks>must be a positive integer</remarks>
RowCategoryId
RowCategoryName
ColomnCategoryName
CellValue
End Enum
Public Shared Sub PopulateDynamicColumnTable(ByVal TheWebTable As Table, ByVal TheDataTable As DataTable)
If TheDataTable.Rows.Count = 0 Then Return
Dim HeaderRowId As Integer = TheWebTable.Rows.Add(New TableRow)
TheWebTable.Rows(0).Cells.Add(New TableCell) 'this is the empty cell in the upper-left-hand corner
Dim LastRowCategoryId As Integer = TheDataTable.Rows(0)(DynamicColomnDataTableConstants.RowCategoryId)
Dim ThisRowCategoryId As Integer = LastRowCategoryId
Dim HeaderCellId As Integer = -1
'this isn't going to loop through all rows of the table table, only enough to create all of the
'header columns
For Each dr As DataRow In TheDataTable.Rows
ThisRowCategoryId = dr(DynamicColomnDataTableConstants.RowCategoryId)
'if row category has changed, header row has been created, and this loop can be exited
If ThisRowCategoryId <> LastRowCategoryId Then Exit For
HeaderCellId = TheWebTable.Rows(HeaderRowId).Cells.Add(New TableCell)
TheWebTable.Rows(HeaderRowId).Cells(HeaderCellId).Text = dr(DynamicColomnDataTableConstants.ColomnCategoryName)
Next
LastRowCategoryId = -1
ThisRowCategoryId = TheDataTable.Rows(0)(DynamicColomnDataTableConstants.RowCategoryId)
Dim ThisRowId As Integer = -1
Dim ThisCellId As Integer = -1
Dim TheCellValue As String = String.Empty
For Each dr As DataRow In TheDataTable.Rows
ThisRowCategoryId = dr(DynamicColomnDataTableConstants.RowCategoryId)
If ThisRowCategoryId <> LastRowCategoryId Then 'row category is different, need to create new row and header cells
ThisRowId = TheWebTable.Rows.Add(New TableRow)
ThisCellId = TheWebTable.Rows(ThisRowId).Cells.Add(New TableCell)
TheWebTable.Rows(ThisRowId).Cells(ThisCellId).Text = dr(DynamicColomnDataTableConstants.RowCategoryName)
End If
ThisCellId = TheWebTable.Rows(ThisRowId).Cells.Add(New TableCell)
TheCellValue = dr(DynamicColomnDataTableConstants.CellValue)
If TheCellValue = String.Empty Then TheCellValue = " "
TheWebTable.Rows(ThisRowId).Cells(ThisCellId).Text = TheCellValue
LastRowCategoryId = ThisRowCategoryId
Next
End Sub
With that method, you’ll get a table that’s exactly the same as the data grid shown earlier in this post.
I’m sure the the same thing could be accomplished with fewer lines of code, and probably has been. But this is my solution, and it seems to work well.
Sunday, February 10, 2013
Excel as an ETL Tool
This is probably going to seem extremely basic to some people, but I think it’s a very useful technique that many SQL developers might not know about.
You can use Excel as a quick and precise ETL tool. Not for 10GB data files. Not for data that’s not already in Excel. But when you have data in Excel – especially if it’s messy – and you want to get in into SQL Server quickly, there’s a good way of doing that.
Let’s start with some messy Excel data. Here are the current NBA standings that I copied from the NBA’s website.
Now, the data isn’t hideous. It’s organized in a consistent manner. It’s just not normalized. The technique I’m going to describe works on data in all kinds of formats.
Create the Table
The first thing I’m going to do is define the table in SQL. I’m going to write the create table statement in Excel, so that my entire script will be in one place. So in cell M2, I’m going to write:
create table dbo.NbaStandings (Id integer identity,TimePeriod varchar(100), Conference varchar(50), Division varchar(50), Team varchar(50), Wins integer, Losses integer, WinPercent decimal(4, 3), GamesBack decimal(10, 2), ConfRecord varchar(10), DivRecord varchar(10), HomeRecord varchar(10), RoadRecord varchar(10), Last10 varchar(5), Streak varchar(5));
The column definitions are derived from the data, of course. That’s the easy part.
Insert the Data, Part 1
Next I’m going to create the insert statements in Excel, referencing the data in the spreadsheet. In cell M6, I’m going to write:
="insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values ("
But I'm not ready to write more just yet. SQL Server string values have to be enclosed in single quotes. Also, when you concatenate cell references in an Excel formula, you have to use double quotes. I could write the rest of the insert statement that way, but that’s going to be one cumbersome, unreadable formula. Some Excel macros will make it much easier to write.
Write the Macros (or just use mine)
Here are the macros I’m going to create in the Excel VBA editor. I recommend keeping these macros in your personal macro workbook, so you don’t have to add them to every workbook you want to use them in.
This macro will escape single quotes in in a string, and enclose a string in single quotes:
Public Function SqlEncode(TheString) As String
Public Function SqlEncode(TheString) As String
Dim ReturnValue As String
ReturnValue = TheString
ReturnValue = Replace(ReturnValue, Chr(39), Chr(39) & Chr(39)) 'escape apostrophe
ReturnValue = Chr(39) & ReturnValue & Chr(39) 'enclose in apostrophes
SqlEncode = ReturnValue
End Function
This macro will take a single cell, and apply the macro above to its value, optionally adding a comma:
Public Function SqlValue(TheRange As Range, Optional IncludeComma As Boolean = True) As String
Dim ReturnValue As String
ReturnValue = ReturnValue & SqlEncode(TheRange.Text)
If IncludeComma Then ReturnValue = ReturnValue & ", "
SqlValue = ReturnValue
End Function
And this macro will take a range of cells, apply the SqlEncode macro to all of them, and concatenate them with commas:
Public Function SqlValues(TheRange As Range) As String
Dim TheCell As Range
Dim ReturnValue As String
Dim counter As Integer
counter = 0
For Each TheCell In TheRange.Cells
If counter > 0 Then ReturnValue = ReturnValue & ", "
ReturnValue = ReturnValue & SqlEncode(TheCell.Text)
counter = counter + 1
Next
SqlValues = ReturnValue
End Function
With those macros created, I’m ready for the next step.
Insert the Data, Part 2
The first real record on the spreadsheet is on line 6, but the values in the three rows above will also be part of the record. This is where the data not being normalized makes things messy. Here’s my continued formula:
="insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values (" & SqlValue(B$3) & SqlValue(B$4) & SqlValue(B$5)
Notice that I made the row reference for those three values absolute, because I don’t want them to change when I copy the formula to new rows.
The next part of the formula uses the data on the same row my formula is on, so I can use the SqlValues macro to take care of them all:
="insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values (" & SqlValue(B$3) & SqlValue(B$4) & SqlValue(B$5) & SqlValues(B6:L6) & ");"
My formula is now complete. Here's the resulting TSQL:
insert into dbo.NbaStandings (TimePeriod, Conference, Division, Team, Wins, Losses, WinPercent, GamesBack, ConfRecord, DivRecord, HomeRecord, RoadRecord, Last10, Streak) values ('2012-2013 Division Regular Season Standings', 'Eastern Conference', 'Atlantic', 'New York', '32', '16', '0.667', '0', '20-9', '41398', '19-6', '13-10', '41458', 'W 1');
Now I can go ahead and copy the formula into all of the other data rows on the spreadsheet:
I have to change the formula’s references to the conference and division cells several times when copying the formula, because this data set isn’t normalized, as previously pointed out. I could have used an OFFSET formula in Excel to get around having to do that, but I wanted to keep this example relatively simple.
Execute the Script
The entire script to create and populate the table in now in column M of my spreadsheet. I’m going to go ahead and copy that column into new query window in SQL Server:
Once I execute the script, the data is now in a SQL Server table:
Conclusion
Given how de-normalized that data was, this wouldn’t have been possible with a standard ETL engine such as SSIS, without first normalizing the data in Excel.
At my job, I get a lot of data in a format like this or worse, so I’ve found this technique very helpful over the years. I hope others will fine it helpful as well.