Mentoring

Extracurricular Learning

Being a lifelong learner, I believe that what I call “Extracurricular Learning” is very important. I define that as any learning that you do outside of your job or school either for fun or to better yourself due to your own drive or passion. Maybe you like to use books, or screencasts, or conferences, but you actively make learning your own responsibility.

Some developers that I know or have run across are really into JIT (just-in-time) learning. They fly by the seat of their pants and, as Jeff Atwood puts it, “page fault” in knowledge. I am by no means against learning things as you go along. If you don’t do that, you are probably a pretty terrible developer. However, I think that if you don’t do some preparation in advance, you won’t be as good at “page faulting” your knowledge as you could be.

Let’s imagine that I’m working in Ruby for the first time. I may google ruby loop or something to figure the syntax of how to do a loop. Or, let’s say I need to do some task X. What I’m likely to do as a .Net programmer is to think of how I’d do it in C# and then google for the way to do X-C#-Thing in Ruby.

What would be better is if I had immersed myself in this technology ahead of time. I don’t mean become an expert before you begin, I am talking about doing due diligence before getting involved. If I have read Why’s Poignant Guide to Ruby and watched a few TekPub videos about Ruby and attended a few Columbus Ruby Brigade meetings, then I would have a very broad overview of how to “do” Ruby. And while I wouldn’t be an expert, it would make my google-fu so much better because my searches could be targeted toward getting me the *right* information.

As I’m getting more and more into Asp.Net MVC 2 for a project at work, I find the “leg work” that I did watching the TekPub videos and reading blogs to be invaluable. If I hadn’t done the Extracurricular Learning, I probably would have attempted a lot of WebForms inspired nonsense instead of finding out about how much “magic” is available to me via the built-in framework. Much like Rails, I can get Asp.Net MVC to do a lot of work for me (especially when combined with jQuery) if I just know how to rely on the scaffolding/convention over configuration inherent in the product.

I’m not wrong on this. If you want to be a great developer, you need to be involved in Extracurricular Learning. If you aren’t, you will stagnate and be guilty of “writing FORTRAN in any language”.

Training

TekPub

Almost two years ago, in a rant titled Y Kant Developers Read?, I lamented the fact that I’m finding that less and less of my peers are reading books to gain knowledge. In the last two years, that has gotten even worse. Several programmer’s magazines/journal publishers have gone bankrupt and the book publishing industry is in the crapper. Programmers are either getting information from blogs, user groups, or conferences. Unfortunately, I fear that for the majority of programmers none of that is true. Interviewing programmers, I’ve found that most of them truly can’t code FizzBuzz or a singleton or any other fairly simple problem that I put in front of them.

Entering into the programmer learning void recently have been screencasts. I’ve long been a fan of DimeCasts (Link Removed) as a way to get introduced to a lot of different topics. I learned about (read: finally understood) Dependency Injection / Inversion of Control from DimeCasts and I think they are good people. Additionally, one way that a lot of people learned about ASP.Net MVC was from Rob Conery’s screencasts building the MVC storefront on ASP.Net (apparently – according to Rob – they’ve since been moved).

TekPub Logo
Enter TekPub. Rob Conery and James Avery started a company to produce high quality screencasts to teach programmers about topics like NHibernate with Ayende, Git, jQuery, ASP.Net MVC, Ruby on Rails, Linux, LINQ, and more. I talked my boss into buying me a year’s subscription and I am EXTREMELY pleased with how he spent the money 😉 They offer streaming video as well as downloads in standard and iPhone format. I’ve watched probably at least 15 screencasts so far and I have learned a LOT. I keep several videos on my phone and whenever I have spare time, I watch a few minutes.

Steve Sanderson’s MVC 2 screencasts are exceptionally money. I wish I had the words to explain the kinds of lights of understanding that went off in my head after watching these videos. They alone are worth the price of admission. But, they aren’t alone… I got a lot out of Justin Etheredge’s LINQ series as well as Rob’s “Build Your Own Blog” stuff. If your employer has any kind of training budget, you should get them to drop the $200.00 yearly subscription fee for you. Neither of you will regret it. In fact, even if they don’t, you should truly consider investing the $200.00 in your own future.

Code Tips

Row_Number() Part Two

Last Time, we looked at ROW_NUMBER() and saw a little of how it works and how we could use it to remove duplicates. Another one of my favorite uses for ROW_NUMBER() is to get only one row from a one-to-many join. I’ve had situations at work where I need to return one row per Person, but each person had several addresses, several injury codes, and several contacts (people working their case).

In many of those cases, there is nothing to indicate which one of those rows should receive precidence over the others. One might just select from those tables and then take just the MIN() of the primary key, grouping by all other fields, but that makes for very messy and hard to maintain code in my opinion. ROW_NUMBER() gives us a much easier way to handle this problem.

Examine this sql script that gives us a Person Table and a PersonAddress table. One person can have many addresses, but there is no way in the schema to give precidence to one of the addresses over any of the others.

USE tempdb
GO

CREATE TABLE Person (PersonId INT IDENTITY(1,1) PRIMARY KEY, Name VARCHAR(50))
GO

INSERT Person ([Name])
SELECT 'Bill'
UNION
SELECT 'Bob'
UNION
SELECT 'Carl'
UNION
SELECT 'Sue'
UNION
SELECT 'Cleo'
UNION
SELECT 'Nick'
UNION
SELECT 'Thomas'
UNION
SELECT 'Jeydro'
UNION
SELECT 'Ronald'
UNION
SELECT 'Jamie'
UNION
SELECT 'Colleen'
UNION
SELECT 'Denise'
UNION
SELECT 'Phil'
UNION
SELECT 'Jim'
UNION
SELECT 'Jack'
UNION
SELECT 'Kathleen'
GO

CREATE TABLE PersonAddress (PersonAddressId INT IDENTITY(1,1) PRIMARY KEY, PersonId INT, [Address] VARCHAR(100))
GO

INSERT PersonAddress (PersonId, [Address]) 
SELECT PersonId, CONVERT(VARCHAR, PersonId * 123) + ' ' + UPPER(REVERSE([Name])) + ' ST.'
FROM Person
ORDER BY PersonId DESC
GO

INSERT PersonAddress (PersonId, [Address]) 
SELECT PersonId, CONVERT(VARCHAR, PersonId * 321) + ' SQL DR.'
FROM Person
WHERE PersonId BETWEEN 3 AND 9
GO

SELECT p.PersonId, p.[Name], pa.[Address]
FROM Person p
INNER JOIN PersonAddress pa
ON p.PersonId = pa.PersonId
ORDER BY p.PersonId
GO

DROP TABLE Person 
GO
DROP TABLE PersonAddress
GO

Here is a sample of the results:
One to Many Addresses

Using a similar trick to what we used last time to find unique records, we can alter our select statement so that we only return one address per person. If we change our select statement to this

; WITH Addresses
AS 
(
	SELECT p.PersonId, p.[Name], pa.[Address], ROW_NUMBER() OVER (PARTITION BY p.PersonId ORDER BY p.PersonId ASC) AS RowNum  
	FROM Person p
	INNER JOIN PersonAddress pa
	ON p.PersonId = pa.PersonId
)
SELECT PersonId, [Name], [Address], RowNum
FROM Addresses
GO

We get the following results:
One to Many Addresses With Row Numbers

You can see now that all we have to do is limit the result where the RowNum is equal to one and we have something that we can use (the CTE) in further joins or in reports to ensure that we don’t introduce duplicates to our query.

; WITH Addresses
AS 
(
	SELECT p.PersonId, p.[Name], pa.[Address], ROW_NUMBER() OVER (PARTITION BY p.PersonId ORDER BY p.PersonId ASC) AS RowNum  
	FROM Person p
	INNER JOIN PersonAddress pa
	ON p.PersonId = pa.PersonId
)
SELECT PersonId, [Name], [Address]
FROM Addresses
WHERE RowNum = 1
GO

Our results:
One to One Addresses Thanks to RowNumber

Code Tips

Row_Number() Part One

Sometimes when you are performing queries, you have data that does not have a primary key and contains duplicate records. If it had had a primary key, you could group on the columns that constitute a duplicate and just keep the MIN or MAX of the primary key, deleting the others. However, lacking that primary key, you have no way of uniquely identifying a row. Additionally, UNIQUEIDENTIFIER types don’t work with MAX and MIN, so you’d have a hard time picking a row in that case as well.

One option is of course to add an identity column to the table and do the operation that I described. However, sometimes you don’t have the appropriate permissions or just aren’t allowed to modify the table to do that. You can use the ROW_NUMBER() function to assign a unique row number to each row by partition. I’ll talk a little more about partition later, but when not explicitly specified in the query, the partition refers to the entire table.

Examine the script below. (All Sql Code contained here will only work on Sql Server 2005 and later)

USE tempdb
GO

CREATE TABLE HasDupes (ProductName VARCHAR(50), UpcCode VARCHAR(10))
GO

INSERT HasDupes (ProductName, UpcCode)
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Diapers', '0123456780'
UNION ALL
SELECT 'Formula', '0987654320'
UNION ALL
SELECT 'Formula', '0987654320'
UNION ALL
SELECT 'Formula', '0987654320'
UNION ALL
SELECT 'Diet Coke', '0564534230'
UNION ALL
SELECT 'Chair', '0872372530'
GO

SELECT *
FROM HasDupes
GO

DROP TABLE HasDupes
GO

You get the results shown here:
Initial duplicates.

If we want to add a rownumber to the table, we would alter the part of the query that is

SELECT *
FROM HasDupes
GO

to say this

SELECT ProductName, UpcCode, ROW_NUMBER() OVER (ORDER BY ProductName ASC) AS RowNum
FROM HasDupes
GO

and get these results
Initial row number.

That helps some, but wouldn’t it be better if we could get a row number assigned within groups? That is what the PARTITION BY clause does. Change the SELECT sql again so that it now looks like this (note: you can PARTITION BY and ORDER BY any columns you want, and they can be the same or different from each other)

SELECT ProductName, UpcCode, ROW_NUMBER() OVER (PARTITION BY UpcCode ORDER BY ProductName ASC) AS RowNum
FROM HasDupes
GO

Now, we get this
Row Number with Partition

Okay, now all we have to do is keep the records that get assigned a one, forsaking all others. Change that same section of code again to read like this and then run the script. Note, I am using the CTE syntax here.

WITH ProductDupes
AS
(
	SELECT ProductName, UpcCode, ROW_NUMBER() OVER (PARTITION BY UpcCode ORDER BY ProductName ASC) AS RowNum
	FROM HasDupes
)

DELETE FROM ProductDupes WHERE RowNum > 1
GO

SELECT *
FROM HasDupes
GO

Now we have a table with no duplicates.
No more duplicates

That’s it, that is all there is to it. In my next blog, I will cover how we can use this trick in joins to ensure that we only get a 1 to 1 join, even when the data is 1 to Many.

Code Tips

Common Table Expressions

CTE (Common Table Expression) syntax can be a little tricky at first, but once you have a little primer, you will love using them as much as I do. Examine the following code below that uses a derived table from which to select. (Please realize that this is a contrived example and you could of course just join the tables in the query directly.)

USE tempdb
GO

SELECT t.TableName, c.ColumnName
FROM (SELECT OBJECT_ID AS Id, [Name] AS TableName FROM sys.tables) t
INNER JOIN (SELECT OBJECT_ID AS Id, [Name] AS ColumnName FROM sys.columns) c ON t.Id = c.Id

The query between the parentheses creates a table structure (that I aliased as “c” and “t” for my two) that only lasts for the life of the query. I can then reference those tables in the query in any way that I would a regular table. However, that syntax can make it very difficult to read, especially in long queries with several complicated derived tables.

Sql Server 2005 introduced a new syntax to be used instead called Common Table Expressions (CTEs). Using a CTE, the above query would be written like this

USE tempdb
GO

WITH Tbls AS
(
	SELECT OBJECT_ID AS Id, [Name] AS TableName
	FROM sys.tables
),
Clmns AS
(
	SELECT OBJECT_ID AS Id, [Name] AS ColumnName
	FROM sys.columns
)

SELECT t.TableName, c.ColumnName
FROM Tbls t
INNER JOIN Clmns c ON t.Id = c.Id

Alternatively, you can also explictly declare you column names in the CTE definition and not have to alias the columns in the query itself.

USE tempdb
GO

WITH Tbls (Id, TableName) AS
(
	SELECT OBJECT_ID, [Name]
	FROM sys.tables
),
Clmns (Id, ColumnName) AS
(
	SELECT OBJECT_ID, [Name]
	FROM sys.columns
)

SELECT t.TableName, c.ColumnName
FROM Tbls t
INNER JOIN Clmns c ON t.Id = c.Id

One “gotcha” is that the CTE – beginning with the WITH keyword – must be the first statement that is encountered. All previous statements above it must be terminated with a GO or a semicolon. If you ran the following

USE tempdb
GO

SELECT 'Here Comes the Error'

WITH Tbls (Id, TableName) AS
(
	SELECT OBJECT_ID, [Name]
	FROM sys.tables
),
Clmns (Id, ColumnName) AS
(
	SELECT OBJECT_ID, [Name]
	FROM sys.columns
)

SELECT t.TableName, c.ColumnName
FROM Tbls t
INNER JOIN Clmns c ON t.Id = c.Id

You’d get the errors shown below (or something very close)

Msg 102, Level 15, State 1, Line 4
Incorrect syntax near 'Tbls'.
Msg 102, Level 15, State 1, Line 8
Incorrect syntax near ','.

Some people even just make a habit of actually prefixing the WITH with a semicolon every time to make sure that everything goes off without a hitch. If I do run this script instead there are no issues.

USE tempdb
GO

SELECT 'No Problems Here'

;WITH Tbls (Id, TableName) AS
(
	SELECT OBJECT_ID, [Name]
	FROM sys.tables
),
Clmns (Id, ColumnName) AS
(
	SELECT OBJECT_ID, [Name]
	FROM sys.columns
)

SELECT t.TableName, c.ColumnName
FROM Tbls t
INNER JOIN Clmns c ON t.Id = c.Id

This is just a quick primer on CTEs, but I hope that you can find it useful. I really find that it cleans my code up quite a bit.