An Example of Using LEAD and LAG Functions from SQL Server 2012

I recently encountered a technical interview question and it was something like this:

A company has its door access logging system and the company policy dictates that every employee’s card-in/card-out records shall reflect his/her true presence in or absence from office. But somehow, some employees skip cardings or tailgate behind others. Therefore, the logging system may have messy records with multiple “card-in’s” before one “card-out” or records that cannot be paired. Your task is to write a script to calculate the exact amount of time that the employee is in office.

As I’ve dwelled too long on the database administration side, I actually had no freaking clue how to write an effcient script without resorting to the RBAR appoach, so I acknowledged my defeat presented my hnoest thoughts and admitted that I would need Google for a lot more help šŸ™‚

Now I am back at my shabby cozy little cubicle from my potential employer’s shiny high-end downtown office building, I did start Googling and found out the two newly-introduced functions LEAD and LAG from SQL Server 2012 might just be the solution that I was looking for.

First of all, for the sake of explanation, I fabricate some fake entry logs with courtesy again to two of the most prominent NBA’s superstars:

USE TestDB
GO

CREATE TABLE CardingLogs
(
	CardingTime DATETIME, --time holder get carded by the system
	Name VARCHAR(50), --name of the card holder
	InOut VARCHAR(4) --with 'in' meaning card-in and 'out' meaning card-out
)
GO

INSERT CardingLogs VALUES
('2014-03-30T07:30:00', 'Lebron James', 'in'),
('2014-03-30T07:45:00', 'Lebron James', 'in'),
('2014-03-30T08:00:00', 'Lebron James', 'in'),
('2014-03-30T09:00:00', 'Lebron James', 'out'),
('2014-03-30T10:00:00', 'Lebron James', 'out'),
('2014-03-30T11:00:00', 'Lebron James', 'out'),
('2014-03-30T12:00:00', 'Lebron James', 'out'),
('2014-03-30T14:00:00', 'Lebron James', 'in'),
('2014-03-30T15:00:00', 'Lebron James', 'in'),
('2014-03-30T16:00:00', 'Lebron James', 'out'),
('2014-03-30T17:00:00', 'Lebron James', 'in')
GO

INSERT CardingLogs VALUES
('2014-03-30T07:30:00', 'Dwyane Wade', 'out'),
('2014-03-30T07:45:00', 'Dwyane Wade', 'in'),
('2014-03-30T08:00:00', 'Dwyane Wade', 'out'),
('2014-03-30T09:00:00', 'Dwyane Wade', 'in'),
('2014-03-30T10:00:00', 'Dwyane Wade', 'out'),
('2014-03-30T11:00:00', 'Dwyane Wade', 'in'),
('2014-03-30T12:00:00', 'Dwyane Wade', 'out'),
('2014-03-30T14:00:00', 'Dwyane Wade', 'out'),
('2014-03-30T15:00:00', 'Dwyane Wade', 'in'),
('2014-03-30T16:00:00', 'Dwyane Wade', 'out'),
('2014-03-30T17:00:00', 'Dwyane Wade', 'out')
GO

And then if you’d like to see what’s inside the table with SELECT * FROM CardingLogs, you’ll get the below output:

As illustrated by the red arrow, the programming logic is to find if the current ‘out’ record is ‘leading’ an immediate ‘in’ record behind; or put it another way, if the current ‘in’ is ‘lagged’ by an immediate ‘out’ record in front of itself. And hence the function LEAD() and LAG() come in handy. My approach would be use the lag function to mark all the ‘out’ rows (in green) which can be described as the first ‘out’ after an ‘in’ (or a series of ‘in’s’). Below query

SELECT *,
IsFO = 
CASE
	WHEN (InOut = 'out' AND LAG(InOut, 1, 0) OVER (PARTITION BY Name ORDER BY CardingTime ASC) = 'in') THEN 1
	ELSE 0
END
FROM CardingLogs

shows the “marked” rows as follows:
MarkedRows1

Now what we need to do is to calculate the time diffierences between all rows marked with ‘1’ and their immediate preceeding rows marked with ‘0’ as illustrated by the green box in the above resultset. And which function can you think of that can do this job with a breeze? You betcha it is the LAG() again! And the final script with a little help from our buddy common table expression is like this:

WITH TMP AS
(
SELECT DATEDIFF(MINUTE, LAG(CardingTime, 1, 0) OVER(PARTITION BY Name ORDER BY CardingTime ASC), CardingTime) AS TimeInOffice,
*,
IsFO = 
CASE
	WHEN (InOut = 'out' AND LAG(InOut, 1, 0) OVER (PARTITION BY Name ORDER BY CardingTime ASC) = 'in') THEN 1
	ELSE 0
END
FROM CardingLogs
)
SELECT Name, SUM(TimeInOffice) AS TotalTimeInOffice FROM tmp
WHERE IsFO = 1
GROUP BY Name

The final resultset:

CardingResultSet

And if you’d like a better understanding of LEAD and LAG function, do not hesitate to consult Microsoft’s official explanation of them here.

Advertisements

About Nick Xu

Not a DBA anymore
This entry was posted in Common Table Expression, T-SQL Script and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s