The idea for this came from a friend but I liked it so much that I put it in place on my servers, too. There are several components to set up, but it will then produce email notifications to an operator when a job runs over its prescribed duration limit.
This is designed for SQL Server 2008 but I haven't bothered to look into what other versions it works with.
Set up database mail
The first step is, unfortunately, the hardest to walk through. You access the wizard in SSMS under Server > Management > Database Mail. If you already have a profile configured, by all means use it instead of creating a new one. If you have to create a new profile, make a note of the name you use and update the code in the last step accordingly. I called mine 'main'.
Configure operators
You'll need to create an operator object for each person who is to be notified by this process. Obviously, if you're already using database mail for other things, use the existing operator objects where possible.
You can configure operator objects in SSMS under Server > SQL Server Agent > Operators, or use T-SQL like this:
USE [msdb]
EXEC [msdb].[dbo].[sp_add_operator] @name = N'Dummy Operator', @enabled = 1, @email_address = N'asdf@asdf.com', @category_name = N'[Uncategorized]'
Create and populate the job-operator-threshold configuration table
This code will create the table in msdb, which I thought seemed an appropriate location. If you don't like that idea, feel free to put it in a user database. It should work just fine either way.
USE [msdb]
CREATE TABLE [dbo].[local_JobRuntimeAlerts] (
[job_id] UNIQUEIDENTIFIER NOT NULL, -- Foreign key to sysjobs.job_id
[runtime_threshold] INT NOT NULL, -- In HHMMSS format (ugh)
[operator] SYSNAME NOT NULL
)
Optionally, you can populate it initially with this:
INSERT INTO [local_JobRuntimeAlerts] ([job_id], [runtime_threshold], [operator])
SELECT [job_id], 1000, 'Dummy Operator' from [msdb].[dbo].[sysjobs]
…but replace 'Dummy Operator' with your operator name.
I hate the way they did the job duration, but it's what we have to work with. It's an integer taking this form: (Hours * 10000) + (Minutes * 100) + Seconds. Just be mindful of your thresholds.
Notice that there are no keys or uniqueness constraints on this table. This allows you to set multiple alerts per job_id, if you want — even to the same operator. I'm not sure why you'd want to have two completely identical alerts, but you can and it'll work.
If you want some automated help in determining good thresholds, here's a quick query that'll make recommendations based on historical runtimes. It doesn't discount outlier data, though, so take the results with a grain of salt.
SELECT
[job_id],
[name],
([average_duration] % 60) + ([average_duration] / 60 % 60 * 100) + ([average_duration] / 3600 * 10000) [average_duration],
([stdev_duration] % 60) + ([stdev_duration] / 60 % 60 * 100) + ([stdev_duration] / 3600 * 10000) [stdev_duration],
([recommended_threshold] % 60) + ([recommended_threshold] / 60 % 60 * 100) + ([recommended_threshold] / 3600 * 10000) [recommended_threshold]
FROM (SELECT
[job_id],
[name],
CONVERT(INT,AVG([duration])) [average_duration],
CONVERT(INT,STDEV([duration])) [stdev_duration],
CONVERT(INT,AVG([duration]) + 2 * STDEV([duration])) [recommended_threshold]
FROM (SELECT
[sysjobs].[job_id],
[sysjobs].[name],
([sysjobhistory].[run_duration] % 100) + ([sysjobhistory].[run_duration] / 100 % 100 * 60) + ([sysjobhistory].[run_duration] / 10000 * 60 * 60) [duration]
FROM [dbo].[sysjobs]
LEFT JOIN [dbo].[sysjobhistory] ON [sysjobhistory].[job_id] = [sysjobs].[job_id]
WHERE [sysjobhistory].[step_id] = 0 -- Step 0 is the summary of all steps
) AS [data]
GROUP BY [job_id], [name]
) AS [raw]
Create the job to generate notifications
Here is some T-SQL to create and configure the SQL agent job that will run the alerts. It is configured to run daily at 6am and is called "Generate Runtime Alerts"
USE [msdb]
GO
/****** Object: Job [Generate Runtime Alerts] Script Date: 08/07/2014 09:25:45 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 08/07/2014 09:25:45 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Generate Runtime Alerts',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'Courtesy of Jonathan Overholt (http://jonathan.overholt.org/projects/sql-agent-duration-alert)',
@category_name=N'[Uncategorized (Local)]',
@job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Make Alerts] Script Date: 08/07/2014 09:25:45 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Make Alerts',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'DECLARE JobAlertCurs CURSOR FOR
SELECT
[sysjobs].[name] [JobName],
[Alert].[operator] [Operator],
[sysjobhistory].[run_date] [Run_Date],
[sysjobhistory].[run_time] [Run_Time],
[sysjobhistory].[run_duration] [Duration],
[Alert].[runtime_threshold] [Threshold]
FROM [msdb].[dbo].[local_JobRuntimeAlerts] AS [Alert]
INNER JOIN [msdb].[dbo].[sysjobhistory]
ON [Alert].[job_id] = [sysjobhistory].[job_id]
INNER JOIN [msdb].[dbo].[sysjobs]
ON [sysjobs].[job_id] = [sysjobhistory].[job_id]
WHERE [step_id] = 0
AND [sysjobhistory].[run_duration] >= [Alert].[runtime_threshold]
-- Filter the job history to only the most recent execution for each job
-- If you feel spunky, you can set this to filter by job age or other criteria
AND [sysjobhistory].[instance_id] =
(SELECT TOP 1 [instance_id] FROM [msdb].[dbo].[sysjobhistory] WHERE [sysjobhistory].[job_id] = [Alert].[job_id] ORDER BY [instance_id] DESC)
OPEN JobAlertCurs
DECLARE @JobName VARCHAR(MAX)
DECLARE @Operator SYSNAME
DECLARE @RunDate INT
DECLARE @RunTime INT
DECLARE @Duration INT
DECLARE @Threshold INT
DECLARE @MailProfileToSendVia SYSNAME = ''main''
DECLARE @Subject VARCHAR(MAX) = ''[Warning] Long running SQL job on '' + @@SERVERNAME
DECLARE @Body VARCHAR(MAX)
DECLARE @DurationFmt VARCHAR(10)
DECLARE @ThresholdFmt VARCHAR(10)
FETCH NEXT FROM JobAlertCurs INTO @JobName, @Operator, @RunDate, @RunTime, @Duration, @Threshold
WHILE @@FETCH_STATUS = 0
BEGIN
-- This can produce ugly times (e.g. 0:10:0) but, again, I don''t care enough to fix it
SET @DurationFmt = CONVERT(VARCHAR(2),@Duration / 10000)+'':''+CONVERT(VARCHAR(2),@Duration / 100 % 100)+'':''+CONVERT(VARCHAR(2),@Duration % 100)
SET @ThresholdFmt = CONVERT(VARCHAR(2),@Threshold / 10000)+'':''+CONVERT(VARCHAR(2),@Threshold / 100 % 100)+'':''+CONVERT(VARCHAR(2),@Threshold % 100)
SET @Body = ''Job "''+@JobName+''" ran on ''+CONVERT(VARCHAR(10),@RunDate)+'' at ''+CONVERT(VARCHAR(10),@RunTime)+'' for ''+@DurationFmt+'', which is over the prescribed threshold of ''+@ThresholdFmt+''.''
EXEC [msdb].[dbo].[sp_notify_operator]
@profile_name = @MailProfileToSendVia,
@name = @Operator,
@subject = @Subject,
@body = @Body
FETCH NEXT FROM JobAlertCurs INTO @JobName, @Operator, @RunDate, @RunTime, @Duration, @Threshold
END
CLOSE JobAlertCurs
DEALLOCATE JobAlertCurs',
@database_name=N'msdb',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Daily',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=1,
@freq_subday_interval=0,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20140807,
@active_end_date=99991231,
@active_start_time=60000,
@active_end_time=235959,
@schedule_uid=N'bc1a8fb6-e0d2-4ada-9d75-31930d9d2664'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
Now you can open the job like normal and modify the step code. You'll need to change @MailProfileToSendVia to be the same as the Database Mail Profile you configured earlier. In my case, it's called 'main'.
Wrap-up
The setup is done, so now just sit back and wait for the emails. I initially set up all jobs to notify myself and set a very low threshold so that I knew I would get an email, then manually kicked off the job.
If you have comments you can send them to me at jonathan@overholt.org or on Google+. I hope this is helpful!