Nucleus Datakonsult Agent Alerts Management Pack

Home

About me and Nucleus
Partners

Articles, tips etc.
My blog
Links
Microsoft articles
Forums

Courses and training




Microsoft Most Valuable Professional


Overview
SQL Server writes messages to the EventLog when various things happen in the database engine. You can have SQL Server Agent to monitor the EventLog for some specific SQL Server errors, and let Agent can notify an operator (send e-mail) and/or start a job. The problem has always been to know what alerts to define - this article is an attempt to rectify that.

You might already have some monitoring application that has support for SQL Server, like OP5 ,HP Operations Manager or Microsoft Operations Manager. If you do and are happy with that solution, then stick with it. But if you don't have such a solution, or if you feel that your current software's support for SQL Server EventLog monitoring can be improved, then read on.

Acknowledgements
Thanks to:
Stefan K., message 833

Versions etc.
This article applies to SQL Server 2005 and later, where not noted otherwise.
Article written: 2009-05-29
Modified
2011-09-15: Cleaned up T-SQL code so you can execute all in one go
2012-07-03: Added message 833

Intended audience
I assume a reasonable knowledge level of SQL Server. You should be able to find your way around the tools and understand the most common terms and acronyms. If you don't meet those criteria, then I suggest you buy some tool which has an interface which is easier to use, hand this over to a senior DBA or perhaps hire a consultant to do the planning and implementation. If you want to use the T-SQL script at the end of this article, then you should be able to read and have a grasp of what it does.

The e-mail part
You probably want to send an email when the event occurs. I suggest you use the Database Mail support which was introduced in SQL Server 2005 - a pure SMTP implementation. I discourage you from using the old "SQL Mail" support since it uses MAPI. SQL Server Books Online describes how to configure Database Mail, so I won't talk about it here. Check out Dan Guzman's script to configure Database Mail using T-SQL instead of running the Wizard on each SQL Server Instance.

Frequency of EventLog polling
You can control this using a registry setting. Do this at own risk, it isn't supported or documented. The name of the registry key is EventLogPeekInterval. Where to find it depends on things like whether it is a default or a named instance, the instance name and also version of SQL Server. To put it bluntly, if you can't find it, then you shouldn't mess with it. The default value is every 20 seconds.

Defining the alert
You define an alert using the procedure msdb.dbo.sp_add_alert or SSMS (SQL Server Agent, Alerts folder). If you are about to use the T-SQL script and sp_add_alert, then please do read about the procedure in Books Online, and decide what values you want for each parameter. In the script below you will find what values I use, but that might not be the right values for you. I won't discuss the parameters here since they are documented in Books Online.

You also want to say what action to be performed. I will assume that you want to send an email (msdb.dbo.sp_add_notification). This requires that you configured the Database Mail support, configured Agent to use the Profile you created, re-started Agent, and finally configured an Operator. You will see that in the script below, I hard-wired the operator name to 'SQLAdmins'. Change this to your liking.

For the alert, you say whether you want to trigger on a certain error number or all errors with a certain severity level (I'm ignoring Perfmon and WMI alerts here). You can't have both. The difficult part is deciding on what to trigger on. You don't want to go too wide so you will be mail-bombed - but you don't want to miss important messages either.

One thing you will notice that if you trigger for a certain severity level that there are messages with that level that you may want to exclude. We used to be able to enter such "Non Alertable Errors" in the registry key NonAlertableErrors (as described here). This functionality seems to have been removed, I'm afraid. Let me know if your tests show that the reg key actually does anything in 2005 or later.

How will this evolve and how can you help?
In a blog post I wrote 2009-02-23, I had hopes for some community project to get started for this. That didn't happen, so I decided to write this article. I don't expect that I managed to find the "right" messages. If you have some messages you find obvious should be there or vice versa, then let me know. For instance, there might be some error which always follow some particular other error - meaning it isn't necessary to define alerts for both. Just keep in mind that we cannot exclude certain errors if we included the whole severity level for that error. I expect that many of us will end up with some differences in their alert definitions - one size doesn't fit all. But I hope that the suggested definitions on this page can serve as a good starting point. Please send me an email if you find obvious things to add or remove from the list. If I use your recommendation, then I'll add your name (and URL if you provide) to the Acknowledgements section, unless you ask otherwise. You find my mail address here.

My suggested alerts
Some readers don't feel confident with adapting and running the TSQL code. Instead, they prefer to have the definitions listed and add the alerts using point&click in SSMS. Below you find my suggestions and further down you have the T-SQL code.

The alerts defined for the whole severity level:
Suggested alert nameSeverity level
Severity level 1616
Severity level 1717
Severity level 1818
Severity level 1919
Severity level 2020
Severity level 2121
Severity level 2222
Severity level 2323
Severity level 2424
Severity level 2525

The alerts defined for specific error numbers:
Sugg. alert nameSeverityErr noN.e.Description
Error 0060112601yCould not continue scan with NOLOCK due to data movement.
Error 0067410674nException occurred in destructor of RowsetNewSS 0x%p. This error may indicate a problem related to releasing pre-allocated disk blocks used during bulk-insert operations.
Error 0070810708nServer is running low on virtual address space or machine is running low on virtual memory. Reserved memory used %d times since startup. Cancel query and re-run, decrease
Error 0080610806naudit failure (a page read from disk failed to pass basic integrity checks)
Error 0082510825nA read of the file '%ls' at offset %#016I64x succeeded after failing %d time(s) with error: %ls. Additional messages in the SQL Server error log and system event log may
Error 0083310833ySQL Server has encountered %d occurrence(s) of I/O requests taking longer than %d seconds to complete on file [%ls] in database [%ls] (%d)...
Error 0097310973nDatabase %ls was started . However, FILESTREAM is not compatible with the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION options. Either remove the FILESTREAM fi
Error 01205131205yTransaction (Process ID %d) was deadlocked on %.*ls resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Error 03401103401nErrors occurred during recovery while rolling back a transaction. The transaction was deferred. Restore the bad page or file, and re-run recovery.
Error 03410103410nData in filegroup %s is offline, and deferred transactions exist. Use RESTORE to recover the filegroup, or drop the filegroup if you never intend to recover it. Log trunc
Error 03414103414nAn error occurred during recovery, preventing the database '%.*ls' (database ID %d) from restarting. Diagnose the recovery errors and fix them, or restore from a known go
Error 03422103422nDatabase %ls was shutdown due to error %d in routine '%hs'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.
Error 03452103452nRecovery of database '%.*ls' (%d) detected possible identity value inconsistency in table ID %d. Run DBCC CHECKIDENT ('%.*ls').
Error 03619103619nCould not write a checkpoint record in database ID %d because the log is out of space. Contact the database administrator to truncate the log or allocate more space to th
Error 03620103620nAutomatic checkpointing is disabled in database '%.*ls' because the log is out of space. Automatic checkpointing will be enabled when the database owner successfully chec
Error 03959103959nVersion store is full. New version(s) could not be added. A transaction that needs to access the version store may be rolled back. Please refer to BOL on how to configure
Error 05029105029nWarning: The log for database '%.*ls' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the
Error 05144105144nAutogrow of file '%.*ls' in database '%.*ls' was cancelled by user or timed out after %d milliseconds. Use ALTER DATABASE to set a smaller FILEGROWTH value for this file
Error 05145105145nAutogrow of file '%.*ls' in database '%.*ls' took %d milliseconds. Consider using ALTER DATABASE to set a smaller FILEGROWTH for this file.
Error 05182105182nNew log file '%.*ls' was created.
Error 08539108539nThe distributed transaction with UOW %ls was forced to commit. MS DTC became temporarily unavailable and forced heuristic resolution of the transaction. This is an inform
Error 08540108540nThe distributed transaction with UOW %ls was forced to rollback.
Error 09001109001nThe log for database '%.*ls' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
Error 141571014157nThe subscription created by Subscriber '%s' to publication '%s' has expired and has been dropped.
Error 141611014161nThe threshold [%s:%s] for the publication [%s] has been set. Make sure that the logreader and distribution agents are running and can match the latency requirement.
Error 171731017173nIgnoring trace flag %d specified during startup. It is either an invalid trace flag or a trace flag that cannot be specified during server startup.
Error 171791017179nCould not use Address Windowing Extensions because the 'lock pages in memory' privilege was not granted.
Error 178831017883nProcess %ld:%ld:%ld (0x%lx) Worker 0x%p appears to be non-yielding on Scheduler %ld. Thread creation time: %I64d. Approx Thread CPU Used: kernel %I64d ms, user %I64d ms.
Error 178841017884nNew queries assigned to process on Node %d have not been picked up by a worker thread in the last %d seconds. Blocking or long-running queries can contribute to this cond
Error 178871017887nIO Completion Listener (0x%lx) Worker 0x%p appears to be non-yielding on Node %ld. Approx CPU Used: kernel %I64d ms, user %I64d ms, Interval: %I64d.
Error 178881017888nAll schedulers on Node %d appear deadlocked due to a large number of worker threads waiting on %ls. Process Utilization %d%%.
Error 178901017890nA significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: %d seconds. Working set (KB): %I64d, committed
Error 178911017891nResource Monitor (0x%lx) Worker 0x%p appears to be non-yielding on Node %ld. Memory freed: %I64d KB. Approx CPU Used: kernel %I64d ms, user %I64d ms, Interval: %I64d.
Error 184011418401nLogin failed for user '%.*ls'. Reason: Server is in script upgrade mode. Only administrator can connect at this time.%.*ls
Error 205721020572nSubscriber '%s' subscription to article '%s' in publication '%s' has been reinitialized after a validation failure.
Error 205741020574nSubscriber '%s' subscription to article '%s' in publication '%s' failed data validation.
"N.e." means that you need to enable writing to eventlog for this error (it isn't written to eventlog by default) using sp_altermessage. This requires sp3 for 2005 and sp1 for 2008.

About the T-SQL code
If you don't feel comfortable with reading, and using the code below, then don't. I will not reply to email about how the code works, how to use it, where to execute it etc. Hire a consultant or hand this over to somebody with more T-SQL experience if you feel unsecure.

First we create a procedure that acts as a wrapper around sp_add_alert and sp_add_notification. This makes the code for each alert definition a little shorter.

Then we define the alerts. First the severity levels. I decided to go all the way down to 16, because there are a lot of errors with 16 which I want to be notified for and I didn't want to add each and every one of those individually. I wish we could block out some, but NonAlertableError doesn't seem to work anymore. Then the error numbers within each severity level. Note that I also added a couple that do not by default go to the event log. If you want to be alerted for such errors, you first need to configure the errors to be written to the EventLog. This functionality was added in sp3 for SQL Server 2005 and sp1 for SQL Server 2008 (blogged here).

The code
You find the code here.