If you manage SCOM, you are all too familiar with the lack of an easy way to schedule maintenance mode with a future start time. While there are several good PowerShell scripts available via Google that provide some semblance of scheduling, I’ve always wondered why MS did not include this functionality in the base product. As many SCOM admins know, the platform can get fairly complex pretty quickly as management packs are added, written, customized, etc. This complexity makes customizing a scheduling tool for maintenance even more fun when we start to step outside the good old Microsoft.Windows.Computer class.

Veeam has a pretty slick management pack to monitor both VMware and Hyper-V within SCOM. I work mainly in a VMware shop, so I’ll focus on that. While this integration is super cool when we go to set up alerting, dive into the dashboards, and run the reports, it adds another layer to contend with if we want to schedule maintenance. The main hurdle is that while putting the ESX host into maintenance mode (MM) via the VMware class takes care of the VMs from Veeam’s perspective, these VMs may also be running a Windows Agent, which is NOT sub-classed within the Veeam.Virt.Extensions.VMware class. If you skip the Windows Agents in favor of monitoring solely through Veeam, then you don’t have anything to worry about; however, if you do still leverage the Windows Agent, then you have an additional class to worry about with maintenance mode. What? You also run Linux VMs??? Oh, dear!

I was actually thinking closer to “Oh, s**t!” when I started trying to bend some scripts around the maintenance scheduling, because you do need to consider all possible VM OS types. I needed to provide a way for application teams to put their own servers into maintenance, and PowerShell scripts didn’t really work since not everyone working on an application is a PS wizard, so I created a web application that allows them to manage their own maintenance mode. The backend is C#, and although I did something similar several years ago in SCOM 2007, I found that 2012 was a bit easier as we don’t have to explicitly contend with the Health Watcher and can simply work with the MonitoringObject class for maintenance methods. The following function demonstrates placing the ESX host in MM via Veeam, and also recursively enumerating the related objects with PartialMonitoringObject.GetRelatedMonitoringObjects and placing them into MM via the MonitoringObject class.

    // Requires using directives for System, System.Collections.Generic, System.Configuration, System.Linq, and the SCOM SDK namespaces (Microsoft.EnterpriseManagement.Common, .Configuration, .Monitoring).
    public static void GetVMListForMM(string targetName, DateTime dtmNow, DateTime dtmTimeWindow, MaintenanceModeReason reason, string scheduledComment)
    {
        try
        {
            Microsoft.EnterpriseManagement.ManagementGroup managementGroup = new Microsoft.EnterpriseManagement.ManagementGroup(ConfigurationManager.AppSettings["ManagementServer"]);
            ManagementPackClassCriteria criteria = new ManagementPackClassCriteria("Name = 'Veeam.Virt.Extensions.VMware.VMHOST'");
            IList<ManagementPackClass> classes = managementGroup.EntityTypes.GetClasses(criteria);
            List<MonitoringObject> list = new List<MonitoringObject>();
            List<PartialMonitoringObject> monitoringObjects = new List<PartialMonitoringObject>();
            List<string> duplicatelist = new List<string>();
            IObjectReader<MonitoringObject> objectReader = managementGroup.EntityObjects.GetObjectReader<MonitoringObject>(classes[0], ObjectQueryOptions.Default);
            foreach (MonitoringObject host in objectReader) // load every ESX host discovered by Veeam
                list.Add(host);
            foreach (PartialMonitoringObject monitoringObject in list)
            {
                if (monitoringObject.DisplayName.ToLower() != targetName.ToLower())
                    continue; // not the host we were asked about
                monitoringObjects.Add(monitoringObject); // the ESX host itself
                foreach (MonitoringObject relatedVMObject in monitoringObject.GetRelatedMonitoringObjects(TraversalDepth.Recursive))
                {
                    if (!relatedVMObject.ToString().Contains("VM"))
                        continue;
                    foreach (MonitoringObject monitoredVM in relatedVMObject.GetRelatedMonitoringObjects(TraversalDepth.Recursive))
                    {
                        // Normalize to an FQDN so the name matches the agent-managed object.
                        string strmonitoredVM = monitoredVM.DisplayName.ToLower();
                        if (strmonitoredVM.Contains("domain.local") == false)
                            strmonitoredVM = strmonitoredVM + ".domain.local";
                        duplicatelist.Add(strmonitoredVM);
                    }
                }
                // Look up every object (Windows, Linux, etc.) that carries each unique VM name.
                foreach (string strmonitoredVM in duplicatelist.Distinct().ToList())
                {
                    IObjectReader<MonitoringObject> readerMonitoredVM = managementGroup.EntityObjects.GetObjectReader<MonitoringObject>(new MonitoringObjectGenericCriteria("Name='" + strmonitoredVM + "'"), ObjectQueryOptions.Default);
                    foreach (MonitoringObject found in readerMonitoredVM)
                        monitoringObjects.Add(found);
                }
                foreach (MonitoringObject mo in monitoringObjects.Distinct().ToList())
                {
                    // The DHCP check was needed to handle objects discovered by the DHCP management pack.
                    if (!mo.InMaintenanceMode && !mo.FullName.ToLower().Contains("dhcp"))
                        mo.ScheduleMaintenanceMode(dtmNow, dtmTimeWindow, reason, scheduledComment, TraversalDepth.Recursive);
                }
            }
        }
        catch (Exception ex)
        {
            // Handle or log exception
        }
        // Code to wrap it up
    }

Slap a few more methods in there to handle other objects like groups or explicit OS classes directly and pop a little web front end on it and you have a self-service application that allows the different application teams to directly place their servers into maintenance mode without granting access to the SCOM console or requiring them to RDP to a server to run a PowerShell script to schedule a task.

Adding SNMP monitoring for network devices in SCOM 2012 has improved substantially over previous versions. SNMP monitoring is great for basic alerting (up/down) and obviously our performance data will come from SNMP; however, to get fairly granular alerting on most network devices we need to leverage syslog. The documentation on this is fairly light and a little inconsistent so I thought I’d post what works for me.

The first step is to get your devices discovered via SNMP. There are several excellent articles on this, so I will not reinvent the wheel here: “How to Discover Network Devices in Operations Manager” and “SNMP Trap monitoring with SCOM 2012 R2”.

Create the Groups

It is a good idea to create a new management pack to contain the groups and the rules used in syslog monitoring. It simplifies targeting for overrides without getting into additional details of visibility limitations due to sealed and unsealed management packs.

Once you have your devices discovered, the next step is to create groups of similar devices that share the syslog events you want to target. For example, you might create one group for your routers, another for your edge switches, another for your core switches, another for your VoIP telephony, etc. Populate the groups accordingly with the network devices. When you target the members (either via the Explicit Members or the Dynamic Members tab), you will want to target objects of type Node (System.NetworkManagement.Node).

I would add that you will need a pretty solid IP scheme standard in your environment (e.g. 1-5 in the last octet reserved for routers, 6-15 in the last octet reserved for switches, etc.) to leverage the Dynamic Members tab effectively. If you have one, this site has helped me quite a bit with the joy that is regex for IP addresses: Working with regular expressions and ip addresses in OpsMgr 2012. Edit (08.23.2015): I also bit the bullet and posted more detail on SCOM and regular expressions that might help, at Using Regular Expressions with SCOM 2012 Groups.
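
To make the IP scheme idea a little more concrete, here is a minimal sketch in Python, used only as a quick standalone way to try patterns out. The 10.1.20.x subnet, the pattern names, and the classify_device function are all made up for illustration; only the octet ranges come from the example above. SCOM evaluates group expressions with its own regex support, so verify any pattern in the console before relying on it.

```python
import re

# Hypothetical management subnet 10.1.20.0/24 (an assumption for this sketch).
# Last octet 1-5 is reserved for routers, 6-15 for switches.
ROUTER_PATTERN = re.compile(r"^10\.1\.20\.[1-5]$")
SWITCH_PATTERN = re.compile(r"^10\.1\.20\.([6-9]|1[0-5])$")


def classify_device(ip_address):
    """Return the group a device address would fall into, or None."""
    if ROUTER_PATTERN.match(ip_address):
        return "routers"
    if SWITCH_PATTERN.match(ip_address):
        return "switches"
    return None


for ip in ("10.1.20.3", "10.1.20.10", "10.1.20.16", "10.1.200.3"):
    print(ip, "->", classify_device(ip))
```

Note the anchors (^ and $) and the escaped dots: without them, a pattern meant for 10.1.20.3 could also match 10.1.20.30 or 10.1.200.3.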

Create the Alerts

After you have set up your group(s), you are ready to move on to creating the rules that will alert on the syslog messages. Syslog messages are categorized by the system component that generated them, using the defined Facility codes listed below. The full parameter list is referenced at IANA here.

| Numerical Code | Facility |
|---|---|
| 0 | kernel messages |
| 1 | user-level messages |
| 2 | mail system |
| 3 | system daemons |
| 4 | security/authorization messages |
| 5 | messages generated internally by syslogd |
| 6 | line printer subsystem |
| 7 | network news subsystem |
| 8 | UUCP subsystem |
| 9 | clock daemon |
| 10 | security/authorization messages |
| 11 | FTP daemon |
| 12 | NTP subsystem |
| 13 | log audit |
| 14 | log alert |
| 15 | clock daemon |
| 16 | local use 0 (local0) |
| 17 | local use 1 (local1) |
| 18 | local use 2 (local2) |
| 19 | local use 3 (local3) |
| 20 | local use 4 (local4) |
| 21 | local use 5 (local5) |
| 22 | local use 6 (local6) |
| 23 | local use 7 (local7) |

In addition to the Facility that identifies where a message originated, syslog assigns each message a criticality, known as its Severity. The Severity levels are as follows (referenced from https://en.wikipedia.org/wiki/Syslog#Severity_levels):

| Value | Severity | Keyword | Description/Examples |
|---|---|---|---|
| 0 | Emergency | emerg | Multiple apps/servers/sites. This level should not be used by applications. |
| 1 | Alert | alert | Should be corrected immediately. An example might be the loss of the primary ISP connection. |
| 2 | Critical | crit | May be used to indicate a failure in the system’s primary application. |
| 3 | Error | err | An application has exceeded its file storage limit and attempts to write are failing. |
| 4 | Warning | warning | May indicate that an error will occur if action is not taken. For example, a non-root file system has only 2GB remaining. |
| 5 | Notice | notice | Events that are unusual but not error conditions. |
| 6 | Informational | info | Normal operational messages, no action required. For example, an application has started, paused, or ended successfully. |
| 7 | Debugging | debug | Info useful to developers for debugging the application. |
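
In the syslog protocol itself (RFC 3164 and RFC 5424), Facility and Severity travel together as a single priority number: Facility multiplied by 8, plus Severity. The following is a minimal Python sketch of that arithmetic. The function name is my own, and whether the Priority field that SCOM exposes carries exactly this number is an assumption worth checking against a live event.

```python
SEVERITY_KEYWORDS = [
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
]


def decode_priority(priority):
    """Split a syslog priority value into (facility, severity, keyword)."""
    # 24 facilities (0-23) times 8 severities (0-7) gives the range 0-191.
    if not 0 <= priority <= 191:
        raise ValueError("priority must be between 0 and 191")
    facility, severity = divmod(priority, 8)
    return facility, severity, SEVERITY_KEYWORDS[severity]


# 187 = 23 * 8 + 3, i.e. facility local7 with severity Error.
print(decode_priority(187))  # (23, 3, 'err')
```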

A good example for a first rule would be alerting on all Severity 0 events.

  1. In the Authoring pane of the Operations Manager console, right-click on Rules and create a new Rule.

  2. Select the Syslog (Alert) rule under Event Based.

  3. Give the rule a name and select the group created earlier. This group should contain objects of the Node class. Be sure to uncheck the Rule is enabled checkbox. These rules should all be created as disabled; they will be enabled via overrides later.

  4. To configure the filter, two parameters need to be defined: Severity and HostName. The Severity is self-explanatory. The HostName is critical to prevent the alert from triggering once for each device in the group. As this rule is targeting Severity 0 (zero) events, set the first parameter to Severity Equals 0. Insert an additional parameter and set it to HostName Equals $Target/Property[Type="System!System.Entity"]/DisplayName$. This ensures the alert fires only for the device that sent the message instead of for all group members. This is different from Alert Suppression.
  5. Finally, you want to add a little more description to the alert notification. I find the following is a fairly descriptive summary:
Description: $Data[Default='']/EventDescription$
Facility: $Data/EventData/DataItem/Facility$
Severity: $Data/EventData/DataItem/Severity$
Priority: $Data/EventData/DataItem/Priority$
Priority Name: $Data/EventData/DataItem/PriorityName$
Time Stamp: $Data/EventData/DataItem/TimeStamp$
Host Name: $Data/EventData/DataItem/HostName$


  6. Go back into the properties of the rule and select the Configuration tab.
  7. Select the Edit button under Responses at the bottom.
  8. Click the Alert Suppression button.
  9. Select the following checkboxes to prevent duplicate alerts: Event Source, Logging Computer, and Event Level. Save the changes.
  10. The final step is to override the rule for the target group(s) where you want the alert to function, and set the override to Enable.
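
As a toy model of why the HostName parameter matters, the Python sketch below assumes that the rule is evaluated once per group member and that every evaluation sees the same incoming message. That is my simplified reading of the behavior described above, not a statement of how SCOM works internally, and the device names and the alerts_raised function are hypothetical.

```python
def alerts_raised(group_members, event, filter_on_hostname):
    """Return the group members whose copy of the rule would alert."""
    raised = []
    for member in group_members:
        if event["Severity"] != 0:
            continue  # the rule only targets Severity 0 events
        if filter_on_hostname and event["HostName"] != member:
            continue  # the message came from a different device
        raised.append(member)
    return raised


members = ["core-sw-01", "core-sw-02", "core-sw-03"]
event = {"HostName": "core-sw-02", "Severity": 0}

# Without the HostName parameter, every member alerts on the one message.
print(alerts_raised(members, event, filter_on_hostname=False))
# With it, only the device that actually sent the message alerts.
print(alerts_raised(members, event, filter_on_hostname=True))
```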

Another option with more flexibility is to use the Message parameter in Step 4. This allows for the use of regular expression matching on the description string. For example, let’s say you want to be alerted if a switch detects a duplicate IP or MAC address on the network. This can be done with a single parameter line as follows:

Message           Matches regular expression                 duplicate.*address

Where “Message” is the Parameter Name, “Matches regular expression” is the Operator, and “duplicate.*address” is the Value. In “duplicate.*address”, the dot matches any single character and the asterisk repeats that match zero or more times. Therefore, this would match both “duplicate ip address” and “duplicate mac address” using SCOM Regular Expression Support.
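
A quick way to sanity-check a pattern like this before building the rule is to try it against a few sample strings. The sketch below uses Python's re module, which treats this simple pattern the same way, but SCOM uses its own regex engine, so take it as a rough check only. The sample messages and the matches_rule helper are invented for illustration.

```python
import re

PATTERN = re.compile(r"duplicate.*address")


def matches_rule(message):
    """Return True if the pattern is found anywhere in the message."""
    return PATTERN.search(message) is not None


samples = [
    "duplicate ip address detected on vlan 20",
    "duplicate mac address detected on port 12",
    "address conflict resolved, no duplicate found",
    "duplicate vlan id seen under address-family ipv4",
]
for message in samples:
    print(matches_rule(message), "-", message)
```

The last sample is worth a look: because .* accepts anything between the two words, the pattern also matches messages that merely mention both words in that order. If that proves too loose, a tighter value such as duplicate (ip|mac) address is one option to test.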

This method worked for me because I had hundreds of network devices across multiple subnets and this approach allowed me to group the similar devices together and then target all the custom rules I created to the groups where they were applicable based upon the message. This might also be approached from a completely different angle where the rules targeted the server objects receiving the syslog alerts themselves (agents or management servers) instead of Node objects although that might not scale for large environments. Either way, this was mainly a way for me to document what I did for later reference. I hope it may have saved you some time in your configuration.