Monitoring is used to detect issues on a computer and either perform corrective actions or send events to the management servers for processing.
Monitor Types
There are many different types of monitors available in Systems Directorate.
Monitor Type | Description |
---|---|
Basic Monitor | This is the most general monitor. It can look at any data already collected on the agent including both inventory and performance information. |
Log File | Log file monitors watch text files on the monitored computer. |
Windows Log | Watches event logs on the server. |
File Activity | Watches a specific file for size and existence. |
Directory Activity | Watches the contents of a directory for the number of files. |
Tool Monitor | A user-created application tool can be executed to provide the data for the monitor. Tools can run external programs, execute a plugin, and/or run a Python script. |
Ping Monitor | A ping monitor allows one agent to monitor a large group of other servers. |
Certificate Monitor | A certificate monitor watches information about certificates, such as expiration dates. |
DNS Monitor | A DNS monitor can perform DNS queries and test the results to make sure DNS configurations are available and correct. |
FTP Monitor | An FTP monitor can log in to an FTP server and get a listing of a specific directory. The monitor can check for specific files, directory counts, and the connection itself. |
SMTP Monitor | An SMTP monitor can verify that an email server is reachable and that an email message can be sent. |
TCP Monitor | A TCP monitor can make any arbitrary connection to a host over any TCP port to test connectivity. The monitor can watch for errors or timeouts. |
HTTP Monitor | An HTTP monitor can perform simple web queries to any URL and check for connectivity and return codes. The result text can also be analyzed. |
LDAP Monitor | An LDAP monitor can perform LDAP queries against any LDAP-based server to test connectivity and results. |
Database Monitor | A database monitor can connect to any SQL Server or PostgreSQL database and perform a query. The monitor can test connectivity and the results of the query for correct information. |
Custom Script | A user-provided Python script can be used to watch virtually anything on the computer. |
External Program | An external program can be run and the result used. |
Plugin | The administrator can create a .NET plugin DLL that performs any custom check. The results from the plugin can be tested. |
Elements
Each monitor has zero or more elements. Each element performs a separate test for the monitor. Usually those tests are against the data collected by the monitor, but for basic monitors they can be against any data on the agent. Even the advanced monitors can look at the data on an agent as part of their condition logic.
Each element has four severities available: Critical, Warning, Normal, and Info. Each severity has a list of conditions used to match against the data. As a monitored system becomes worse, it can move from Normal, to Warning, and finally to Critical.
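The severity matching described above can be sketched as follows. This is only an illustration, not the product's implementation: the `Element` class, the `free_pct` data key, and the evaluation order (most severe first) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Severity names come from the text; checking them in this order is an assumption.
SEVERITY_ORDER = ["Critical", "Warning", "Normal", "Info"]

Condition = Callable[[dict], bool]


@dataclass
class Element:
    """Hypothetical element: one list of conditions per severity."""
    name: str
    conditions: Dict[str, List[Condition]] = field(default_factory=dict)

    def evaluate(self, data: dict) -> Optional[str]:
        """Return the first severity with a matching condition, or None."""
        for severity in SEVERITY_ORDER:
            for condition in self.conditions.get(severity, []):
                if condition(data):
                    return severity
        return None


# Example element for free disk space; thresholds are illustrative only.
disk = Element(
    name="Free space",
    conditions={
        "Critical": [lambda d: d["free_pct"] < 5],
        "Warning": [lambda d: d["free_pct"] < 10],
        "Normal": [lambda d: d["free_pct"] >= 10],
    },
)
```

With these thresholds, `disk.evaluate({"free_pct": 8})` matches the Warning conditions, and the same element moves to Critical once free space drops below 5%.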
Message Data
When an element is triggered by any matching condition, a message is created. The message information can be configured at the monitor level and overridden at the element level. It can include any custom text or information the operator wishes to add. The system also adds other data automatically, including all global variables, monitor and element properties, and computer information.
Configurable message data includes:
- Host name
- Host type
- Source name. This defaults to the monitor name.
- Item name. This defaults to the element name.
- Instance name. This may be the drive letter, CPU number, etc.
- Correlation ID. This defaults to the hostname + source + item + instance name.
- Message Text
The message is passed to the action processor.
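The defaults in the list above can be illustrated with a small sketch. The `Message` class and its field names are hypothetical, and the `|` separator used to join the correlation ID parts is an assumption, since the text only says the parts are combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical message record; field names are illustrative only."""
    host_name: str
    host_type: str
    monitor_name: str
    element_name: str
    instance_name: str = ""
    text: str = ""
    source_name: Optional[str] = None
    item_name: Optional[str] = None
    correlation_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Source name defaults to the monitor name.
        if self.source_name is None:
            self.source_name = self.monitor_name
        # Item name defaults to the element name.
        if self.item_name is None:
            self.item_name = self.element_name
        # Correlation ID defaults to host + source + item + instance.
        # The "|" separator is an assumption for readability.
        if self.correlation_id is None:
            self.correlation_id = "|".join(
                [self.host_name, self.source_name, self.item_name, self.instance_name]
            )


msg = Message("web01", "Windows", "Disk Space", "Free space", instance_name="C:")
```

Here `msg.correlation_id` is built from the four default parts, so repeated messages for the same drive on the same host share one ID.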
Actions
Actions are a set of responses to execute whenever an element’s conditions are matched. Actions are defined both at the monitor and element level and are merged together as needed.
Actions operate on an occurrence count based on the number of times an element matches a condition. Each time a monitor runs and detects a condition in an element, the occurrence increases by one. For example, if a drive space condition is matched, the action at occurrence 1 may attempt to clean space. The action at occurrence 2 may try other techniques to remove data. And an action at occurrence 3 may finally send an event message about the problem.
The monitor can define a maximum occurrence value that causes the count to wrap around to 0 again. This is how repeat actions and de-duplication are handled by an agent. For example, if a monitor checks a disk every 5 minutes, the maximum can be set to 12 so that occurrence 1 happens once per hour. The action for occurrence 1 can then send an event message.
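The wrap-around arithmetic can be checked with a short sketch. The `OccurrenceCounter` class is hypothetical, and resetting the count when the element stops matching is an assumption; only the increment and the wrap at the maximum come from the text.

```python
from typing import Optional


class OccurrenceCounter:
    """Hypothetical per-element counter with an optional wrap-around maximum."""

    def __init__(self, max_occurrence: Optional[int] = None) -> None:
        self.max_occurrence = max_occurrence
        self.count = 0

    def record(self, matched: bool) -> int:
        """Record one monitor run and return the occurrence number for it."""
        if not matched:
            # Assumption: the count resets once the element stops triggering.
            self.count = 0
            return 0
        self.count += 1
        current = self.count
        # Wrap back to 0 after reaching the maximum, so the next match is 1 again.
        if self.max_occurrence is not None and self.count >= self.max_occurrence:
            self.count = 0
        return current


# A check every 5 minutes with a maximum of 12 repeats occurrence 1 every 12 runs.
counter = OccurrenceCounter(max_occurrence=12)
runs = [counter.record(True) for _ in range(25)]
send_event_runs = [i for i, occurrence in enumerate(runs) if occurrence == 1]
```

In this sketch occurrence 1 lands on runs 0, 12, and 24, which are 60 minutes apart at a 5-minute interval.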
Actions can also be set to run all the time, or when an element stops triggering.
Here are the action types available.
- Send event message
- Send email message
- Send a notification message
- Run application tool
- Execute a custom script
- Execute an external application
- Delete files or directories
- Stop, start, or restart a service
- Stop, start, or restart a process
- Write to a text log file
- Write to a Windows event log
- Restart the computer
- Stop further actions
- Stop checking further elements in the monitor
The action processor takes the provided message and executes the defined action. The default action added whenever a monitor is created is a Send Message action. All other action types can use data from the message as inputs.
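One way to picture the action processor is as a dispatch over the merged action list. The sketch below is an assumption-heavy illustration: the dictionary layout of an action, the `run_actions` function, the handlers, and the choice to merge by simple concatenation are all hypothetical. Only the action type names and the occurrence idea come from the text.

```python
from typing import Callable, Dict, List

STOP = "Stop further actions"

Handler = Callable[[dict, dict], None]


def run_actions(message: dict, actions: List[dict],
                handlers: Dict[str, Handler], occurrence: int) -> List[str]:
    """Run the actions that apply to this occurrence; return the types executed."""
    executed = []
    for action in actions:
        # An action without an "occurrence" key runs every time (assumption).
        if action.get("occurrence", occurrence) != occurrence:
            continue
        if action["type"] == STOP:
            executed.append(STOP)
            break
        handlers[action["type"]](message, action)
        executed.append(action["type"])
    return executed


log = []
handlers = {
    "Delete files or directories": lambda m, a: log.append(("delete", a["path"])),
    "Send event message": lambda m, a: log.append(("event", m["text"])),
}
monitor_actions = [{"type": "Send event message", "occurrence": 3}]
element_actions = [
    {"type": "Delete files or directories", "occurrence": 1, "path": "C:\\Temp"}
]
# Assumption: monitor and element actions are merged by concatenation.
actions = monitor_actions + element_actions
message = {"text": "Drive C: is low on space"}

first = run_actions(message, actions, handlers, occurrence=1)
third = run_actions(message, actions, handlers, occurrence=3)
```

This mirrors the drive space example: the first occurrence attempts a cleanup, and the event message is only sent at occurrence 3.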
Variables
Variables can be defined on monitors and elements and are included in the message passed to the action processor. Global, company, and computer variables are added to the message first. Together, these variables and the extra data in the message provide a wealth of information for event processing by the Directorate servers.
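The layering of variable scopes might look like the sketch below. The function name and the sample variables are hypothetical, and the rule that a later scope overrides an earlier one with the same name is an assumption; the text only states that global, company, and computer variables are added first.

```python
from typing import Dict


def build_message_variables(global_vars: Dict[str, str],
                            company_vars: Dict[str, str],
                            computer_vars: Dict[str, str],
                            monitor_vars: Dict[str, str],
                            element_vars: Dict[str, str]) -> Dict[str, str]:
    """Merge variable scopes in order; later scopes win on name clashes (assumption)."""
    merged: Dict[str, str] = {}
    for scope in (global_vars, company_vars, computer_vars, monitor_vars, element_vars):
        merged.update(scope)
    return merged


variables = build_message_variables(
    global_vars={"support_email": "ops@example.com", "tier": "standard"},
    company_vars={"company": "Example Corp"},
    computer_vars={"tier": "gold"},
    monitor_vars={"runbook": "disk-space"},
    element_vars={"threshold": "10"},
)
```

In this example the computer-level `tier` replaces the global value, while every other variable passes through unchanged.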
Monitor Deployments
Monitors can be deployed to target groups or individual computers. They can also be copied, adjusted, and then re-deployed to other groups or computers. This allows an operator to easily override specific settings as needed for specialized computers.
For example, a default monitor for file systems may be deployed to all Windows computers and warn when free space is less than 10%. Some computers, however, may normally run with less free space than that and still be healthy, so a copy of the monitor can be created with a new threshold of 5% and then deployed to those specific computers.
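The copy-and-adjust workflow can be sketched with an immutable definition that is copied with one field changed. The `MonitorDefinition` class, the deployment target names, and the `warns` helper are hypothetical and only illustrate the 10% and 5% thresholds from the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MonitorDefinition:
    """Hypothetical monitor definition with a single warning threshold."""
    name: str
    warn_free_pct: float


def warns(monitor: MonitorDefinition, free_pct: float) -> bool:
    """Return True when free space is below the monitor's warning threshold."""
    return free_pct < monitor.warn_free_pct


# Default monitor for all Windows computers.
default_monitor = MonitorDefinition(name="File Systems", warn_free_pct=10.0)

# Copy with a lower threshold for computers that normally run with less free space.
small_disk_monitor = replace(
    default_monitor, name="File Systems (small disks)", warn_free_pct=5.0
)

# Illustrative deployment targets; the target names are made up.
deployments = {
    "group:All Windows": default_monitor,
    "computer:archive01": small_disk_monitor,
}
```

A computer with 8% free space would warn under the default monitor but not under the adjusted copy, and the original definition is left untouched by the copy.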