Nagios Notifications
Summary
The Nagios Field Processor integrates Nagios with Jira. It also allows:
Duplicate emails to be added to the same issue as comments.
The issue to be resolved when a Nagios notification indicates the problem has been resolved.
KEY: VALUE lines in the Nagios email body to be mapped to custom fields.
Related Issues/matching
JEMH Cloud uses the email subject to match against pre-existing issues, within a specific time frame, that have a 'matching' summary. A matching summary is not limited to an exact match (which also works): it includes every permutation produced by exchanging values within the Phrase Set and the Notification Types. For example, all of the following subjects relate to the same issue, so long as the Phrase is listed and the Priority is known.
** RECOVERY Service Alert: localhost/SSH is OK **
** PROBLEM Service Alert: localhost/SSH is CRITICAL **
** PROBLEM Service Alert: localhost/SSH is WHATEVER **
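One way to picture this matching is to reduce each subject to a key in which the notification type and the phrase are replaced by placeholders; subjects with the same key then relate to the same issue. The sketch below is an illustration only, not JEMH Cloud's implementation. The function name `summary_key`, the helper `_alternation`, and the contents of both lists are assumptions (in particular, WHATEVER is included only so that the third subject counts as a listed phrase).

```python
import re

# Assumed values for illustration; the real lists come from the profile's
# Notification Types and Nagios Related Phrases settings.
NOTIFICATION_TYPES = ["PROBLEM", "RECOVERY"]
PHRASE_SET = ["OK", "WARNING", "CRITICAL", "WHATEVER"]


def _alternation(words):
    # Longest first, so a longer word wins over a shorter one it starts with.
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(word) for word in ordered)


def summary_key(subject, types=NOTIFICATION_TYPES, phrases=PHRASE_SET):
    """Reduce a subject to a key that ignores the type and the phrase."""
    # Drop the surrounding asterisks and whitespace.
    text = subject.strip().strip("*").strip()
    # Swap the leading notification type for a placeholder.
    text = re.sub(rf"^(?:{_alternation(types)})\b", "<TYPE>", text)
    # Swap the trailing phrase for a placeholder.
    text = re.sub(rf"\b(?:{_alternation(phrases)})$", "<PHRASE>", text)
    return text


subjects = [
    "** RECOVERY Service Alert: localhost/SSH is OK **",
    "** PROBLEM Service Alert: localhost/SSH is CRITICAL **",
    "** PROBLEM Service Alert: localhost/SSH is WHATEVER**",
]
keys = {summary_key(subject) for subject in subjects}
```

All three subjects collapse to the single key `<TYPE> Service Alert: localhost/SSH is <PHRASE>`, while a subject for a different host or service, or one ending in an unlisted phrase, produces a different key.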
Custom Field Matching
JEMH Cloud reuses the body parsing of the Colon Suffix Field Processor (KEY: VALUE), with one variation: every line that starts with KEY: VALUE is extracted if a custom field of the same name exists.
From the example email above, the following Custom Fields could be defined (and will be for this example).
Notification Type
Service
Host
Address
State
Date/Time
NOTE: The entire content of the message will still be used as the Issue Description or Comment.
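The extraction rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the processor's actual code: the function `extract_fields`, the pattern, and the sample body are hypothetical, and exact, case-sensitive name matching is assumed. The field names are the ones listed above.

```python
import re

# Custom field names taken from the list above.
CUSTOM_FIELDS = {
    "Notification Type", "Service", "Host", "Address", "State", "Date/Time",
}

# A line qualifies when it starts with "KEY: VALUE".
LINE_PATTERN = re.compile(r"^([^:\n]+):\s*(.*)$")

# Made-up body for illustration, not a real notification.
SAMPLE_BODY = """***** Nagios *****

Notification Type: PROBLEM

Service: SSH
Host: localhost
Address: 127.0.0.1
State: CRITICAL

Date/Time: Mon Jan 1 10:00:00 UTC 2024

Additional Info:

Connection refused
"""


def extract_fields(body, known_fields=CUSTOM_FIELDS):
    """Return {field: value} for lines whose KEY names a known custom field."""
    found = {}
    for line in body.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue
        key, value = match.group(1).strip(), match.group(2).strip()
        # Keys without a same-named custom field are skipped.
        if key in known_fields:
            found[key] = value
    return found


fields = extract_fields(SAMPLE_BODY)
```

In this sketch `Additional Info:` is skipped because no custom field has that name. Only the first colon on a line separates key from value, so the time in `Date/Time` stays intact.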
How to Enable
Go to Profile > Project Mapping > Field Processors > Edit (cog)
Set Apply field processors to On Create/Comment
Enable Nagios Notifications Field Processor
In the Nagios section of the configuration, an example configuration has been pre-populated, which will be good enough for our example message.
The default configuration creates issues (or comments on them) when WARNING and CRITICAL messages are received, and closes them when OK messages are received. The received email is associated with the latest To Do/In Progress issue created or modified in the last 30 days whose summary matches one of the (Notification Type, Phrase) combinations.
If you provide a Workflow action and resolution, OK messages will transition (fix) the open issue.
Each PROBLEM (not OK) phrase can be mapped to a Priority. E.g. a CRITICAL message can change issue priority to Blocker.
Example use case
Example Critical notification | Example OK service Restoration notification |
---|---|
Go to Test Cases, select Create, then copy and paste the Problem example above. Remember to update the test's To: address to match the profile's catch email address.
Run the test. Review the report to check that the issue has been created and the message fields have been populated.
You should see that the fields have been populated on the issue.
Finally, create a new Recovery test case copying and pasting the example above. Remember to update the test's To: address to match the profile's catch email address.
If you run the Recovery test case, the problem issue will be commented on, updated and fixed.
If JEMH Cloud receives an OK email with no associated problem issue, a new issue is created and then automatically transitioned/fixed (if the Nagios workflow has been configured).
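The handling described on this page can be summarised as a small decision function. This is a sketch of the documented behaviour only, not JEMH Cloud code: `Plan`, `plan_for` and the action names are hypothetical, and only the CRITICAL to Blocker pairing in the priority mapping comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed phrase-to-priority mapping; only CRITICAL -> Blocker is taken
# from the text, the WARNING entry is made up for illustration.
PRIORITY_BY_PHRASE = {"WARNING": "Major", "CRITICAL": "Blocker"}


@dataclass
class Plan:
    actions: List[str]              # ordered steps, e.g. ["comment", "resolve"]
    priority: Optional[str] = None  # priority to set, if the phrase maps to one


def plan_for(phrase, has_matching_issue, workflow_configured=True):
    """Return the steps this page describes for one incoming notification."""
    # A matching issue receives a comment; otherwise a new issue is created.
    actions = ["comment" if has_matching_issue else "create"]
    if phrase == "OK":
        # An OK message resolves the issue only when a resolve workflow
        # action and resolution have been configured.
        if workflow_configured:
            actions.append("resolve")
        return Plan(actions)
    # Any non-OK phrase is a problem and may carry a priority.
    return Plan(actions, PRIORITY_BY_PHRASE.get(phrase))
```

For example, `plan_for("OK", has_matching_issue=False)` yields the actions `["create", "resolve"]`, which corresponds to the last case above: an OK email with no problem issue creates a new issue and resolves it straight away.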
Nagios Configuration Options
Issue Search Filter Include Status Categories
Only issues whose status belongs to one of these status categories are searched.
Unresolved Match Time Limit
This defines how far back issues are searched. Issues must have been created or updated within this time limit, e.g. within the last 30 days.
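Taken together, these two settings behave much like a Jira search. The sketch below builds an equivalent JQL string. It is an assumption about how the filter could be expressed, not the query JEMH Cloud actually runs, and `build_search_jql` is a hypothetical helper.

```python
def build_search_jql(status_categories, limit_days):
    """Build a JQL string equivalent to the two settings above."""
    # Quote each category name, since names such as "To Do" contain spaces.
    quoted = ", ".join(f'"{name}"' for name in status_categories)
    # JQL accepts relative dates such as -30d.
    window = f"-{int(limit_days)}d"
    return (
        f"statusCategory in ({quoted}) "
        f"AND (created >= {window} OR updated >= {window}) "
        "ORDER BY updated DESC"
    )


jql = build_search_jql(["To Do", "In Progress"], 30)
```

Ordering by `updated` descending puts the most recently changed issue first, in line with the default of picking the latest matching issue.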
Notification Types
This defines the notification types that JEMH Cloud will accept, e.g. PROBLEM,RECOVERY
Resolve Workflow Action
This is the Workflow action that should run when an ‘OK’ phrase is detected. Typically this is the Resolve Issue action.
Resolve Workflow Resolution
This is the value that should be applied to the Resolution field during Resolve Workflow.
Nagios Related Phrases
The purpose of these values is to relate messages about the same problem that have slightly different text. For example, a WARNING about an event is related to an OK about that event being resolved, whereas a server DOWN message is related to a server UP message, and so on. The two groups may overlap, but the groups themselves are not related to each other.
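A minimal sketch of this grouping, assuming that two phrases are related only when a single group lists both of them. The group contents and the function `are_related` are made up for illustration and are not the product's defaults.

```python
# Assumed groups for illustration only.
PHRASE_GROUPS = [
    {"OK", "WARNING", "CRITICAL"},
    {"UP", "DOWN"},
]


def are_related(first, second, groups=PHRASE_GROUPS):
    """Two phrases are related only if some single group contains both."""
    return any(first in group and second in group for group in groups)


# Groups that share a phrase still do not relate their other members.
overlapping = [{"OK", "WARNING"}, {"OK", "DOWN"}]
```

Under this reading, WARNING and OK are related, as are DOWN and UP, but WARNING and DOWN are not. That stays true with the `overlapping` groups, even though both groups contain OK.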