Nagios Notifications

Summary

The Nagios Field Processor integrates Nagios with Jira. It also allows:

  • Duplicate emails to stack up on the same issue as comments.

  • The issue to be resolved when a Nagios notification indicates the problem has been resolved.

  • Content of the Nagios email body that is in KEY: VALUE format to be mapped to custom fields.

JEMH Cloud uses the email subject to match against existing issues, created within a specific time frame, that have a 'matching' summary. A matching summary is not only a direct match (which would work) but also any permutation produced by exchanging values within the Phrase Set and Notification Types. For example, all of the following subjects can relate to the same issue, so long as the Phrase is listed and the Priority is known.

** RECOVERY Service Alert: localhost/SSH is OK **
** PROBLEM Service Alert: localhost/SSH is CRITICAL **
** PROBLEM Service Alert: localhost/SSH is WHATEVER **
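
One way to picture the permutation matching is to reduce each subject to a canonical form in which the Notification Type and the Phrase are replaced by placeholders. Subjects that share a canonical form then relate to the same issue. The Python sketch below illustrates that idea only; it is not JEMH Cloud's implementation, and the names `NOTIFICATION_TYPES`, `PHRASES` and `canonical_subject`, as well as the configured values, are assumptions made for the example.

```python
import re

# Assumed configuration values, for illustration only.
NOTIFICATION_TYPES = ("PROBLEM", "RECOVERY")
PHRASES = ("OK", "WARNING", "CRITICAL")


def canonical_subject(subject, types=NOTIFICATION_TYPES, phrases=PHRASES):
    """Swap the notification type and phrase for placeholders.

    Returns None when the subject lacks a known type or a listed phrase.
    """
    type_re = r"\b(?:" + "|".join(map(re.escape, types)) + r")\b"
    phrase_re = r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b"
    # Replace only the first occurrence of each (a simplification).
    text, type_hits = re.subn(type_re, "<TYPE>", subject, count=1)
    text, phrase_hits = re.subn(phrase_re, "<PHRASE>", text, count=1)
    if type_hits == 0 or phrase_hits == 0:
        return None
    # Drop the decorative asterisks and surrounding spaces.
    return text.strip("* ")


recovery = canonical_subject("** RECOVERY Service Alert: localhost/SSH is OK **")
problem = canonical_subject("** PROBLEM Service Alert: localhost/SSH is CRITICAL **")
print(recovery == problem)  # True: both reduce to the same canonical form
```

With these assumed values, a subject ending in WHATEVER returns `None` because that phrase is not listed; adding it to `PHRASES` would make it match as well.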

Custom Field Matching

JEMH Cloud incorporates the body parsing of the Colon Suffix Field Processor (KEY: VALUE), with one variation: every line that starts with KEY: VALUE is extracted if a custom field of the same name exists.

From the example email above, the following Custom Fields could be defined (and will be for this example).

  • Notification Type

  • Service

  • Host

  • Address

  • State

  • Date/Time

NOTE: The entire content of the message will still be used as the Issue Description or Comment.
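
The extraction rule described above can be sketched in Python as follows. This is a minimal illustration, not the product's parser: the function name `extract_fields`, the exact (case-sensitive) name comparison, and the sample body are all assumptions.

```python
# Hypothetical set of custom field names that exist in Jira.
CUSTOM_FIELDS = {"Notification Type", "Service", "Host", "Address", "State", "Date/Time"}


def extract_fields(body, known_fields=CUSTOM_FIELDS):
    """Collect KEY: VALUE pairs from lines whose KEY names a known custom field."""
    found = {}
    for line in body.splitlines():
        # Split on the first colon only, so values such as times stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in known_fields:
            found[key] = value.strip()
    return found


# Made-up body in the KEY: VALUE style; not taken from a real notification.
body = """Notification Type: PROBLEM
Service: SSH
Host: localhost
Address: 127.0.0.1
State: CRITICAL
Date/Time: Mon Jan 1 12:00:00 UTC 2024
Additional Info: connection refused
"""
fields = extract_fields(body)
print(fields["State"])  # CRITICAL
```

The `Additional Info` line is skipped in this sketch because no custom field of that name is defined.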

How to Enable

  1. Go to Profile > Project Mapping > Field Processors > Edit (cog)

  2. Set Apply field processors to On Create/Comment

  3. Enable Nagios Notifications Field Processor

  4. In the Nagios section of the configuration, an example configuration has been pre-populated, which is good enough for our example message.

The default configuration creates issues (or comments on them) when WARNING and CRITICAL messages are received, and closes them when OK messages are received. The received email is associated with the latest To Do/In Progress issue created or modified in the last 30 days whose summary is one of the (Notification Type, Phrase) combinations.

  • If you provide a Workflow action and resolution, OK messages will transition (fix) the open issue. 

  • Each PROBLEM (not OK) phrase can be mapped to a Priority. E.g. a CRITICAL message can change issue priority to Blocker.
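
The behaviour described above can be summarised as a small decision function. The Python sketch below is an illustration under stated assumptions, not the product's logic: `plan_actions` and `PRIORITY_BY_PHRASE` are hypothetical names, and the WARNING priority value is made up (only CRITICAL to Blocker comes from the text).

```python
# Hypothetical phrase-to-priority mapping; the WARNING value is an assumption.
PRIORITY_BY_PHRASE = {"WARNING": "Major", "CRITICAL": "Blocker"}


def plan_actions(phrase, open_issue_exists, workflow_action=None, resolution=None):
    """Return the steps a notification with this phrase would trigger."""
    # A matching open issue receives a comment; otherwise an issue is created.
    steps = ["comment on issue" if open_issue_exists else "create issue"]
    if phrase == "OK":
        # The transition only happens when both settings are provided.
        if workflow_action and resolution:
            steps.append(f"run '{workflow_action}' with resolution '{resolution}'")
    elif phrase in PRIORITY_BY_PHRASE:
        steps.append(f"set priority to {PRIORITY_BY_PHRASE[phrase]}")
    return steps


print(plan_actions("CRITICAL", open_issue_exists=False))
print(plan_actions("OK", True, workflow_action="Resolve Issue", resolution="Fixed"))
```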

Example use case

Example Critical notification

Example OK service Restoration notification


Go to Test Cases, click Create, then copy and paste the Problem example above. Remember to update the test's To: address to match the profile's catch email address.

Run the test. Review the report to check that the issue has been created and the message fields have been populated.

You should see that the fields have been populated on the issue.

Finally, create a new Recovery test case by copying and pasting the Recovery example above. Remember to update the test's To: address to match the profile's catch email address.

If you run the Recovery test case, the problem issue will be commented on, updated and fixed.

If JEMH Cloud receives an OK email with no associated problem issue, a new issue is created and then automatically transitioned (fixed), provided the Nagios workflow has been configured.

Nagios Configuration Options

Issue Search Filter Include Status Categories

This restricts the search to issues whose status falls within these status categories, e.g. To Do or In Progress.

Unresolved Match Time Limit

This defines how far back issues are searched. Issues must have been created or updated within this time limit, e.g. within the last 30 days.
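
To make the combined effect of the status category filter and the time limit concrete, the two settings can be expressed as a JQL query. This page does not state how JEMH Cloud performs the search, so the Python sketch below is only an approximation; `build_jql` is a hypothetical helper, and the summary matching is left out.

```python
def build_jql(status_categories, days):
    """Build a JQL string limited by status category and age in days."""
    quoted = ", ".join(f'"{name}"' for name in status_categories)
    return (
        f"statusCategory in ({quoted}) "
        # Mirrors "created or updated within the time limit".
        f"AND (created >= -{days}d OR updated >= -{days}d) "
        # Most recently updated issue first.
        "ORDER BY updated DESC"
    )


jql = build_jql(["To Do", "In Progress"], 30)
print(jql)
```

Pasting the printed query into the Jira issue search is a quick way to preview which issues would be candidates for a match under these assumptions.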

Notification Types

This defines the notification types that JEMH Cloud will allow, e.g. PROBLEM,RECOVERY

Resolve Workflow Action

This is the Workflow action that should run when an ‘OK’ phrase is detected. Typically this is the Resolve Issue action.

Resolve Workflow Resolution

This is the value that should be applied to the Resolution field when the Resolve Workflow Action runs.

Nagios Related Phrases

The purpose of these values is to enable a relationship between messages about the same problem with slightly different text. For example, a WARNING about an event is related to an OK about that event being resolved, whereas a server DOWN is related to a server UP message, etc. The two groups may intersect, but they aren't related.
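
The grouping idea can be sketched as a lookup over phrase groups. The Python below is an illustration only: the group contents, the name `related_to`, and the simplification that groups do not overlap are all assumptions.

```python
# Hypothetical phrase groups; each tuple lists phrases about the same kind of event.
RELATED_PHRASES = [("OK", "WARNING", "CRITICAL"), ("UP", "DOWN")]


def related_to(phrase, groups=RELATED_PHRASES):
    """Return the other phrases in the group containing `phrase`."""
    for group in groups:
        if phrase in group:
            return [p for p in group if p != phrase]
    # Unlisted phrases relate to nothing.
    return []


print(related_to("WARNING"))  # ['OK', 'CRITICAL']
print(related_to("DOWN"))     # ['UP']
```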