Use Nagios Field Processor

Scenario

You want to integrate Nagios with JIRA, but not only that:

  • Duplicates should stack up on the same issue as comments

  • If a Nagios notification indicates the problem has been resolved, the issue should be resolved automatically

  • The Nagios email body happens to be in KEY: VALUE format, so mapping these values to custom fields would be a bonus.

Example emails

Before setting up JIRA or JEMH Cloud, review the format of the Nagios emails; see the CRITICAL and OK messages below.

Example CRITICAL notification

Subject: ** PROBLEM Service Alert: localhost/SSH is CRITICAL **
To: andy@localhost
X-mailer: mail (GNU Mailutils 2.2)
Date: Sat, 26 Nov 2011 16:53:45 +1300 (NZDT)
From: nagios@localhost

***** Nagios *****
Notification Type: PROBLEM
Service: SSH
Host: localhost
Address: 127.0.0.1
State: CRITICAL
Date/Time: Sat Nov 26 16:53:45 NZDT 2011
Additional Info: Connection refused

Example OK service restoration notification

Subject: ** RECOVERY Service Alert: localhost/SSH is OK **
To: andy@localhost
X-mailer: mail (GNU Mailutils 2.2)
Date: Sat, 26 Nov 2011 17:03:30 +1300 (NZDT)
From: nagios@localhost

***** Nagios *****
Notification Type: RECOVERY
Service: SSH
Host: localhost
Address: 127.0.0.1
State: OK
Date/Time: Sat Nov 26 17:03:30 NZDT 2011
Additional Info: SSH OK - OpenSSH_5.8p1 Debian-7ubuntu1 (protocol 2.0)

The Nagios notification is structured: the summary has its message type as the first keyword and a priority as the last. In the CRITICAL example, the message type is PROBLEM and the priority is CRITICAL. The recovery example has a message type of RECOVERY and a status of OK.
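To make the subject structure concrete, here is a minimal Python sketch that splits a subject into its parts. The regular expression is an assumption derived only from the two example subjects on this page; it is not how JEMH Cloud parses subjects internally, and `parse_subject` is a hypothetical helper name.

```python
import re

# Assumed subject layout, taken from the two example emails:
#   ** <TYPE> Service Alert: <host>/<service> is <STATE> **
SUBJECT_RE = re.compile(
    r"^\*\*\s+(?P<type>\w+)\s+Service Alert:\s+"
    r"(?P<host>[^/\s]+)/(?P<service>.+?)\s+is\s+(?P<state>\w+)\s+\*\*$"
)


def parse_subject(subject):
    """Return the subject's parts as a dict, or None if the layout differs."""
    match = SUBJECT_RE.match(subject.strip())
    return match.groupdict() if match else None


print(parse_subject("** PROBLEM Service Alert: localhost/SSH is CRITICAL **"))
# {'type': 'PROBLEM', 'host': 'localhost', 'service': 'SSH', 'state': 'CRITICAL'}
```

A subject that does not follow this layout returns None, which is a useful signal that the Nagios notification template has been customized.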

All Nagios messages about the same host and service are therefore related. JEMH Cloud refers to the related 'PROBLEM/RECOVERY' messages as a Phrase Set, made up of individual Phrases. Each Phrase has an associated Priority that is mapped to an appropriate JIRA Priority. When a message is received, the associated priority is set on the newly created issue, or updated on the related issue if one is found.

JEMH Cloud uses the subject of the email to find pre-existing unresolved issues, created within a specific time frame, that have a 'matching' summary. A matching summary is not only an exact match (which would work) but also any permutation (exchange of values) within the Phrase Set and Notification Types. For example, all of the following subjects can relate to the same issue, so long as the Phrase is listed and the Priority is known.

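** PROBLEM Service Alert: localhost/SSH is WARNING **

** PROBLEM Service Alert: localhost/SSH is CRITICAL **

** RECOVERY Service Alert: localhost/SSH is OK **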
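The permutation idea can be sketched in Python as follows. This is an illustration only: the `NOTIFICATION_TYPES` and `PHRASES` lists are assumed values (the real ones live in the JEMH Cloud Nagios configuration), and `candidate_summaries` is a hypothetical helper, not a JEMH Cloud function.

```python
import re
from itertools import product

# Assumed values for illustration; the real lists are configured in JEMH Cloud.
NOTIFICATION_TYPES = ["PROBLEM", "RECOVERY"]
PHRASES = ["WARNING", "CRITICAL", "OK"]


def candidate_summaries(subject):
    """Return every summary obtained by exchanging the type and the phrase."""
    type_alt = "|".join(map(re.escape, NOTIFICATION_TYPES))
    phrase_alt = "|".join(map(re.escape, PHRASES))
    match = re.match(
        rf"^(\*\* )({type_alt})( .* is )({phrase_alt})( \*\*)$", subject
    )
    if match is None:
        # Unknown type or phrase: only an exact match remains possible.
        return {subject}
    head, _, middle, _, tail = match.groups()
    return {
        f"{head}{t}{middle}{p}{tail}"
        for t, p in product(NOTIFICATION_TYPES, PHRASES)
    }


for summary in sorted(candidate_summaries("** RECOVERY Service Alert: localhost/SSH is OK **")):
    print(summary)
```

With two notification types and three phrases, one incoming subject yields six candidate summaries, and an open issue whose summary equals any of them counts as related.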

Payload Mapping

JEMH Cloud parses the body in the style of the Colon Suffix Field Processor ( KEY: VALUE ), with one variation: every line that starts with KEY: VALUE is extracted if a custom field with the same name exists.

Custom field configuration

From the example email above, the following Custom Fields could be defined (and will be for this example). NOTE: The entire content of the message will still be used as the Issue Description or Comment.

  • Notification Type

  • Service

  • Host

  • Address

  • State

  • Date/Time

These example fields are TEXT fields limited to 255 characters, which is enough here because no value exceeds that length; larger payloads require the unlimited TEXT type.
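The extraction rule described in this section can be sketched in Python as below. It assumes the custom field names listed above and splits each line at its first colon; `extract_fields` and `too_long` are hypothetical helpers, and the real processor may handle edge cases differently.

```python
# Custom field names from the list above; in JEMH Cloud these would be
# looked up in JIRA, here they are a hard-coded assumption.
CUSTOM_FIELDS = {"Notification Type", "Service", "Host", "Address", "State", "Date/Time"}
MAX_TEXT_LENGTH = 255  # limit of the short TEXT field type used in this example


def extract_fields(body, known_fields=CUSTOM_FIELDS):
    """Map each 'KEY: VALUE' line to a field, keeping only known field names."""
    fields = {}
    for line in body.splitlines():
        # Split at the first colon only, so values may contain colons (times).
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in known_fields:
            fields[key] = value.strip()
    return fields


def too_long(fields, limit=MAX_TEXT_LENGTH):
    """Return the names of fields whose value would not fit a short TEXT field."""
    return [name for name, value in fields.items() if len(value) > limit]


body = """***** Nagios *****
Notification Type: PROBLEM
Service: SSH
Date/Time: Sat Nov 26 16:53:45 NZDT 2011
Additional Info: Connection refused"""

fields = extract_fields(body)
print(fields["Date/Time"])  # Sat Nov 26 16:53:45 NZDT 2011
print(too_long(fields))     # []
```

Additional Info is skipped because no custom field of that name is defined in this example; the full body still ends up in the issue description or comment.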

JEMH Cloud Configuration

First, go to your project mapping, Field Processors, edit (cog), enable Field Processors on Create and Comment and enable Nagios Field Processor:

 

Looking at the Nagios section of the configuration, an example configuration has been pre-populated, which is good enough for the example messages here.

The default configuration will create issues (or comment on existing ones) when receiving WARNING and CRITICAL messages, and close them when receiving OK messages. The received email is associated with the latest To Do/In Progress issue created or modified in the last 30 days whose summary is one of the (Notification Type, Phrase) combinations. For example, an email with subject

** PROBLEM Service Alert: localhost/SSH is WARNING **

will be added as a comment to the open issue with summary

** PROBLEM Service Alert: localhost/SSH is CRITICAL **

and the message with subject

** RECOVERY Service Alert: localhost/SSH is OK **

will be added as a comment to the open issue with summary

** PROBLEM Service Alert: localhost/SSH is CRITICAL **
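The association rule (latest open issue, touched within 30 days, with a matching summary) can be sketched in Python like this. The `Issue` class, the status names and the single `updated` timestamp are simplifying assumptions for illustration; JEMH Cloud performs this lookup against JIRA itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Issue:
    """Hypothetical, simplified stand-in for a JIRA issue."""
    key: str
    summary: str
    status: str        # e.g. "To Do", "In Progress", "Done"
    updated: datetime  # assumed to be set on creation and on every change


OPEN_STATUSES = {"To Do", "In Progress"}
WINDOW = timedelta(days=30)


def find_related_issue(issues, candidate_summaries, now):
    """Return the most recently updated open issue with a matching summary."""
    eligible = [
        issue
        for issue in issues
        if issue.status in OPEN_STATUSES
        and issue.summary in candidate_summaries
        and now - issue.updated <= WINDOW
    ]
    return max(eligible, key=lambda issue: issue.updated, default=None)


critical = "** PROBLEM Service Alert: localhost/SSH is CRITICAL **"
warning = "** PROBLEM Service Alert: localhost/SSH is WARNING **"
issues = [Issue("OPS-1", critical, "To Do", datetime(2011, 11, 26, 16, 53))]

# A WARNING email arrives; its candidate summaries include the CRITICAL one.
related = find_related_issue(issues, {critical, warning}, datetime(2011, 11, 26, 17, 0))
print(related.key)  # OPS-1
```

When the function returns None, no related issue exists and a new one would be created.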

If you provide a Workflow action and resolution, OK messages will transition (fix) the open issue. 

Each PROBLEM (not OK) phrase can be mapped to a Priority. E.g. a CRITICAL message can change issue priority to Blocker.

 

Nagios configuration can be set in its own Project Mapping or Rule if you want to customize email matching rules (such as sender, recipient and subject regexes), the issue project, and issue fields such as Reporter, Issue Type and Components.

Creating a Test Case

Go to Test Cases, Create, copy and paste the Problem example above. Remember to update the test's To: address to match the profile's catch email address.
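If you prepare test emails outside the Test Cases editor, the To: header can be rewritten with Python's standard email package. This is an optional sketch: `retarget` is a hypothetical helper and `catch@example.com` is a placeholder for your profile's catch email address.

```python
from email import message_from_string


def retarget(raw_email, catch_address):
    """Return the raw email with its To: header replaced by catch_address."""
    msg = message_from_string(raw_email)
    del msg["To"]              # removes every To: header; no error if absent
    msg["To"] = catch_address  # adds a single new To: header
    return msg.as_string()


raw = (
    "Subject: ** PROBLEM Service Alert: localhost/SSH is CRITICAL **\n"
    "To: andy@localhost\n"
    "From: nagios@localhost\n"
    "\n"
    "Notification Type: PROBLEM\n"
    "State: CRITICAL\n"
)

# catch@example.com is a placeholder, not a real catch address.
print(retarget(raw, "catch@example.com"))
```

The new To: header is appended after the other headers; header order does not affect how the message is addressed.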

Run the test

Review the report, checking that the issue has been created and that the message's fields (Service, Host, etc.) have been populated. The issue should look like this:

If you re-run the test case, the same issue should be updated, with the message's body added as a comment.

Finally, create a new Recovery test case copying and pasting the example above. Remember to update the test's To: address to match the profile's catch email address.

You should now have two Nagios tests in Test Cases.

If you run the Recovery test case, the problem issue will be commented on, updated and fixed.

The issue has been resolved! It won't be associated to new Nagios emails. A new Nagios problem/recovery will create a new issue.

If JEMH Cloud receives an OK email without an associated problem issue, a new issue will be created and automatically transitioned (fixed), provided the Nagios workflow has been configured.
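The overall behaviour described on this page can be summarized as a small decision function. This is a sketch of the documented outcomes, not of JEMH Cloud's implementation: `plan_actions` is a hypothetical name, CRITICAL to Blocker is the example mapping from the text, and the WARNING to Major mapping is an assumption added for illustration.

```python
# CRITICAL -> Blocker is the example from the text; the WARNING entry is hypothetical.
PHRASE_PRIORITY = {"CRITICAL": "Blocker", "WARNING": "Major"}


def plan_actions(phrase, related_issue_key=None, workflow_configured=True):
    """Return the ordered steps for one notification as (action, detail) tuples."""
    if related_issue_key is None:
        actions = [("create", None)]  # no open related issue was found
    else:
        actions = [("comment", related_issue_key)]
    if phrase in PHRASE_PRIORITY:
        # Problem phrases set or update the priority of the issue.
        actions.append(("set_priority", PHRASE_PRIORITY[phrase]))
    elif phrase == "OK" and workflow_configured:
        # OK resolves the issue only when a workflow action is configured.
        actions.append(("resolve", related_issue_key))
    return actions


print(plan_actions("CRITICAL"))     # [('create', None), ('set_priority', 'Blocker')]
print(plan_actions("OK", "OPS-1"))  # [('comment', 'OPS-1'), ('resolve', 'OPS-1')]
print(plan_actions("OK"))           # [('create', None), ('resolve', None)]
```

The last call corresponds to an OK email arriving with no open problem issue: an issue is created and then resolved straight away.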

Troubleshooting

Phrases may need updating

It's possible that a Nagios version or deployment configuration change alters the Nagios Phrases. If new issues are being created unexpectedly, the JEMH Cloud Nagios Phrases likely need updating to include the new Phrases that caused the additional issue creation.
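One way to spot a missing Phrase is to check the last keyword of the subjects that produced unexpected issues. The Python sketch below assumes the subject layout from the examples on this page and a `KNOWN_PHRASES` set that mirrors your configured Phrases; both are assumptions, and `unknown_phrase` is a hypothetical helper.

```python
import re

# Assumed to mirror the Phrases configured in JEMH Cloud.
KNOWN_PHRASES = {"WARNING", "CRITICAL", "OK"}


def unknown_phrase(subject):
    """Return the trailing phrase if it is not configured, otherwise None.

    None is also returned when the subject does not end in 'is <PHRASE> **'.
    """
    match = re.search(r"\bis\s+(\w+)\s+\*\*\s*$", subject)
    if match is None:
        return None
    phrase = match.group(1)
    return None if phrase in KNOWN_PHRASES else phrase


print(unknown_phrase("** PROBLEM Service Alert: localhost/SSH is UNKNOWN **"))  # UNKNOWN
print(unknown_phrase("** RECOVERY Service Alert: localhost/SSH is OK **"))      # None
```

Any phrase reported this way is a candidate for adding to the JEMH Cloud Nagios Phrases.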