Use Nagios Field Processor
Scenario
You want to integrate Nagios with JIRA, but not only that:
Duplicates should stack up on the same issue as comments.
If a Nagios notification indicates the problem has been resolved, the issue should be resolved automatically.
The Nagios email body happens to use the KEY: VALUE format, so mapping these values to custom fields would be a bonus.
Example emails
Before setting up JIRA or JEMH Cloud, review the format of the Nagios email; see the CRITICAL and OK messages below.
Example CRITICAL notification
Subject: ** PROBLEM Service Alert: localhost/SSH is CRITICAL **
To: andy@localhost
X-mailer: mail (GNU Mailutils 2.2)
Date: Sat, 26 Nov 2011 16:53:45 +1300 (NZDT)
From: nagios@localhost

***** Nagios *****
Notification Type: PROBLEM
Service: SSH
Host: localhost
Address: 127.0.0.1
State: CRITICAL
Date/Time: Sat Nov 26 16:53:45 NZDT 2011
Additional Info:
Connection refused

Example OK service restoration notification
Subject: ** RECOVERY Service Alert: localhost/SSH is OK **
To: andy@localhost
X-mailer: mail (GNU Mailutils 2.2)
Date: Sat, 26 Nov 2011 17:03:30 +1300 (NZDT)
From: nagios@localhost

***** Nagios *****
Notification Type: RECOVERY
Service: SSH
Host: localhost
Address: 127.0.0.1
State: OK
Date/Time: Sat Nov 26 17:03:30 NZDT 2011
Additional Info:
SSH OK - OpenSSH_5.8p1 Debian-7ubuntu1 (protocol 2.0)
The Nagios notification is structured: the subject (which becomes the issue summary) has its message type as the first keyword, with a priority at the end. In the CRITICAL example, the message type is PROBLEM and the priority is CRITICAL. The recovery example has a message type of RECOVERY, with status OK.
All Nagios messages are therefore related. JEMH Cloud refers to the related 'PROBLEM/RECOVERY' messages as Phrase Sets (sets of Phrases). Each Phrase has an associated Priority that is mapped to an appropriate JIRA Priority. When a message is received, the associated priority is set on a newly created issue, or updated on a related issue if one is found.
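To make the subject structure concrete, here is a minimal Python sketch that pulls the message type and the trailing status out of a subject line. The regular expression and the name `parse_subject` are illustrative assumptions modelled on the two example subjects in this article; this is not JEMH Cloud's own parser, and other Nagios setups may format the subject differently.

```python
import re

# Pattern modelled only on the example subjects in this article (an assumption).
SUBJECT_RE = re.compile(
    r"^\*\*\s+(?P<notification_type>\w+)\s+(?P<alert>.+?)"
    r"\s+is\s+(?P<phrase>\w+)\s+\*\*$"
)


def parse_subject(subject):
    """Return (notification_type, alert, phrase), or None if the subject does not fit."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    return (
        match.group("notification_type"),
        match.group("alert"),
        match.group("phrase"),
    )
```

For the CRITICAL example, this would split the subject into the message type PROBLEM, the alert text "Service Alert: localhost/SSH" and the status CRITICAL.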
Related issues/matching
JEMH Cloud uses the subject of the email to match against pre-existing unresolved issues, created within a specific time-frame, that have a 'matching' summary. A matching summary is not just a direct match (which would work) but also any permutation (exchange of values) within the Phrase Set and Notification Types. For example, subjects such as the following can all relate to the same issue, so long as the Phrase is listed and the Priority is known.
** RECOVERY Service Alert: localhost/SSH is OK **
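One way to picture the permutation matching is to collapse every subject to a key that ignores the Notification Type and the Phrase, so that all permutations share the same key. The sketch below is only an illustration of that idea: `NOTIFICATION_TYPES`, `PHRASES` and `match_key` are hypothetical names, and JEMH Cloud's actual matching logic may work differently.

```python
import re

# Hypothetical configuration values, taken from the examples in this article.
NOTIFICATION_TYPES = ("PROBLEM", "RECOVERY")
PHRASES = ("WARNING", "CRITICAL", "OK")


def match_key(subject):
    """Collapse a subject to the part shared by all its permutations.

    Returns None when the Notification Type or Phrase is not listed.
    """
    types = "|".join(re.escape(t) for t in NOTIFICATION_TYPES)
    phrases = "|".join(re.escape(p) for p in PHRASES)
    pattern = rf"^\*\*\s+(?:{types})\s+(.+?)\s+is\s+(?:{phrases})\s+\*\*$"
    match = re.match(pattern, subject.strip())
    return match.group(1) if match else None
```

Two subjects would then be treated as related when their keys are equal. A subject whose Phrase is not listed yields no key, which mirrors the requirement that the Phrase must be listed.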
Payload Mapping
JEMH Cloud reuses the body parsing of the Colon Suffix Field Processor (KEY: VALUE), with one variation: every line of the form KEY: VALUE is extracted if a custom field of the same name exists.
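The extraction described above can be sketched in a few lines of Python. This is an illustration under assumptions, not JEMH Cloud's code: `extract_fields` is a hypothetical helper, the field names are those used in this article's example, and exact, case-sensitive name matching is assumed.

```python
# Field names from this article's example, treated here as the custom fields that exist.
CUSTOM_FIELDS = {"Notification Type", "Service", "Host", "Address", "State", "Date/Time"}


def extract_fields(body, known_fields=CUSTOM_FIELDS):
    """Return {field: value} for each 'KEY: VALUE' line whose KEY is a known field."""
    fields = {}
    for line in body.splitlines():
        # Split on the first colon only, so values such as times keep their colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in known_fields:
            fields[key] = value.strip()
    return fields
```

Lines without a colon, and keys with no matching custom field (such as Additional Info in the examples), are simply skipped.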
Custom field configuration
From the example email above, the following Custom Fields could be defined (and will be for this example). NOTE: The entire content of the message will still be used as the Issue Description or Comment.
Notification Type
Service
Host
Address
State
Date/Time
These example fields are TEXT fields limited to 255 characters, since none of the values here exceeds 255 characters; larger payloads require the unlimited TEXT type.
JEMH Cloud Configuration
First, go to your project mapping, open Field Processors, click edit (cog), enable Field Processors on Create and Comment, and enable the Nagios Field Processor.
Looking at the Nagios section of the configuration, an example configuration has been pre-populated, which is good enough for the example messages here.
The default configuration will create issues (or comment on them) when receiving WARNING and CRITICAL messages, and close them when receiving OK messages. The received email is associated with the most recent To Do/In Progress issue created or modified in the last 30 days whose summary is one of the (Notification Type, Phrase) combinations. For example, an email with subject
** PROBLEM Service Alert: localhost/SSH is WARNING **
will be added as a comment to the open issue with summary
** PROBLEM Service Alert: localhost/SSH is CRITICAL **
and the message with subject
** RECOVERY Service Alert: localhost/SSH is OK **
will be added as a comment to the open issue with summary
** PROBLEM Service Alert: localhost/SSH is CRITICAL **
If you provide a Workflow action and resolution, OK messages will transition (fix) the open issue.
Each PROBLEM (not OK) phrase can be mapped to a Priority. E.g. a CRITICAL message can change issue priority to Blocker.
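The behaviour described in this section can be summarised as a small decision function. The sketch below is a hedged illustration, not the product's implementation: `plan_actions` and the action strings are hypothetical, the CRITICAL to Blocker mapping comes from the example above, and the WARNING to Major mapping is an assumed value.

```python
# Illustrative values; real ones come from your Nagios Field Processor configuration.
PHRASE_PRIORITY = {"WARNING": "Major", "CRITICAL": "Blocker"}  # "Major" is an assumption
RESOLVING_PHRASES = {"OK"}


def plan_actions(phrase, has_open_issue, workflow_configured=True):
    """Return the ordered list of actions for one incoming notification."""
    # A related open issue receives a comment; otherwise a new issue is created.
    actions = ["comment" if has_open_issue else "create"]
    # Only PROBLEM (not OK) phrases carry a priority mapping.
    if phrase in PHRASE_PRIORITY:
        actions.append("set priority " + PHRASE_PRIORITY[phrase])
    # OK messages transition the issue only if a Workflow action and resolution are set.
    if phrase in RESOLVING_PHRASES and workflow_configured:
        actions.append("resolve")
    return actions
```

Under these assumptions, a CRITICAL message with no related issue gives a create followed by a priority change, while an OK message for an open issue gives a comment followed by a resolve. An OK message with no related issue still creates an issue and then resolves it, which matches the behaviour described later in this article.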
The Nagios configuration can be set in its own Project Mapping or Rule if you want to customize email matching rules (such as sender, recipient and subject regexes), the issue project, and issue fields such as Reporter, Issue Type, Components, etc.
Creating a Test Case
Go to Test Cases, Create, copy and paste the Problem example above. Remember to update the test's To: address to match the profile's catch email address.
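If you prefer to prepare the test email text outside the UI before pasting it, the To: address can be swapped with Python's standard `email` package. This is an optional sketch: `with_recipient` is a hypothetical helper and `catch@example.com` is a placeholder for your profile's catch email address.

```python
from email import message_from_string

# The Problem example from this article; the blank line separates headers from body.
PROBLEM_EMAIL = """\
Subject: ** PROBLEM Service Alert: localhost/SSH is CRITICAL **
To: andy@localhost
X-mailer: mail (GNU Mailutils 2.2)
Date: Sat, 26 Nov 2011 16:53:45 +1300 (NZDT)
From: nagios@localhost

***** Nagios *****
Notification Type: PROBLEM
Service: SSH
Host: localhost
Address: 127.0.0.1
State: CRITICAL
Date/Time: Sat Nov 26 16:53:45 NZDT 2011
Additional Info:
Connection refused
"""


def with_recipient(raw_email, catch_address):
    """Return the raw email text with its To: header replaced by catch_address."""
    msg = message_from_string(raw_email)
    msg.replace_header("To", catch_address)  # raises KeyError if there is no To: header
    return msg.as_string()


if __name__ == "__main__":
    # Placeholder address; use your profile's catch email address instead.
    print(with_recipient(PROBLEM_EMAIL, "catch@example.com"))
```

The printed text can then be pasted into the new test case.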
Run the test
Review the report, checking that the issue has been created and that the message's fields (Service, Host, etc.) have been populated.
If you re-run the test case, the same issue should be updated, with the message's body added as a comment.
Finally, create a new Recovery test case copying and pasting the example above. Remember to update the test's To: address to match the profile's catch email address.
You should have two Nagios tests in Test Cases.
If you run the Recovery test case, the problem issue will be commented on, updated and fixed.
The issue has been resolved! It won't be associated with new Nagios emails. A new Nagios problem/recovery will create a new issue.
If JEMH Cloud receives an OK email without an associated issue (without the problem issue), a new issue will be created and then auto-transitioned/fixed (if the Nagios workflow has been configured).
Troubleshooting
Phrases may need updating
It's possible that a Nagios version or deployment configuration changes the Nagios Phrases. If unexpected new issues are being created, the JEMH Cloud Nagios Phrases likely need updating to include the new Phrases that caused the additional issue creation.
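A quick way to spot missing Phrases is to scan the summaries of the unexpectedly created issues for trailing status words that are not configured. The sketch below assumes the subject layout shown in this article; `unknown_phrases` is a hypothetical helper, and `KNOWN_PHRASES` should mirror whatever is listed in your own configuration.

```python
import re

# Phrases named in this article; replace with the ones in your configuration.
KNOWN_PHRASES = {"WARNING", "CRITICAL", "OK"}


def unknown_phrases(subjects, known=KNOWN_PHRASES):
    """Return the set of trailing 'is <PHRASE> **' words that are not configured."""
    found = set()
    for subject in subjects:
        match = re.search(r"\bis\s+(\w+)\s+\*\*\s*$", subject)
        if match and match.group(1) not in known:
            found.add(match.group(1))
    return found
```

Any word this returns (for example UNKNOWN, which Nagios also uses as a service state) is a candidate to add to the Nagios Phrases.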