Use Project Mapping Cleanup and Body Delimiters

Since

Available since JEMH 2.7.3

- 1.1 Since
2 Introduction
- 2.1 History
- 2.2 What has changed
- 2.3 Profile Imports
- 2.4 Project Mappings
- 2.5 Regex 101
- 2.6 Cleanup Expressions
- 2.7 Body Delimiters
- 2.8 HTML To Wiki Markup
  - 2.8.1 The easier way
3 Dynamic Evaluation walk-through

Introduction

History

JEMH has always had Cleanup Expressions (used to remove content from message subjects and body) and Body Delimiter Expressions (used to chop signature text as well as reply-to content). It worked, but it had two main usability problems:

Fields involved were comma separated regular expressions, which was complicated to manage and had side effects on usability (like matching comma literal characters)
Expressions were global within the Profile, so applied to all issues regardless
Understanding what the impact of a change would be required (a) saving the profile (b) running a test case to (c) view created issue and see the result

What has changed

In JEMH 2.7.3 we address this area by migrating the CSV expressions from the Profile level down into Project Mappings, which is how JEMH is evolving (closer to the sister product JEMHC view).

What	Where is it now	Notes

What	Where is it now	Notes
Profile Subject Cleanups	Re-named Global Subject Cleanups	Values are entered as regular expressions, separated by commas (CSV).
Profile Body Cleanups	Re-named Global Body Cleanups	Values are entered as regular expressions, separated by commas (CSV).
Body Cleanup Order	Setting removed	Global cleanups at the profile level are applied before directives processing. Project Mapping cleanups are applied after directive processing.
Body Delimiters	Migrated into Project Mappings > X > Pre-Processing > Cleanup Expressions Project Mappings > X > Pre-Processing > Body Delimiters	The pre-migrated body delimiter expressions could be prefixed with a \n expression for newline, which limited the number of matches to the number of lines in the email, this reduced regexp matching for large emails from billions to a few K (impact of this was processing time). As part of the migration, each historic CSV regexp value was evaluated, and if it started with a \n, had it removed. The reason for this is that ALL Body Delimiters are now prefixed the \n at evaluation time, ensuring that the evaluations are as safe and optimal as possible as well as minimizing the repetitive nature during entry.

Profile Imports

Old Profiles can be imported into 2.7.3, which will also migrate Body Delimiters from the Profile level to the Default Project Mapping.

Project Mappings

Expressions entered within the Default Project Mapping will be inherited and used by the other Project Mappings within that Profile

JEMH configuration is being migrated to project-scoped Project Mappings that permit customization within a Profile. There is a Default Project Mapping, which contains no rules, and is used when no other Project Mapping rule matched. The Default Project Mapping can have Cleanup and Delimiter expressions defined that are used by any other Project Mapping as well as having their own.

Type	Subject Cleanup	Body Cleanup	Body Delimiter	Aggregates

Type	Subject Cleanup	Body Cleanup	Body Delimiter	Aggregates
Profile				n/a
Default Project Mapping				From Profile
Mapped Project Mapping X				From Default
Mapped Project Mapping Y				From Default

Regex 101

Remember that symbols mean something, the dot character ( . ) means match on any character. The star character ( * ) means match on any number of the previous characters. Ranges can be used, eg \[0-9]* means any length of numbers, \[A-Z]+ means 1 or more ALPHA chars, \[0-9A-Za-z]+ means 1 or more ALPHANUM chars. In subjects, there is one line, in body, . matches ALL (full content).

Cleanup Expressions

Maybe its a word, or an expression, that you want removed prior to issue creation, whether in the subject or the body.

Body Delimiters

Body Expressions are required to match the line of text that prefixes 'replied to' content. This content varies depending on the email client used and the locale of the client user. Effectively it is just text so we must match it as closely as possible. As mentioned above ALL delimiter expressions are evaluated with a \n prefixing them to minimize the number of matches. This also works at the start of the email content as \n is dynamically injected for that case, and removed after. Its better to range match on expected text than take the 'lazy' .* route, bad regexps that overuse .* can cause A LOT of possible match evaluations by the Regexp Engine. If your Jira stops processing mail, one possible causes is an overuse of .*, the Regexp Engine is busy, it could take a few hours, days or longer to complete owing to the combination of Regexp and email size.

HTML To Wiki Markup

HTML Email can be covered to wiki markup via JEMH. However, wiki markup uses additional characters to define styling. Such as text that is formatted in a bold HTML tag, when covered to wiki markup the bold HTML tag is removed and text will instead have * placed around it (wiki markup bold formatting).

As seen in this example:

For the is email example the From, Sent, To, CC and Subject line are all formatted as Bold text.

Because wiki markup uses these characters for styling your Regexp will need to account for these characters, both when they are, and when they are not present. The following link will give more details on the extra characters added by wiki markup:
https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all

Some of theses characters are special in Regexp such as the * and [ ]. So in order for Regexp to specifically match these special characters as text, you need to escape them by using a back slash \ before the special character. See following link for a tutorial and a example:

https://regexone.com/lesson/wildcards_dot

The easier way

With the above in mind, you’re Regexp maybe longer and more complicated to write. The alternative approach you could take is turning off which HTML elements are handled via JEMH. If you go to:

JEMH > Profile > Email > HTML

The HTML elements you have switched On will be formatted to markup, which means JEMH will add extra characters to handle these tags. If you no longer want that specific tag to be formatted you can simply switch it to None. This will result in issues no longer having styling for those elements. For example: if the bold and link tags were both switched to None, issues would no longer display bold text and hyperlinks. However, this does mean that the Regexp you have to write will be smaller / easier to write.

Dynamic Evaluation walk-through

Based on the following Test Case (you can paste it as a new JEMH Test Case), we will show how cleanups and delimiters can be used.

You may need to change the email to: address to match you Profile > Email > catchEmail expected address, and possible update the From: address to match a Jira user!

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, ABC update DEF as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

some text GHI

hello'world

On 5 July 2018 at 06:48, Some bodys name <some.body@some.com> wrote:

blah blah blah

Create the test case above, and select the Profile to use. This association means that the Test Case can be picked for self-text in the next section:

Evaluate the test case with no extra setup

from the Default Project Mapping within the Profile picked in the last step, pick the just saved Test case from the Project Mapping > Pre-Processing > Dynamic Evaluation section, set for Comment processing, you'll see as soon as a Test Case is picked, its automatically evaluated with Output text shown. Its possible from that screen to also quickly see for both Subject and Body, the Input text as well as a summary of what Regexp Rules were executed:

Setting the Profile Default Project Mapping Cleanups

Expressions entered within the Default Project Mapping will be inherited and used by the other Project Mappings within that Profile

With the Test Case set, access the Pre-Processing tab shown below, and click Add in the Subject section:

A new entry will be dynamically injected, but will not actually be saved - in fact no changes at all will be made until the page is submitted. Set the value to ABC, matching a part of the subject of the Test Case:

Now, as soon as you move out of that text field (eg use TAB) the changes are dynamically evaluated and the result shown, see how ABC is now missing.

We repeat this with a Body Cleanup

And after tabbing out we can see the result:

Next up we add a Body Delimiter Expression, the following will be used as an example, we don't have to match the entire line, just the start (of course, the less you match, the more its 'possible' to match actual body content that should not be clipped):

NOTE: The default for Body Delimiters is to apply on Create as well as Comment:

On [0-9]+ (January|February|March|April|May|June|July|August|September|October|November|December) 20[0-9]{2} at [0-9]{2}:[0-9]{2}

After tabbing out we can see the content was clipped.

For final proof, save the form to persist changes:

Navigate to the TestCase view, and Run the Test Case against the Profile:

Accessing the related NEW issue, you'll see that the issue summary has the ABC value "Cleaned Up", the Body value hello'world value "Cleaned Up" and the body 'reply to' content clipped:

How to Build a Body Delimiter expression

Body delimiters use Regular expressions to match and remove Body content from the email. This allows you to remove the content that you deem irrelevant and leave the relevant content in the email body which will be shown in the Issue description/comment. The expressions can be as simple as matching one word from the start of a line or they can be more complicated for example matching multiple words and numbers. Body Delimiters are set within Profile > Project Mapping > Pre-Processing.

Below is an example of a simple regex that will match the word “Testing”. To test the expressions you can use the Dynamic Evaluation feature, this will run the expressions against a Test Case and will show where the expressions matched.

Example Expression

Test Case Used

MIME-Version: 1.0
Date: Mon, 22 Aug 2022 11:40:45 +0100
Message-ID: <CALnh4wQbAvR2uUJYsh-BcNR0ARr1eXtk6oDhKYNjteXXB44F6g@mail.gmail.com>
Subject: testing body delimiters
From: reporter@localhost
To: mailbox@localhost
Content-Type: multipart/alternative; boundary="0000000000008463d505e6d214ce"

--0000000000008463d505e6d214ce
Content-Type: text/plain; charset="UTF-8"

Hello world

Testing *Bold text*

Content that will be removed.

--0000000000008463d505e6d214ce
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">Hello world<div><br></div><div>Testing<b>Bold text</b></div><div><br></div><div>Content that will be removed.</div></div>

--0000000000008463d505e6d214ce--

Testing the expression

From testing the expression we can see that most of the content has been removed as the Regex found a match for “Testing” (which is on the second line) and removed the content after that point.

Screenshot of Issue created with and without the body delimiter

When we run the Test Case with and without the Body Delimiter we can see the differences more clearly.

With expression	Without expression

With expression	Without expression

Using Body Delimiter with styled text

If your Profile is converting HTML to Wiki Markup for Jira to render the emails then the expression will become more complicated. Wiki Markup adds extra characters to the email body so that when Jira renders the email content it will display the text using the style that is defined by the Wiki Markup. This makes the Regex more complicated to make as the expression will need to also match these characters if the word/phrase uses Wiki Markup to change the appearance. To check if you are using Wiki Markup go to Profile > Email > HTML and in here there are some settings that enable you to select which styles you would like to render. For example Bold and Italic.

Difference between rendered and non rendered text content

The rendered content will make the text use the formatting that is defined and will remove the Wiki Markup tags from the text. The non rendered content will have the Wiki Markup tags within the text. Below is an example showing the difference between the rendered and non rendered content.

Rendered content	non rendered text content

Rendered content	non rendered text content

Using Special characters in a regex expression

Regex contains special characters that have a special meaning with the Regex. These Special characters tells the expression to look for values that match the condition. For example “*.” means to look for a word that has characters before the word, indicating that the word is mid sentence.

Commonly used special characters

Special character	Function

Special character	Function
.	This will match any single character other than a newline
*	This matches zero or more consecutive characters. For example (/ba*) will match “ba” and “baaa” as the expression will match more than one “a” characters.
\	Escape following special character. This will allow special characters to be matched within the email. For example if you are looking for “[abc]” the regex will have to be \[abc\] as this will mean that the regex will also look for “[ ]” as a character.
( )	Used to define a capture group within a regex.
[ ]	This is used to define which Character sets you want to capture. For example ([a-z]) will capture all characters that are a lower case alphabetical character.

Example of using Special characters

When using Wiki Markup it will insert extra characters into the body content as these characters are used to render the email content. Sometimes these characters used in Wiki Markup are the same as special characters for Regex. To overcome this you would use “\” followed by the special character, this will stop that character from being used as a special character.

Example Regex

This Regex will check if there is any characters before Bold text* and escapes “*“ so that it can found within the content and then checks if *Bold text* is used within the content. If this matches it will then remove the content that follows.

Running a Test Case against the body delimiter

After running the Test Case we can see that the content is removed from the middle of the second line as this is where the Regex found a match for *Bold text*.

Without Regex	With Regex

Without Regex	With Regex

Removing ‘Null’ characters

Null characters can cause email processing to fail if the database used does not support Null characters. Jira will attempt to save the character and return an error, such as:

We can't create this issue for you right now, it could be due to unsupported content you've entered into one or more of the issue fields.

ERROR: invalid byte sequence for encoding "UTF8": 0x00

The following Regexp will identify a ‘null’ character for removal in a Subject or Body Cleanup, allowing emails to be processed.

[\x{0000}]+

Enterprise Mail Handler for Jira Data Center (JEMH)