Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

JEMH has always had Cleanup Expressions (used to remove content from message subjects and body) and Body Delimiter Expressions (used to chop signature text as well as reply-to content).  It worked, but it had two main usability problems:

  1. Fields involved were comma separated regular expressions, which was complicated to manage and had side effects on usability (like matching comma literal characters)

  2. Expressions were global within the Profile, so applied to all issues regardless

  3. Understanding what the impact of a change would be required (a) saving the profile (b) running a test case to (c) view created issue and see the result

What has changed

In JEMH 2.7.3 we address this area by migrating the CSV expressions from the Profile level down into Project Mappings, which is how JEMH is evolving (closer to the sister product JEMHC view).

What

Where is it now

Notes

Profile Subject Cleanups

Re-named Global Subject Cleanups

Values are entered as regular expressions, separated by commas (CSV).

Profile Body Cleanups

Re-named Global Body Cleanups

Values are entered as regular expressions, separated by commas (CSV).

Body Cleanup Order

Setting removed

Global cleanups at the profile level are applied before directives processing. Project Mapping cleanups are applied after directive processing.

Body Delimiters

Migrated into

  • Project Mappings > X > Pre-Processing > Cleanup Expressions

  • Project Mappings > X > Pre-Processing > Body Delimiters

The pre-migrated body delimiter expressions could be prefixed with a \n expression for newline, which limited the number of matches to the number of lines in the email, this reduced regexp matching for large emails from billions to a few K (impact of this was processing time).

As part of the migration, each historic CSV regexp value was evaluated, and if it started with a \n, had it removed.  The reason for this is that ALL Body Delimiters are now prefixed the \n at evaluation time, ensuring that the evaluations are as safe and optimal as possible as well as minimizing the repetitive nature during entry.

Profile Imports

Old Profiles can be imported into 2.7.3, which will also migrate Body Delimiters from the Profile level to the Default Project Mapping.

Project Mappings

Note

Expressions entered within the Default Project Mapping will be inherited and used by the other Project Mappings within that Profile

JEMH configuration is being migrated to project-scoped Project Mappings that permit customization within a Profile.  There is a Default Project Mapping, which contains no rules, and is used when no other Project Mapping rule matched.  The Default Project Mapping can have Cleanup and Delimiter expressions defined that are used by any other Project Mapping as well as having their own.

Type

Subject Cleanup

Body Cleanup

Body Delimiter

Aggregates

Profile

(tick)

(tick)

(error)

n/a

Default Project Mapping

(tick)

(tick)

(tick)

From Profile

Mapped Project Mapping X

(tick)

(tick)

(tick)

From Default

Mapped Project Mapping Y

(tick)

(tick)

(tick)

From Default

Regex 101

Remember that symbols mean something, the dot character ( . ) means match on any character.  The star character ( * ) means match on any number of the previous characters.  Ranges can be used, eg \[0-9]* means any length of numbers, \[A-Z]+ means 1 or more ALPHA chars, \[0-9A-Za-z]+ means 1 or more ALPHANUM chars.  In subjects, there is one line, in body, . matches ALL (full content).

...

Body Expressions are required to match the line of text that prefixes 'replied to' content.  This content varies depending on the email client used and the locale of the client user.  Effectively it is just text so we must match it as closely as possible.  As mentioned above ALL delimiter expressions are evaluated with a \n prefixing them to minimize the number of matches.  This also works at the start of the email content as \n is dynamically injected for that case, and removed after.  Its better to range match on expected text than take the 'lazy' .* route, bad regexps that overuse .* can cause A LOT of possible match evaluations by the Regexp Engine.  If your Jira stops processing mail, one possible causes is an overuse of .*, the Regexp Engine is busy, it could take a few hours, days or longer to complete owing to the combination of Regexp and email size.

HTML To Wiki Markup

HTML Email can be covered to wiki markup via JEMH. However, wiki markup uses additional characters to define styling. Such as text that is formatted in a bold HTML tag,when covered to wiki markup the bold HTML tag is removed and text will instead have * placed around it (wiki markup bold formatting).

As seen in this example:

...

Because wiki markup uses these characters for styling your Regexp will need to account for these characters, both when they are, and when they are not present. The following link will give more details on the extra characters added by wiki markup:
https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all

Some of theses characters are special in Regexp such as the * and [ ]. So in order for Regexp to specifically match these special characters as text, you need to escape them by using a back slash \ before the special character. See following link for a tutorial and a example:

https://regexone.com/lesson/wildcards_dot

The easier way

With the above in mind, you’re Regexp maybe longer and more complicated to write. The alternative approach you could take is turning off which HTML elements are handled via JEMH. If you go to:

  • JEMH > Profile > Email > HTML 

...

The HTML elements you have switched On will be formatted to markup, which means JEMH will add extra characters to handle these tags. If you no longer want that specific tag to be formatted you can simply switch it to None. This will result in issues no longer having styling for those elements. For example: if the bold and link tags were both switched to None, issues would no longer display bold text and hyperlinks. However, this does mean that the Regexp you have to write will be smaller / easier to write.

Dynamic Evaluation walk-through

...

Setting the Profile Default Project Mapping Cleanups

Note

Expressions entered within the Default Project Mapping will be inherited and used by the other Project Mappings within that Profile

With the Test Case set, access the Pre-Processing tab shown below, and click Add in the Subject section:

...

Below is an example of a simple regex that will match the word“Testing”word “Testing”. To test the expressions you can use the Dynamic Evaluation feature, this will run the expressions against a Test Case and will show where the expressions matched.

...

From testing the expression we can see that most of the content has been removed as the regex Regex found a match for “Testing” (which is on the second line) and removed the content after that point.

...

When we run the Test Case with and without the Body Delimiter we can see the differences more clearly.

With expression

Without expression

Image Modified

Image Modified

Using Body Delimiter with styled text

If your Profile is converting HTML to Wiki Markup for Jira to render the emails then the expression will become more complicated. Wiki markup Markup adds extra characters to the email body so that when Jira renders the email content it will display the text using the style that is defined by the Wiki Markup. This makes the regex Regex more complicated to make as the expression will need to also match these characters if the word/phrase uses Wiki markup Markup to change the appearance. To check if you are using Wiki markup Markup go to Profile > Email > HTML and inhere there are some settings that enable you to select which styles you would like to render. For example Bold and Italic.

...

The rendered content will make the text use the formatting that is defined and will remove the wiki markup Wiki Markup tags from the text. The non rendered content will have the Wiki markup Markup tags within the text. Below is an example showing the difference between the rendered and non rendered content.

Rendered content

non rendered text content

Image Modified

Image Modified

Using Special characters in a regex expression

Regex contains special characters that have a special meaning with the regexRegex. These Special characters tells the expression to look for values that match the condition. For example “*.” means to look for a word that has characters before the word, indicating that the word is mid sentence.

Commonly used special characters

Special character

Function

.

This will match any single character other than a newline

*

This matches zero or more consecutive characters. For example (/ba*) will match “ba” and “baaa” as the expression will match more than one “a” characters.

\

Escape following special character. This will allow special characters to be matched within the email. For example if you are looking for “[abc]” the regex will have to be \[abc\] as this will mean that the regex will also look for “[ ]” as a character.

( )

Used to define a capture group within a regex.

[ ]

This is used to define which Character sets you want to capture. For example ([a-z]) will capture all characters that are a lower case alphabetical character.

Example of using Special characters

When using Wiki markup Markup it will insert extra characters into the body content as these characters are used to render the email content. Sometimes these characters used in wiki markup Wiki Markup are the same as special characters for Regex. To overcome this you would use “\” followed by the special character, this will stop that character from being used as a special character.

Example Regex

This regex Regex will check if there is any characters before Bold text* and escapes “*“ so that it can found within the content and then checks if *Bold text* is used within the content. If this matches it will then remove the content that follows.

...

After running the Test Case we can see that the content is removed from the middle of the second line as this is where the Regex found a match for *Bold text*.

Without Regex

With Regex

Image Modified

Image Modified


Removing ‘Null’ characters

...