Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

JEMH has always had Cleanup Expressions (used to remove content from message subjects and body) and Body Delimiter Expressions (used to chop signature text as well as reply-to content).  It worked, but it had two main usability problems:

  1. Fields involved were comma separated regular expressions, which was complicated to manage and had side effects on usability (like matching comma literal characters)

  2. Expressions were global within the Profile, so applied to all issues regardless

  3. Understanding what the impact of a change would be required (a) saving the profile (b) running a test case to (c) view created issue and see the result

What has changed

In JEMH 2.7.3 we address this area by migrating the CSV expressions from the Profile level down into Project Mappings, which is how JEMH is evolving (closer to the sister product JEMHC view).

What

Where is it now

Notes

Profile Subject Cleanups

Re-named Global Subject Cleanups

Values are entered as regular expressions, separated by commas (CSV).

Profile Body Cleanups

Re-named Global Body Cleanups

Values are entered as regular expressions, separated by commas (CSV).

Body Cleanup Order

Setting removed

Global cleanups at the profile level are applied before directives processing. Project Mapping cleanups are applied after directive processing.

Body Delimiters

Migrated into

  • Project Mappings > X > Pre-Processing > Cleanup Expressions

  • Project Mappings > X > Pre-Processing > Body Delimiters

The pre-migrated body delimiter expressions could be prefixed with a \n expression for newline, which limited the number of matches to the number of lines in the email, this reduced regexp matching for large emails from billions to a few K (impact of this was processing time).

As part of the migration, each historic CSV regexp value was evaluated, and if it started with a \n, had it removed.  The reason for this is that ALL Body Delimiters are now prefixed the \n at evaluation time, ensuring that the evaluations are as safe and optimal as possible as well as minimizing the repetitive nature during entry.

Profile Imports

Old Profiles can be imported into 2.7.3, which will also migrate Body Delimiters from the Profile level to the Default Project Mapping.

Project Mappings

Note

Expressions entered within the Default Project Mapping will be inherited and used by the other Project Mappings within that Profile

JEMH configuration is being migrated to project-scoped Project Mappings that permit customization within a Profile.  There is a Default Project Mapping, which contains no rules, and is used when no other Project Mapping rule matched.  The Default Project Mapping can have Cleanup and Delimiter expressions defined that are used by any other Project Mapping as well as having their own.

Type

Subject Cleanup

Body Cleanup

Body Delimiter

Aggregates

Profile

(tick)

(tick)

(error)

n/a

Default Project Mapping

(tick)

(tick)

(tick)

From Profile

Mapped Project Mapping X

(tick)

(tick)

(tick)

From Default

Mapped Project Mapping Y

(tick)

(tick)

(tick)

From Default

Regex 101

Remember that symbols mean something, the dot character ( . ) means match on any character.  The star character ( * ) means match on any number of the previous characters.  Ranges can be used, eg \[0-9]* means any length of numbers, \[A-Z]+ means 1 or more ALPHA chars, \[0-9A-Za-z]+ means 1 or more ALPHANUM chars.  In subjects, there is one line, in body, . matches ALL (full content).

...

Body Expressions are required to match the line of text that prefixes 'replied to' content.  This content varies depending on the email client used and the locale of the client user.  Effectively it is just text so we must match it as closely as possible.  As mentioned above ALL delimiter expressions are evaluated with a \n prefixing them to minimize the number of matches.  This also works at the start of the email content as \n is dynamically injected for that case, and removed after.  Its better to range match on expected text than take the 'lazy' .* route, bad regexps that overuse .* can cause A LOT of possible match evaluations by the Regexp Engine.  If your Jira stops processing mail, one possible causes is an overuse of .*, the Regexp Engine is busy, it could take a few hours, days or longer to complete owing to the combination of Regexp and email size.

HTML To Wiki Markup

HTML Email can be covered to wiki markup via JEMH. However, wiki markup uses additional characters to define styling. Such as text that is formatted in a bold HTML tag,when covered to wiki markup the bold HTML tag is removed and text will instead have * placed around it (wiki markup bold formatting).

As seen in this example:

...

Because wiki markup uses these characters for styling your Regexp will need to account for these characters, both when they are, and when they are not present. The following link will give more details on the extra characters added by wiki markup:
https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all

Some of theses characters are special in Regexp such as the * and [ ]. So in order for Regexp to specifically match these special characters as text, you need to escape them by using a back slash \ before the special character. See following link for a tutorial and a example:

https://regexone.com/lesson/wildcards_dot

The easier way

With the above in mind, you’re Regexp maybe longer and more complicated to write. The alternative approach you could take is turning off which HTML elements are handled via JEMH. If you go to:

  • JEMH > Profile > Email > HTML 

...

The HTML elements you have switched On will be formatted to markup, which means JEMH will add extra characters to handle these tags. If you no longer want that specific tag to be formatted you can simply switch it to None. This will result in issues no longer having styling for those elements. For example: if the bold and link tags were both switched to None, issues would no longer display bold text and hyperlinks. However, this does mean that the Regexp you have to write will be smaller / easier to write.

Dynamic Evaluation walk-through

...

Setting the Profile Default Project Mapping Cleanups

Note

Expressions entered within the Default Project Mapping will be inherited and used by the other Project Mappings within that Profile

With the Test Case set, access the Pre-Processing tab shown below, and click Add in the Subject section:

...

When we run the Test Case with and without the Body Delimiter we can see the differences more clearly.

With expression

Without expression

Image Modified

Image Modified

Using Body Delimiter with styled text

...

The rendered content will make the text use the formatting that is defined and will remove the Wiki Markup tags from the text. The non rendered content will have the Wiki Markup tags within the text. Below is an example showing the difference between the rendered and non rendered content.

Rendered content

non rendered text content

Image Modified

Image Modified

Using Special characters in a regex expression

...

Commonly used special characters

Special character

Function

.

This will match any single character other than a newline

*

This matches zero or more consecutive characters. For example (/ba*) will match “ba” and “baaa” as the expression will match more than one “a” characters.

\

Escape following special character. This will allow special characters to be matched within the email. For example if you are looking for “[abc]” the regex will have to be \[abc\] as this will mean that the regex will also look for “[ ]” as a character.

( )

Used to define a capture group within a regex.

[ ]

This is used to define which Character sets you want to capture. For example ([a-z]) will capture all characters that are a lower case alphabetical character.

Example of using Special characters

...

After running the Test Case we can see that the content is removed from the middle of the second line as this is where the Regex found a match for *Bold text*.

Without Regex

With Regex

Image Modified

Image Modified


Removing ‘Null’ characters

...