Remove replied-to content



  • Use of .* can create a massive computational load for the regexp engine.  Compound use of .* multiplies the number of possibilities the engine has to test, which can look like a hung mail process while several billion combinations are evaluated.

  • Use static data range parameters, e.g. [0-9]+, to limit the number of possibilities that can apply.

  • Use \n when looking for a word at the beginning of a line in order to limit the number of possible matches.
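As a rough illustration of these points (it shows over-matching, not a performance measurement), the sketch below uses Python's re module as a stand-in for the regexp engine JEMH uses, which is an assumption. The sample line and both patterns are invented for this example.

```python
import re

# Hypothetical attribution line with a second " PM" later in the text.
line = 'On Monday, August 20, 2016 1:35 PM, "admin@example.com" wrote: call me at 5 PM, thanks'

# Greedy .* runs to the end of the line, then backtracks to the LAST " PM".
loose = re.search(r"On .* PM", line)

# Explicit ranges leave the engine far fewer ways to match.
tight = re.search(
    r"On [A-Za-z]+\x2C [A-Za-z]+ [0-9]+\x2C 20[0-9]{2} [0-9]+:[0-9]{2} (AM|PM)",
    line,
)

print(loose.group(0))  # swallows text well past the timestamp
print(tight.group(0))  # stops at the first "PM"
```

Every extra .* gives the engine another stretch of text to expand and backtrack over, which is where the combinations multiply.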

Background

When users reply to emails generated by JIRA, the replied-to content is prefixed with an attribution line whose format is completely non-standard and is subject to language, locale and OS variations.  This means there is no single solution for every case, and a per-scenario solution has to be crafted.  JEMH enables the configuration of expressions that match individual attribution styles; a worked example is below.

Worked Example

Preparing content

To obtain the content needed to construct your expression, copy a comment that contains the replied-to content and paste it into JEMH's Regexp Tester.  There you can write and test your expressions against the comment content to make sure they cut exactly the information you want removed.

Example data

When commenting on an issue via email, the replied-to content of the email can also be included in the actual comment that is posted on the issue.  This can be removed through: JEMH > Profile > Project Mapping > Pre-Processing : Body Delimiter Regexps.

Below is an example of a comment that includes the replied-to content.

== Created by JEMH via e-mail from: "Test User" <test@example.com> ==
Hi Admin
The issue still persists.
- Test User
    On Monday, August 20, 2016 1:35 PM, "admin@example.com" <admin@example.com> wrote:
Hi Test
Try disabling that feature and see what happens when you re run it.
- Admin

 

Match leading text

To remove the replied-to comment we need a regexp that matches the "On Monday" line, so that this line and everything below it is removed.  The expression can be tested using JEMH's Regexp Tester.

The first step is to detect the line on which the replied-to content begins.  In this case the line begins with "On", so this can be detected with:

On

 

However, the word "On" can be detected anywhere within the comment, so the expression needs to be more specific.  We use \n to require that the word we are searching for is at the beginning of a new line, and \s* to allow for any white space between the beginning of the line and the first word.  The * means zero or more occurrences of the preceding character (white space here); the example above has 4.

\n\s*On
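The difference between the two expressions can be checked with a short sketch. It uses Python's re module as a stand-in for JEMH's Regexp Tester, which is an assumption, and the comment text is a made-up variant of the example data with an extra "On" placed mid-sentence.

```python
import re

# Hypothetical comment body: "On" appears mid-sentence and at the start of a line.
comment = (
    "Hi Admin\n"
    "The issue still persists with the On switch.\n"
    "- Test User\n"
    '    On Monday, August 20, 2016 1:35 PM, "admin@example.com" <admin@example.com> wrote:\n'
    "Hi Test\n"
)

loose_hits = re.findall(r"On", comment)          # matches both occurrences
anchored_hits = re.findall(r"\n\s*On", comment)  # only the one that starts a line

print(len(loose_hits), len(anchored_hits))
```

Only the anchored expression singles out the attribution line; the match includes the newline and the four leading spaces.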

 

Matching day of the week

The next step is to match the day listed in the replied-to content.  In the example above that day is "Monday"; however, the expression has to account for all the days of the week.

To do this we use an OR'd condition that includes all possibilities, separated by the | symbol.

The expression is able to match all the days that the replied-to content can have.
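One way such an OR'd condition might look is sketched below. The exact expression is an assumption built from the steps described so far, and Python's re module stands in for JEMH's Regexp Tester.

```python
import re

# Assumed form of the expression: the anchored "On" followed by an OR'd list of days.
expression = r"\n\s*On (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Build one hypothetical attribution line per day and record which day was matched.
matched = []
for day in days:
    body = f"- Test User\n    On {day}, August 20, 2016 1:35 PM"
    matched.append(re.search(expression, body).group(1))

# A word that is not a day of the week is rejected.
not_a_day = re.search(expression, "- Test User\n    On Someday, August 20")

print(matched, not_a_day)
```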

 

Matching a comma

The expression now has to match a comma.  A literal comma can cause issues, because JEMH uses commas to separate multiple expressions.  To get around this we use the hex code for a comma, \x2C, instead.

The expression is able to match commas.
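The sketch below illustrates why the hex code helps. The split on "," is only a hypothetical model of how a comma-separated list of expressions could be parsed, not JEMH's actual implementation, and Python's re module stands in for the real engine.

```python
import re

# A literal comma would be cut in two by a naive comma-separated parser (hypothetical model).
broken_pieces = r"\n\s*On Monday, August".split(",")

# With \x2C the expression contains no literal comma, yet still matches one.
expression = r"\n\s*On (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\x2C"
safe_pieces = expression.split(",")

match = re.search(expression, "- Test User\n    On Monday, August 20, 2016 1:35 PM")

print(len(broken_pieces), len(safe_pieces), repr(match.group(0)))
```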

 

Matching month and date

Next we need to match the month and date.  For this we use data ranges, which are defined using square brackets [ ], for example [0-9]+ .  To cover all the possible dates that can appear, we populate the square brackets with 0-9, which means any digit between 0 and 9 can appear, and add a + after the brackets so there can be one or more digits.  A month name consists of letters rather than digits, so it needs a range of letters instead (or an OR'd list of month names, as was done for the days).

The expression can now match the month and date.
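A possible form of the expression at this stage is sketched below. The letter range [A-Za-z]+ for the month is an assumption made for illustration, and Python's re module stands in for JEMH's Regexp Tester.

```python
import re

# Assumed expression so far: anchored "On", day, comma (\x2C), month, date.
expression = (
    r"\n\s*On (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
    r"\x2C [A-Za-z]+ [0-9]+"
)

# Hypothetical attribution lines with different months and dates.
samples = ["August 20", "May 3", "December 31"]
results = []
for month_and_date in samples:
    body = f"- Test User\n    On Monday, {month_and_date}, 2016 1:35 PM"
    match = re.search(expression, body)
    results.append(match.group(0).strip())

print(results)
```

Because [0-9]+ only accepts digits, the match stops at the comma after the date.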

 

Matching year, time and AM/PM

The next stage is to match the year and time.  The year is matched using the expression 20[0-9]{2}: the static 20 matches the century, with the next two digits being represented by [0-9]{2}.  The time is matched in two parts: [0-9]+ represents the hour (the + means one or more of the preceding character), and the minutes are matched using [0-9]{2}.  AM/PM is matched by the OR'd group (AM|PM).

The expression can match the year and time in both AM and PM.
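Putting the pieces so far together gives something like the sketch below. The spaces, the ":" between hour and minutes, and the month range [A-Za-z]+ are assumptions read off the example data; Python's re module stands in for JEMH's Regexp Tester.

```python
import re

# Assumed expression so far: day, comma, month, date, comma, year, time, AM/PM.
expression = (
    r"\n\s*On (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
    r"\x2C [A-Za-z]+ [0-9]+\x2C 20[0-9]{2} [0-9]+:[0-9]{2} (AM|PM)"
)

afternoon = re.search(expression, "- Test User\n    On Monday, August 20, 2016 1:35 PM, ...")
morning = re.search(expression, "- Test User\n    On Friday, March 3, 2017 11:05 AM, ...")

# [0-9]{2} requires exactly two minute digits, so a malformed time is rejected.
malformed = re.search(expression, "- Test User\n    On Monday, August 20, 2016 1:5 PM, ...")

print(afternoon.group(2), morning.group(2), malformed)
```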

 

Matching the from address

The email content can indicate some details of the sender of the mail being replied to; here we focus on a static example.

The from address part of the line is:

To match this we just attach it to the end of the current expression.

 

With the expression above JEMH will now be able to match the beginning line of the replied-to content from the user admin, even if the day, month, date, year or time changes.
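The whole walkthrough can be approximated end to end as follows. This is a sketch under assumptions: the complete expression is reconstructed from the steps described, the dots in the static from address are escaped as \. so they match literally, and cutting the body at the start of the match is only a model of what a Body Delimiter does. Python's re module stands in for JEMH's engine.

```python
import re

# Reconstructed full expression (an assumption, assembled from the steps above).
expression = (
    r"\n\s*On (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
    r"\x2C [A-Za-z]+ [0-9]+\x2C 20[0-9]{2} [0-9]+:[0-9]{2} (AM|PM)\x2C "
    r'"admin@example\.com" <admin@example\.com> wrote:'
)

body = (
    '== Created by JEMH via e-mail from: "Test User" <test@example.com> ==\n'
    "Hi Admin\n"
    "The issue still persists.\n"
    "- Test User\n"
    '    On Monday, August 20, 2016 1:35 PM, "admin@example.com" <admin@example.com> wrote:\n'
    "Hi Test\n"
    "Try disabling that feature and see what happens when you re run it.\n"
    "- Admin\n"
)

# Model of the delimiter: keep everything before the match, drop the rest.
match = re.search(expression, body)
kept = body[: match.start()] if match else body

print(kept)
```

Only the user's own reply, up to and including "- Test User", remains in the kept text.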