Remove content from the processed email body

Content in the body of an incoming email can be selectively removed to alter the resulting issue description or comment. Regular expressions are matched against the processed email body. When writing regular expressions, it is best to work from the post-processed content of an issue that has already been created or commented on.

Removing specific parts of email content

To remove only particular parts of an email's content, use a Cleanup Regexp. A cleanup removes only the content that matches the regular expression.



Example 1
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

Hello,
This text should stay.
!!This text should be removed!!
This should also stay.
Regards,
Sender



Cleanup Regexp: !![\w\s]*!!

Resulting body: Hello, This text should stay. This should also stay. Regards, Sender
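To experiment with a Cleanup Regexp before configuring it, the behaviour of Example 1 can be approximated outside the product. The following is a minimal sketch in Python; it assumes the cleanup simply deletes every match, and the function name `apply_cleanup` is made up for illustration. The product's own regex engine may differ from Python's `re` in details.

```python
import re

# Pattern taken from Example 1. Python's `re` is only a stand-in here;
# the real regex engine may behave differently in edge cases.
CLEANUP_REGEXP = r"!![\w\s]*!!"

body = (
    "Hello,\n"
    "This text should stay.\n"
    "!!This text should be removed!!\n"
    "This should also stay.\n"
    "Regards,\n"
    "Sender"
)


def apply_cleanup(text: str, pattern: str) -> str:
    """Delete every match of `pattern`; all other text is left untouched."""
    return re.sub(pattern, "", text)


cleaned = apply_cleanup(body, CLEANUP_REGEXP)
print(cleaned)
```

Only the marked text disappears; the line break that followed it remains, so an empty line is left behind in this sketch.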





Example 2



Cleanup Regexp

Resulting body









Removing all content from a specific point onwards

To remove everything from a given point to the end of the email content, use a Body Delimiter. A delimiter removes all content from the point where its regular expression first matches.
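The difference from a cleanup is that a delimiter cuts the body at the first match instead of deleting only the matched text. A minimal Python sketch of that idea follows; the function name `apply_delimiter` and the signature-separator pattern are assumptions chosen for illustration, not values taken from the product.

```python
import re


def apply_delimiter(text: str, pattern: str) -> str:
    """Keep only the text that precedes the first match of `pattern`."""
    match = re.search(pattern, text)
    return text if match is None else text[: match.start()]


# Hypothetical delimiter: a line consisting of the conventional "-- "
# signature separator.
SIGNATURE_DELIMITER = r"(?m)^-- ?$"

body = "Please restart the server.\n-- \nThis is my sig\nAcme Corp"
truncated = apply_delimiter(body, SIGNATURE_DELIMITER)
print(repr(truncated))
```

If the pattern never matches, the body is returned unchanged, which is why a delimiter that is too specific silently does nothing.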

Email signatures and all trailing text can be removed through a Create/Comment Body Delimiter:

With a Create/Comment Body Delimiter that matches the line above 'This is my sig', you can remove the signature from the final body:

Create/Comment Body Delimiter

Resulting body for create

Resulting body for comment






Removing old email content in replies

Things get interesting when replies to emails are handled, as there is no standard for the format of replies:

The Comment Only Body Delimiter Regexps match these lines (only when commenting on an issue) to identify the point at which the body should be truncated. Truncating the reply at that point means that only the new text is added:

Comment Only Body Delimiter

Resulting comment body
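As a rough illustration of reply truncation, the Python sketch below tries several delimiter patterns and cuts the body at the earliest match. Both patterns, the function name `strip_quoted_reply`, the "earliest match wins" rule, and the trailing-whitespace trim are assumptions made for this example; real mail clients produce many other reply formats.

```python
import re

# Hypothetical delimiter patterns for two common reply styles.
REPLY_DELIMITERS = [
    r"(?m)^On .+ wrote:\s*$",                # "On <date>, <sender> wrote:"
    r"(?m)^-+\s*Original Message\s*-+\s*$",  # "-----Original Message-----"
]


def strip_quoted_reply(text: str, patterns: list[str]) -> str:
    """Truncate `text` at the earliest match of any delimiter pattern."""
    cut = len(text)
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            cut = min(cut, match.start())
    # Trimming trailing whitespace is a choice made for this sketch only.
    return text[:cut].rstrip()


reply = (
    "Thanks, that fixed it.\n"
    "\n"
    "On Sun, 19 Jun 2011 at 17:42, Andy Brook <andy@localhost> wrote:\n"
    "> Have you tried restarting?\n"
)
print(strip_quoted_reply(reply, REPLY_DELIMITERS))
```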






Handling forwarded emails

When dealing with forwarded email, things get more complicated. During create, it is less likely that you would want to apply signature stripping, because the first match would cause the content to be truncated there and then. During comment it is worse, because the forwarded content looks like a reply and so would be stripped as part of reply handling.

The Forwarded Email Subject Prefixes setting takes a comma-separated list of the email subject prefixes that should be treated as marking a forwarded message.

Example Forwarded Email Subject Prefixes configuration


Fwd:,Fw:,WG:,Doorst:
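To make the effect of this setting concrete, here is a small Python sketch that parses the comma-separated list and strips matching prefixes from a subject. The function names are hypothetical, and the case-insensitive comparison and the handling of stacked prefixes are assumptions; the product's actual matching rules may differ.

```python
CONFIGURED_PREFIXES = "Fwd:,Fw:,WG:,Doorst:"


def parse_prefixes(setting: str) -> list[str]:
    """Split the comma-separated setting into individual prefixes."""
    return [p.strip() for p in setting.split(",") if p.strip()]


def split_forward_prefix(subject: str, prefixes: list[str]) -> tuple[bool, str]:
    """Return (is_forwarded, subject with any leading prefixes removed)."""
    remaining = subject.strip()
    forwarded = False
    stripped = True
    while stripped:  # repeat so stacked prefixes such as "Fw: Fwd:" are handled
        stripped = False
        for prefix in prefixes:
            # Case-insensitive comparison is an assumption of this sketch.
            if remaining.lower().startswith(prefix.lower()):
                remaining = remaining[len(prefix):].lstrip()
                forwarded = True
                stripped = True
    return forwarded, remaining


prefixes = parse_prefixes(CONFIGURED_PREFIXES)
print(split_forward_prefix("Fwd: Printer is broken", prefixes))
print(split_forward_prefix("Printer is broken", prefixes))
```

The boolean in the result stands in for the decision to skip body stripping for forwarded messages.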

A forwarded message such as the one below will then be processed as follows:





Create

  • Any listed 'forward' prefixes are removed from the subject, which then becomes the issue summary

  • Body will not be stripped

Issue Description:



Comment

  • Subject is ignored, but because the message is a forward:

  • Body will not be stripped

Issue Comment:



Removing characters using the related Unicode values

Sometimes processed emails contain specific characters that you do not want included in the issue description or comment. Using the Global Body Cleanup Regexps or the Project Mapping Body Cleanup, you can configure a regexp that matches either a specific character or a range of characters.

Remove a Specific Character

First, identify the escape sequence for the character using the following page: https://www.rapidtables.com/code/text/unicode-characters.html. For example, the escape sequence for “¿” is “\u00BF”. Once you have the escape sequence, create the regexp by adding “[” before and “]” after it, for example [\u00BF].
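A regexp built this way can be tried out quickly in Python, whose `re` module also accepts `\uXXXX` escapes inside a pattern. This is only a sketch for checking the pattern; the sample body text is invented.

```python
import re

# Character class containing the single character U+00BF (inverted question mark).
SINGLE_CHAR_REGEXP = r"[\u00BF]"

# Invented sample body; "\u00BF" is written as an escape to avoid encoding issues.
body = "\u00BFHello, can you help?"

cleaned = re.sub(SINGLE_CHAR_REGEXP, "", body)
print(cleaned)
```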

The example below shows the Regexp used to remove “¿” from the example email.

Example email:

Example Regex and email outcome:

Create/Comment Body Delimiter

Resulting body for create

Resulting body for comment


Remove a Range of Characters

To remove a range of characters, find the escape sequences of the first and last characters that should be matched, using the following page: https://www.rapidtables.com/code/text/unicode-characters.html. Once you have the two escape sequences, define the regexp as [escape1-escape2], for example [\u00A0-\u00FF].

Below is an example Regexp range that removes all characters with an escape sequence between “\u00A0” and “\u00FF”, inclusive.
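Because a range can match more than intended, it is worth testing it against sample text first. The Python sketch below applies the range to an invented body; it is an illustration only and assumes the product treats the character class the same way Python's `re` does.

```python
import re

# Every character from U+00A0 to U+00FF inclusive.
RANGE_REGEXP = r"[\u00A0-\u00FF]"

# Invented sample: e-acute (U+00E9), copyright sign (U+00A9) and a
# non-breaking space (U+00A0), written as escapes for clarity.
body = "Caf\u00E9 \u00A9 2011\u00A0Acme"

cleaned = re.sub(RANGE_REGEXP, "", body)
print(repr(cleaned))
```

Note that this range also covers accented Latin letters such as “é”, so ordinary words in some languages lose characters; a narrower range may be the safer choice.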

Example email:

Example Regex and email outcome:

Create/Comment Body Delimiter

Resulting body for create

Resulting body for comment

