Remove content from the processed email body

1 Removing specific parts of email content
- 1.1 Example 1
- 1.2 Example 2
2 Removing all content from a specific point onwards
3 Removing old email content in replies
4 Handling forwarded emails
5 Removing characters using the related Unicode values
- 5.1 Remove a Specific Character
- 5.2 Remove a Range of Characters
6 Related articles

Body content from incoming emails can be selectively removed in order to alter the resulting issue description or comment. Regular expressions are used to match parts of the processed email body (note that when creating regular expressions it is best to work with post-processed content from a created/commented issue).

Removing specific parts of email content

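To remove only particular parts of an email's content, you will most likely want to use a Cleanup Regexp. A cleanup removes only the content that matches the regular expression. As Example 2 shows, when the expression contains a capture group, only the text matched by the group is removed.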

Example 1

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

Hello,

This text should stay. !!This text should be removed!! This should also stay.

Regards,
Sender

Cleanup Regexp:

!![\w\s]*!!

Resulting body:

Hello,

This text should stay. This should also stay.

Regards,
Sender
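
The effect of this Cleanup Regexp can be tried out before it is configured. The sketch below uses Python's `re` module as a stand-in, which is an assumption: the product's own regex engine may differ in details, and the variable names are illustrative only.

```python
import re

# Body text from Example 1 (mail headers omitted).
body = "This text should stay. !!This text should be removed!! This should also stay."

# Remove every match of the cleanup expression from the body.
cleaned = re.sub(r"!![\w\s]*!!", "", body)
print(cleaned)
```

A plain substitution like this leaves the spaces on either side of the removed text in place, so some whitespace tidying may still be needed.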

Example 2

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

Hello,

This text should stay. !!Keep the exclamation marks!! This should also stay.

Regards,
Sender

Cleanup Regexp:

!!([\w\s]*)!!

Resulting body:

Hello,

This text should stay. !!!! This should also stay.

Regards,
Sender
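
The result above suggests that, when the expression contains a capture group, only the captured text is removed and the rest of the match is kept. The following Python sketch imitates that behaviour. The `cleanup` function is hypothetical and the group handling is inferred from this example, not taken from the product's implementation.

```python
import re


def cleanup(body, pattern):
    """Remove matches of pattern; if it has a group, remove only group 1."""
    regex = re.compile(pattern)

    def strip(match):
        # No capture group (or group did not take part): drop the whole match.
        if regex.groups == 0:
            return ""
        if match.group(1) is None:
            return match.group(0)
        # Keep the text around group 1 and drop the captured part.
        whole = match.group(0)
        start, end = match.span(1)
        offset = match.start()
        return whole[: start - offset] + whole[end - offset:]

    return regex.sub(strip, body)


body = "This text should stay. !!Keep the exclamation marks!! This should also stay."
print(cleanup(body, r"!!([\w\s]*)!!"))
```

With the group in place, the exclamation marks survive and only the text between them is dropped.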

Removing all content from a specific point onwards

To remove all content in an email starting from one point until the end of the content, you will want to use a Body Delimiter. A delimiter removes all content from its starting point. The starting point is where the delimiter regular expression first matches.

Email signatures and all trailing text can be removed through a Create/Comment Body Delimiter:

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8


some text

-- 
This is my sig

With a Create/Comment Body Delimiter that matches the '-- ' line above 'This is my sig', you can remove the signature from the final body:

Create/Comment Body Delimiter	Resulting body for create	Resulting body for comment
`\n-- ?\n`	`some text`	`some text`
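
A delimiter differs from a cleanup in that everything from the first match onwards is discarded. The Python sketch below illustrates the idea under the assumption that truncation starts at the beginning of the first match. `truncate_at_delimiter` is a hypothetical helper, not part of the product.

```python
import re


def truncate_at_delimiter(body, delimiter):
    """Return body cut off at the first match of delimiter, if any."""
    match = re.search(delimiter, body)
    if match is None:
        return body  # no delimiter found, body is left untouched
    return body[: match.start()]


# Body of the example email (mail headers omitted).
body = "\nsome text\n\n-- \nThis is my sig\n"
result = truncate_at_delimiter(body, r"\n-- ?\n")
print(result.strip())
```

The optional space in `-- ?` lets the expression match signature separators both with and without the trailing space.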

Removing old email content in replies

Things get interesting when replies to emails are handled, as there is no standard for the format of replies:

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

this is the reply

On November 1 2014, at 14:31 Andy Brook <javahollic@gmail.com> wrote:

> blah
>
> --

The Comment Only Body Delimiter Regexps are used to match these lines (only when commenting on an issue) to identify the point at which the body should be truncated. Truncation on reply then results in just the required text being added:

Comment Only Body Delimiter	Resulting comment body
`\nOn [A-Za-z]* [0-9]+ [0-9]{4}\W at [0-9]+:[0-9]+\s?(?:AM|PM)? "?.*"? <.*@.*> wrote:`	`this is the reply`
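
Because reply headers vary between mail clients, it helps to test a delimiter against the formats you expect to receive. The Python sketch below applies the expression above to the example reply and to a second, made-up header in a different style. Python's `re` module is a stand-in for the product's regex engine, and `strip_quoted_reply` and the second sample are hypothetical.

```python
import re

REPLY_DELIMITER = re.compile(
    r'\nOn [A-Za-z]* [0-9]+ [0-9]{4}\W at [0-9]+:[0-9]+\s?(?:AM|PM)? "?.*"? <.*@.*> wrote:'
)


def strip_quoted_reply(body):
    """Cut the body at the reply header, or return it unchanged."""
    match = REPLY_DELIMITER.search(body)
    return body if match is None else body[: match.start()]


reply = (
    "this is the reply\n"
    "\n"
    "On November 1 2014, at 14:31 Andy Brook <javahollic@gmail.com> wrote:\n"
    "\n"
    "> blah\n"
    ">\n"
    "> --\n"
)
print(strip_quoted_reply(reply).strip())

# A header in another style (made up for this sketch) is not matched.
other = "thanks\n\nOn Sat, Nov 1, 2014 at 2:31 PM, Andy Brook <javahollic@gmail.com> wrote:\n> blah\n"
print(strip_quoted_reply(other) == other)
```

A header that the expression does not match leaves the quoted text in the comment, so each reply format in use generally needs its own delimiter.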

Handling forwarded emails

Forwarded emails complicate matters further. During create, you are less likely to want signature stripping, because the first match would truncate the content at that point. During comment it is worse: the forwarded content looks like a reply, so it would be stripped as part of the reply.

The Forwarded Email Subject Prefixes setting takes a comma-separated list of email subject prefixes that identify forwarded messages.

Example Forwarded Email Subject Prefixes configuration

Fwd:,Fw:,WG:,Doorst:

A forwarded message such as the one below will then be processed as follows:

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: Fwd: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

this is a forwarded message

---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>


blah

-- 
This is my sig

On create:

- Any listed 'forward' prefix is removed from the subject, which becomes the issue summary.
- The body is not stripped.

Issue Description:

this is a forwarded message

---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>

blah


-- 
This is THEIR sig

... further mail thread ...

-- 
This is MY sig

On comment:

- The subject is ignored.
- Because the message is a forward, the body is not stripped.

Issue Comment:

this is a forwarded message

---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>

blah

-- 
This is THEIR sig


... further mail thread ...
 
-- 
This is MY sig
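
The prefix handling described in this section can be sketched in Python as follows. Both functions are hypothetical, and the case-insensitive comparison is an assumption of the sketch, not documented behaviour.

```python
def parse_prefixes(setting):
    """Split the comma separated setting value into individual prefixes."""
    return [p.strip() for p in setting.split(",") if p.strip()]


def split_forward_prefix(subject, prefixes):
    """Return (summary, is_forwarded) for an incoming subject line."""
    trimmed = subject.strip()
    for prefix in prefixes:
        # Assumption: prefixes are compared without regard to case.
        if trimmed.lower().startswith(prefix.lower()):
            return trimmed[len(prefix):].strip(), True
    return trimmed, False


prefixes = parse_prefixes("Fwd:,Fw:,WG:,Doorst:")
summary, forwarded = split_forward_prefix(
    "Fwd: This is a starting email template, update as required", prefixes
)
print(summary)
print(forwarded)
```

When `forwarded` is true, a caller would skip the body delimiters so that the forwarded thread reaches the issue intact.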

Removing characters using the related Unicode values

Some emails contain characters that you do not want in the issue description or comment. Using the Global Body Cleanup Regexps or the Project Mapping Body Cleanup, you can configure a regexp that matches either a specific character or a range of characters.

Remove a Specific Character

First, identify the escape sequence for the character using the following page: https://www.rapidtables.com/code/text/unicode-characters.html. For example, the escape sequence for “¿” is “\u00BF”. Then create the regexp by adding “[” before and “]” after the escape sequence.

The example below shows the regexp used to remove “¿” from an email.

Example email:

MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: admin@localhost
To: mailbox@localhost
Content-Type: text/plain; charset=UTF-8

¿ some text

Example Regex and email outcome:

Body Cleanup Regexp	Resulting body for create	Resulting body for comment
`[\u00BF]`	`some text`	`some text`
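
A single-character expression can be checked before it is configured. The Python sketch below is an approximation only: Python's `re` module accepts the same `\uXXXX` escape, but the product's regex engine may differ in other respects.

```python
import re

# Body of the example email (mail headers omitted).
body = "¿ some text"

# The character class contains only U+00BF, the inverted question mark.
cleaned = re.sub(r"[\u00BF]", "", body)
print(cleaned.strip())
```

The substitution removes only the character itself, so the space that followed it is trimmed separately here.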

Remove a Range of Characters

To remove a range of characters, find the escape sequences of the first and last characters that should be matched, for example by using the following page: https://www.rapidtables.com/code/text/unicode-characters.html. Once you have the two escape sequences, define the regexp as [escape1-escape2], for example [\u00A0-\u00FF].
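
Before configuring a range, it is worth checking exactly which characters it covers. The Python sketch below applies the range from the text to a made-up sample string. Python's `re` module is used as a stand-in for the product's regex engine, and the sample text is hypothetical.

```python
import re

# Sample text containing a no-break space (U+00A0), a pound sign (U+00A3)
# and an inverted question mark (U+00BF).
body = "Price:\u00a0100\u00a3 \u00bfOK?"

# Every character from U+00A0 to U+00FF inclusive is removed.
cleaned = re.sub(r"[\u00A0-\u00FF]", "", body)
print(cleaned)

# The same range also covers accented Latin letters such as U+00E9.
print(re.sub(r"[\u00A0-\u00FF]", "", "caf\u00e9"))
```

Because the range \u00A0 to \u00FF includes accented letters, a narrower range may be preferable when emails contain text in languages that use them.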

Below is an example regexp range that removes all characters with an escape sequence between “\u00A0” and “\u00FF”: