Body content from incoming emails can be selectively removed to alter the resulting issue description or comment. Regular expressions are matched against the processed email body, so when writing a regular expression it is best to work from the post-processed content of an issue that has already been created or commented on.

Removing specific parts of email content

To remove only particular parts of an email's content, you will most likely want to use a Cleanup Regexp. A cleanup removes only the content that matches the regular expression; everything else is left in place.


Example 1
Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

Hello,

This text should stay. !!This text should be removed!! This should also stay.

Regards,
Sender


Cleanup Regexp:

Code Block
!![\w\s]*!!

Resulting body:

Code Block
Hello,

This text should stay. This should also stay.

Regards,
Sender
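
As a rough illustration of Example 1, the sketch below applies the same expression with Python's `re` module. JEMH evaluates the regexp itself, so this only approximates the behaviour; the function name `apply_cleanup` is hypothetical and not part of JEMH.

```python
import re


def apply_cleanup(body: str, cleanup_regexp: str) -> str:
    """Remove every part of body that matches cleanup_regexp (illustrative only)."""
    return re.sub(cleanup_regexp, "", body)


body = "This text should stay. !!This text should be removed!! This should also stay."
cleaned = apply_cleanup(body, r"!![\w\s]*!!")
print(repr(cleaned))
```

In this sketch the spaces on either side of the removed text are both left behind; any whitespace tidying that JEMH may do afterwards is not modelled here.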



Example 2
Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

Hello,

This text should stay. !!Keep the exclamation marks!! This should also stay.

Regards,
Sender


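Because this expression wraps the inner text in a capturing group, only the captured text is removed; the surrounding exclamation marks are kept, as the resulting body shows.

Cleanup Regexp:

Code Block
!!([\w\s]*)!!

Resulting body:

Code Block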
Hello,

This text should stay. !!!! This should also stay.

Regards,
Sender
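
The result of Example 2 suggests that when the expression contains a capturing group, only the captured text is removed. The sketch below reproduces that outcome in Python under exactly that assumption; it is inferred from the example rather than taken from JEMH's implementation, and the function name is hypothetical.

```python
import re


def apply_cleanup_group_only(body: str, cleanup_regexp: str) -> str:
    """Remove group 1 of each match if the pattern has a group, else the whole match.

    Assumption: this mirrors what Example 2 shows; it is not JEMH code.
    """
    rx = re.compile(cleanup_regexp)

    def replace(match):
        if rx.groups == 0:
            return ""  # no capturing group: drop the whole match
        if match.start(1) == -1:
            return match.group(0)  # group did not take part: leave text as is
        offset = match.start()
        group_start, group_end = match.span(1)
        text = match.group(0)
        # Keep what surrounds the group, drop what the group captured.
        return text[: group_start - offset] + text[group_end - offset:]

    return rx.sub(replace, body)


body = "This text should stay. !!Keep the exclamation marks!! This should also stay."
result = apply_cleanup_group_only(body, r"!!([\w\s]*)!!")
print(result)
```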



Removing all content from a specific point onwards

To remove all content in an email from one point until the end of the content, you will want to use a Body Delimiter. A delimiter removes everything from the point where its regular expression first matches to the end of the body.

Email signatures and all trailing text can be removed through a Create/Comment Body Delimiter:

Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8


some text

-- 
This is my sig

With a Create/Comment Body Delimiter that matches the line above 'This is my sig', you can remove the signature from the final body:

Create/Comment Body Delimiter:

Code Block
\n-- ?\n

Resulting body for create:

Code Block
some text

Resulting body for comment:

Code Block
some text
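
The delimiter behaviour described above (cut the body at the first match) can be sketched in a few lines of Python. This is a minimal approximation for experimenting with a delimiter expression, not JEMH's implementation; `apply_delimiter` is a hypothetical name.

```python
import re


def apply_delimiter(body: str, delimiter_regexp: str) -> str:
    """Return body truncated at the first match of delimiter_regexp, if any."""
    match = re.search(delimiter_regexp, body)
    return body if match is None else body[: match.start()]


body = "some text\n\n-- \nThis is my sig\n"
truncated = apply_delimiter(body, r"\n-- ?\n")
print(repr(truncated))
```

The optional space in `-- ?` allows for mail clients that drop the trailing space of the conventional `-- ` signature separator.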

Removing old email content in replies

Things get interesting when replies to emails are handled, as there is no standard for the format of replies:

Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

this is the reply

On November 1 2014, at 14:31 Andy Brook <javahollic@gmail.com> wrote:

> blah
>
> --

The Comment Only Body Delimiter Regexps are matched against such lines (only when commenting on an issue) to identify the point at which the body should be truncated. Truncating the reply there results in just the required text being added:

Comment Only Body Delimiter:

Code Block
\nOn [A-Za-z]* [0-9]+ [0-9]{4}\W at [0-9]+:[0-9]+\s?(?:AM|PM)? "?.*"? <.*@.*> wrote:

Resulting comment body:

Code Block
this is the reply
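
Because reply headers vary between mail clients, it can help to try a delimiter against several header styles before relying on it. The sketch below does that in Python with the expression shown above. The second and third sample headers and the `example.com` addresses are made up for illustration, and `truncate_reply` is a hypothetical helper rather than a JEMH function.

```python
import re

# The Comment Only Body Delimiter from the example above.
REPLY_HEADER = re.compile(
    r'\nOn [A-Za-z]* [0-9]+ [0-9]{4}\W at [0-9]+:[0-9]+\s?(?:AM|PM)? "?.*"? <.*@.*> wrote:'
)


def truncate_reply(body: str) -> str:
    """Cut the body at the first reply header the delimiter recognises."""
    match = REPLY_HEADER.search(body)
    return body if match is None else body[: match.start()]


samples = [
    # Header style from the example email: matched.
    "this is the reply\n\nOn November 1 2014, at 14:31 Andy Brook <javahollic@gmail.com> wrote:\n\n> blah\n",
    # Hypothetical 12-hour clock with a quoted display name: matched.
    'this is the reply\n\nOn March 3 2015, at 9:05 PM "Jane Doe" <jane@example.com> wrote:\n\n> blah\n',
    # Hypothetical day-first header: not matched, so the quote survives.
    "this is the reply\n\nOn 1 Nov 2014 14:31, Jane Doe <jane@example.com> wrote:\n\n> blah\n",
]

results = [truncate_reply(sample) for sample in samples]
for result in results:
    print(repr(result))
```

A header style that the expression does not cover leaves the quoted text in the comment, which is why several delimiter expressions may be needed.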

Handling forwarded emails

When dealing with forwarded email, things get more complicated. During create, it is less likely that you would want to apply signature stripping, because the first match would cause content to be truncated there and then. During comment, it is worse: the forwarded content looks like a reply, so it would be stripped as part of the reply.

The Forwarded Email Subject Prefixes setting takes a comma-separated list of the email subject prefixes that mark a message as forwarded.

Example Forwarded Email Subject Prefixes configuration

Fwd:,Fw:,WG:,Doorst:

A forwarded message such as the one below will then be processed as follows:

Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: Fwd: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8

this is a forwarded message

---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>


blah

-- 
This is my sig



Create

  • Any listed 'forward' prefix is removed from the subject, which then becomes the summary

  • Body will not be stripped

Issue Description:

Code Block
this is a forwarded message

---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>

blah


-- 
This is THEIR sig

... further mail thread ...

-- 
This is MY sig


Comment

  • Subject is ignored, but the nature of the message being forwarded will mean:

  • Body will not be stripped

Issue Comment:

Code Block
this is a forwarded message

---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>

blah

-- 
This is THEIR sig


... further mail thread ...
 
-- 
This is MY sig
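
The forwarded-mail handling described above can be sketched as follows: detect a configured prefix on the subject, strip it to form the summary, and skip body truncation when the message is a forward. This is only an illustration; the function names are hypothetical, and matching the prefix case-insensitively is an assumption rather than documented JEMH behaviour.

```python
import re
from typing import List, Tuple


def parse_prefixes(setting: str) -> List[str]:
    """Split the comma-separated setting value into individual prefixes."""
    return [p.strip() for p in setting.split(",") if p.strip()]


def strip_forward_prefix(subject: str, prefixes: List[str]) -> Tuple[str, bool]:
    """Return (summary, is_forwarded). Case-insensitive match is an assumption."""
    trimmed = subject.lstrip()
    for prefix in prefixes:
        if trimmed.lower().startswith(prefix.lower()):
            return trimmed[len(prefix):].lstrip(), True
    return subject, False


def body_for_issue(subject: str, body: str, prefixes: List[str],
                   delimiter_regexp: str) -> Tuple[str, str]:
    """Leave a forwarded body untouched, otherwise truncate at the delimiter."""
    summary, forwarded = strip_forward_prefix(subject, prefixes)
    if forwarded:
        return summary, body
    match = re.search(delimiter_regexp, body)
    return summary, (body if match is None else body[: match.start()])


prefixes = parse_prefixes("Fwd:,Fw:,WG:,Doorst:")
body = "this is a forwarded message\n\nblah\n\n-- \nThis is my sig\n"
summary, kept = body_for_issue(
    "Fwd: This is a starting email template, update as required",
    body, prefixes, r"\n-- ?\n")
print(summary)
print(repr(kept))
```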


Removing characters using the related Unicode values

Sometimes processed emails contain specific characters that you do not want included in the issue description or comment. Using the Global Body Cleanup Regexps or the Project Mapping Body Cleanup, you can configure a regexp that matches either a specific character or a range of characters.

Remove a Specific Character

First, identify the escape sequence for the specific character using the following page: https://www.rapidtables.com/code/text/unicode-characters.html. For example, the escape sequence for “¿” is “\u00BF”. Once you have the escape sequence for the character, create the regexp by adding “[” before and “]” after the escape sequence.

The example below shows the regexp used to remove “¿” from the following email.

Example email:

Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: admin@localhost
To: mailbox@localhost
Content-Type: text/plain; charset=UTF-8

¿ some text

Example Regex and email outcome:

Body Cleanup Regexp:

Code Block
[\u00BF]

Resulting body for create:

Code Block
some text

Resulting body for comment:

Code Block
some text
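
As an alternative to looking the character up by hand, the escape sequence can be computed from the character's code point. The Python sketch below builds the bracketed regexp that way and applies it to the example body; `unicode_escape` and `single_char_regexp` are hypothetical helper names, and the helper only covers characters up to U+FFFF.

```python
import re


def unicode_escape(ch: str) -> str:
    """Return the \\uXXXX escape sequence for a single character (up to U+FFFF)."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("character lies outside the range covered by \\uXXXX")
    return "\\u{:04X}".format(code_point)


def single_char_regexp(ch: str) -> str:
    """Wrap the escape sequence in [ and ] as described above."""
    return "[" + unicode_escape(ch) + "]"


regexp = single_char_regexp("\u00bf")  # the character "¿"
print(regexp)
cleaned = re.sub(regexp, "", "\u00bf some text")
print(repr(cleaned))
```

The leading space that followed the removed character is still present in this sketch; trimming is not modelled.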

Remove a Range of Characters

If there is a range of characters that you want to remove, find the escape sequences of the first and last characters that should be matched, using the following page: https://www.rapidtables.com/code/text/unicode-characters.html. Once you have the two escape sequences, define the regexp as [escape1-escape2], for example [\u00A0-\u00FF].

Below is an example regexp range that removes all characters whose escape sequence lies between “\u00A0” and “\u00FF”.

Example email:

Code Block
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: admin@localhost
To: mailbox@localhost
Content-Type: text/plain; charset=UTF-8

¿¡¢¥¦ some text
¶ · ¸ ¹ º » ¼  ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç
Some more text after the characters that should be removed.

Example Regex and email outcome:

Body Cleanup Regexp:

Code Block
[\u00A0-\u00FF]+

Resulting body for create:

Code Block
some text

Some more text after the characters that should be removed.

Resulting body for comment:

Code Block
some text

Some more text after the characters that should be removed.
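
Before adopting a range, it is worth checking which characters it actually covers. The Python sketch below applies the example range to two sample strings; the second string is a made-up illustration that shows the range also removing accented letters such as “é” (U+00E9) and “ï” (U+00EF), which may or may not be what you want.

```python
import re

# The range from the example above: U+00A0 through U+00FF.
RANGE_REGEXP = re.compile(r"[\u00A0-\u00FF]+")


def remove_range(text: str) -> str:
    """Delete every run of characters that falls inside the range."""
    return RANGE_REGEXP.sub("", text)


symbols_removed = remove_range("\u00bf\u00a1\u00a2\u00a5\u00a6 some text")
letters_removed = remove_range("caf\u00e9, na\u00efve")  # "café, naïve"
print(repr(symbols_removed))
print(repr(letters_removed))
```

If accented letters need to survive, a narrower range or a list of specific characters is the safer choice.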
