Remove content from the processed email body
Body content from incoming emails can be selectively removed to alter the resulting issue description or comment. Regular expressions are used to match parts of the processed email body. When writing a regular expression, it is best to work from the post-processed content of an issue that has already been created or commented on.
Removing specific parts of email content
To remove only particular parts of an email's content, use a Cleanup Regexp. A cleanup removes only the content that matches the regular expression.
Example 1
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8
Hello,
This text should stay. !!This text should be removed!! This should also stay.
Regards,
Sender
Cleanup Regexp | Resulting body |
---|---|
!![\w\s]*!! | Hello,
This text should stay. This should also stay.
Regards,
Sender |
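The effect of the Cleanup Regexp in Example 1 can be tried outside the product. The following is a minimal Python sketch, assuming a cleanup simply deletes every match from the body. The variable names are illustrative, and Python's `re` module stands in for whatever regexp engine the mail handler actually uses.

```python
import re

# Body of the Example 1 email (headers omitted).
body = (
    "Hello,\n"
    "This text should stay. !!This text should be removed!! This should also stay.\n"
    "Regards,\n"
    "Sender"
)

# Cleanup Regexp from the table above: everything between a pair of "!!" markers.
cleanup_regexp = re.compile(r"!![\w\s]*!!")

# Assumption: a cleanup deletes each match and leaves the rest untouched.
cleaned = cleanup_regexp.sub("", body)
print(cleaned)
```

In this sketch the spaces on either side of the removed text both remain. If that matters, the pattern can be widened to consume one of them, for example `!![\w\s]*!! ?`.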
Example 2
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8
Hello,
This text should stay. !!Keep the exclamation marks!! This should also stay.
Regards,
Sender
Cleanup Regexp | Resulting body |
---|---|
!!([\w\s]*)!! | Hello,
This text should stay. !!!! This should also stay.
Regards,
Sender |
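Comparing Example 2 with Example 1 suggests that when the regexp contains a capture group, only the captured text is removed and the rest of the match is kept. This is inferred from the result shown above rather than from a stated rule. A minimal Python sketch of that behaviour follows, with `remove_group` as a hypothetical helper that is not part of the product:

```python
import re


def remove_group(body, pattern):
    """Delete only the text captured by group 1 of each match.

    Hypothetical helper: it mimics the Example 2 result and is not
    the mail handler's own implementation.
    """
    regex = re.compile(pattern)
    if regex.groups == 0:
        # No capture group: fall back to removing the whole match.
        return regex.sub("", body)

    def strip_group(match):
        start, end = match.span(1)
        if start == -1:  # group 1 took no part in this match
            return match.group(0)
        offset = match.start(0)
        whole = match.group(0)
        # Keep the text before and after the captured group.
        return whole[: start - offset] + whole[end - offset:]

    return regex.sub(strip_group, body)


line = "This text should stay. !!Keep the exclamation marks!! This should also stay."
print(remove_group(line, r"!!([\w\s]*)!!"))
# This text should stay. !!!! This should also stay.
```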
Removing all content from a specific point onwards
To remove all content from a given point to the end of the email, use a Body Delimiter. A delimiter removes everything from the first point at which its regular expression matches.
Email signatures and all trailing text can be removed through a Create/Comment Body Delimiter:
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8
some text
--
This is my sig
With a Create/Comment Body Delimiter that matches the line above 'This is my sig', you can remove the signature from the final body:
Create/Comment Body Delimiter | Resulting body for create | Resulting body for comment |
---|---|---|
\n-- ?\n | some text | some text |
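The truncation described above can be sketched in a few lines of Python. This assumes the delimiter is applied as a plain "find the first match and cut there" operation; the variable names are illustrative only.

```python
import re

# Body of the email above (headers omitted).
body = "some text\n--\nThis is my sig"

# Delimiter from the table above: a line holding "--",
# optionally followed by one space.
delimiter = re.compile(r"\n-- ?\n")

match = delimiter.search(body)  # only the first match matters
truncated = body[: match.start()] if match else body
print(truncated)  # some text
```

The optional space in the pattern allows for the conventional signature separator, which is two dashes followed by a space, as well as mail clients that drop the trailing space.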
Removing old email content in replies
Things get interesting when replies to emails are handled, as there is no standard for the format of replies:
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8
this is the reply
On November 1 2014, at 14:31 Andy Brook <javahollic@gmail.com> wrote:
> blah
>
> --
The Comment Only Body Delimiter Regexps are used to match these lines (only when commenting on an issue) to identify the point at which the body should be truncated. Truncation on reply then results in just the required text being added:
Comment Only Body Delimiter | Resulting comment body |
---|---|
\nOn [A-Za-z]* [0-9]+ [0-9]{4}\W at [0-9]+:[0-9]+\s?(?:AM|PM)? "?.*"? <.*@.*> wrote: | this is the reply |
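Because reply headers vary between mail clients, it helps to test a delimiter against sample bodies before configuring it. The following Python sketch applies the regexp from the table to the reply above and to a second, differently formatted header that is made up for illustration. `truncate_reply` is a hypothetical helper, and Python's `re` module is only a stand-in for the product's regexp engine.

```python
import re

# Comment Only Body Delimiter from the table above.
reply_delimiter = re.compile(
    r'\nOn [A-Za-z]* [0-9]+ [0-9]{4}\W at [0-9]+:[0-9]+\s?(?:AM|PM)? "?.*"? <.*@.*> wrote:'
)


def truncate_reply(body):
    """Cut the body at the first reply header, if one is found."""
    match = reply_delimiter.search(body)
    return body[: match.start()] if match else body


reply = (
    "this is the reply\n"
    "On November 1 2014, at 14:31 Andy Brook <javahollic@gmail.com> wrote:\n"
    "> blah\n"
    ">\n"
    "> --"
)

# A differently formatted reply header (invented for this sketch).
other = (
    "this is the reply\n"
    "On Sat, 1 Nov 2014 at 14:31, Andy Brook <javahollic@gmail.com> wrote:\n"
    "> blah"
)

print(truncate_reply(reply))           # this is the reply
print(truncate_reply(other) == other)  # True: no match, body left unchanged
```

The second body is left untouched, which is why several Comment Only Body Delimiter Regexps are usually needed to cover the mail clients in use.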
Handling forwarded emails
When dealing with forwarded email, things get more complicated. During create, it is less likely that you would want to apply signature stripping, because the first match would cause the content to be truncated there and then. During comment it is worse, because the forwarded content looks like a reply and so would be stripped as part of the reply.
The Forwarded Email Subject Prefixes setting lets you nominate email subject prefixes as a comma-separated list.
Example Forwarded Email Subject Prefixes configuration |
---|
Fwd:,Fw:,WG:,Doorst: |
A forwarded message such as the one below will then be processed as follows:
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: Fwd: This is a starting email template, update as required
From: "Andy Brook" <andy@test.com>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8
this is a forwarded message
---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>
blah
--
This is my sig
Create | Comment |
---|---|
Issue Description: this is a forwarded message
---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>
blah
--
This is THEIR sig
... further mail thread ...
--
This is MY sig |
Issue Comment: this is a forwarded message
---------- Forwarded message ----------
From: Andy Brook <javahollic@gmail.com>
Date: 1 November 2014 14:31
Subject: hello world
To: Andy Brook <javahollic@gmail.com>
blah
--
This is THEIR sig
... further mail thread ...
--
This is MY sig |
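The prefix check can be pictured as follows. This is a minimal Python sketch under stated assumptions: that the setting is split on commas, that the subject is compared case-insensitively, and that a forwarded email skips delimiter truncation so the whole body is kept, as the table above suggests. `parse_prefixes` and `is_forwarded` are hypothetical names, not product functions.

```python
def parse_prefixes(setting):
    """Split the comma-separated setting into individual prefixes."""
    return [p.strip() for p in setting.split(",") if p.strip()]


def is_forwarded(subject, prefixes):
    """True when the subject starts with one of the configured prefixes.

    Assumption: the comparison ignores case and leading whitespace.
    """
    normalised = subject.lstrip().lower()
    return any(normalised.startswith(p.lower()) for p in prefixes)


prefixes = parse_prefixes("Fwd:,Fw:,WG:,Doorst:")
subject = "Fwd: This is a starting email template, update as required"

if is_forwarded(subject, prefixes):
    print("forwarded: keep the whole body")
else:
    print("not forwarded: apply the body delimiters")
```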
Removing characters using the related Unicode values
Sometimes processed emails contain specific characters that you do not want included in the issue description or comment. Using the Global Body Cleanup Regexps or the Project Mapping Body Cleanup, you can configure a regexp that matches either a specific character or a range of characters.
Remove a Specific Character
First, identify the escape sequence for the character by looking it up in a Unicode characters table. For example, the escape sequence for “¿” is “\u00BF”. Once you have the escape sequence, create the regexp by adding “[” before it and “]” after it.
The example below shows the regexp used to remove “¿” from the following email.
Example email:
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: admin@localhost
To: mailbox@localhost
Content-Type: text/plain; charset=UTF-8
¿ some text
Example Regex and email outcome:
Body Cleanup Regexp | Resulting body for create | Resulting body for comment |
---|---|---|
[\u00BF] | some text | some text |
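If a Unicode table is not to hand, the escape sequence can also be derived programmatically. The following is a minimal Python sketch; `escape_sequence` is a hypothetical helper, and it only covers characters up to U+FFFF, which is all that a four-digit escape can express.

```python
import re


def escape_sequence(char):
    """Return the four-hex-digit Unicode escape for a single character.

    Hypothetical helper; covers code points up to U+FFFF only.
    """
    code_point = ord(char)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the four-digit escape range")
    return "\\u{:04X}".format(code_point)


escape = escape_sequence("¿")
print(escape)  # \u00BF

# Wrap the escape sequence in "[" and "]" to form the regexp.
regexp = "[" + escape + "]"
print(re.sub(regexp, "", "¿ some text").strip())  # some text
```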
Remove a Range of Characters
To remove a range of characters, find the escape sequences of the first and last characters that should be matched, again using a Unicode characters table. Once you have the two escape sequences, define the regexp as [escape1-escape2], for example [\u00A0-\u00FF].
Below is an example regexp range that removes all characters with an escape sequence between “\u00A0” and “\u00FF”.
Example email:
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: admin@localhost
To: mailbox@localhost
Content-Type: text/plain; charset=UTF-8
¿¡¢¥¦ some text
¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç
Some more text after the characters that should be removed.
Example Regex and email outcome:
Body Cleanup Regexp | Resulting body for create | Resulting body for comment |
---|---|---|
[\u00A0-\u00FF]+ | some text
Some more text after the characters that should be removed. | some text
Some more text after the characters that should be removed. |
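The range removal can be checked with a short Python sketch before the regexp is configured. It assumes that whitespace-only lines are dropped and remaining lines are trimmed afterwards, which is what the result table suggests; that tidy-up step is an assumption, not documented behaviour.

```python
import re

# Body of the email above (headers omitted).
body = (
    "¿¡¢¥¦ some text\n"
    "¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç\n"
    "Some more text after the characters that should be removed."
)

# Character class covering U+00A0 through U+00FF, one or more in a row.
latin1_range = re.compile(r"[\u00A0-\u00FF]+")

stripped = latin1_range.sub("", body)

# Assumption: lines are trimmed and whitespace-only lines are dropped.
lines = [line.strip() for line in stripped.splitlines() if line.strip()]
print("\n".join(lines))
```

Note that this range also covers accented letters such as é (U+00E9) and ü (U+00FC), so a narrower range may be preferable for mail written in languages that use them.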