Remove content from the processed email body
Body content from incoming emails can be selectively removed to alter the resulting issue description or comment. Regular expressions are used to match parts of the processed email body. When creating regular expressions, it is best to work from the post-processed content of an issue that has already been created or commented on.
Removing specific parts of email content
To remove only particular parts of an email's content, use a Cleanup Regexp. A cleanup removes only the content that matches the regular expression; everything else in the body is left in place.
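The effect of a cleanup can be sketched outside the product. This is a minimal illustration only: it uses Python's `re` module as a stand-in for the regexp engine, and `apply_cleanup` is a hypothetical helper name, not part of the product.

```python
import re

def apply_cleanup(body: str, cleanup_regexp: str) -> str:
    """Hypothetical helper: delete every match of the regexp, keep the rest."""
    return re.sub(cleanup_regexp, "", body)

body = (
    "Hello,\n"
    "This text should stay. !!This text should be removed!! This should also stay.\n"
    "Regards,\n"
    "Sender"
)

# The character class [\w\s] cannot match '!', so the match stops at the closing '!!'.
cleaned = apply_cleanup(body, r"!![\w\s]*!!")
print(cleaned)
```

Only the marked span disappears. The spaces on either side of it are not part of the match, so they remain in the output.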
Example 1
MIME-Version: 1.0
Received: by 10.223.112.12 with HTTP; Sat, 18 Jun 2011 22:42:26 -0700 (PDT)
Date: Sun, 19 Jun 2011 17:42:26 +1200
Message-ID: <BANLkTinB1mfSh+GwOXGNWoL4SyDvOpdBoQ@mail.gmail.com>
Subject: This is a starting email template, update as required
From: "Andy Brook" <andy@localhost>
To: changeme@thiswontwork.com
Content-Type: text/plain; charset=UTF-8
Hello,
This text should stay. !!This text should be removed!! This should also stay.
Regards,
Sender
Cleanup Regexp | Resulting body |
---|---|
!![\w\s]*!! | Hello,
This text should stay. This should also stay.
Regards,
Sender |
Example 2
Cleanup Regexp | Resulting body |
---|---|
Removing all content from a specific point onwards
To remove all content in an email starting from one point until the end of the content, you will want to use a Body Delimiter. A delimiter removes all content from its starting point. The starting point is where the delimiter regular expression first matches.
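As a rough sketch of this truncation, assuming Python's `re` module in place of the product's regexp engine: `apply_delimiter`, the sample body, and the `--` separator line are all illustrative assumptions, as is the use of multi-line matching.

```python
import re

def apply_delimiter(body: str, delimiter_regexp: str) -> str:
    """Hypothetical helper: keep only the text before the first delimiter match."""
    # Assumption: '^' and '$' match at line boundaries (multi-line mode).
    match = re.search(delimiter_regexp, body, flags=re.MULTILINE)
    if match is None:
        return body  # no match, nothing is removed
    return body[:match.start()]

body = (
    "Hello,\n"
    "Please reset my password.\n"
    "--\n"
    "This is my sig\n"
    "ACME Corp"
)

# Everything from the '--' line to the end of the body is dropped.
truncated = apply_delimiter(body, r"^--[ \t]*$")
print(truncated)
```

Unlike a cleanup, which removes only the matched text, the delimiter discards the match and everything after it.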
Email signatures and all trailing text can be removed through a Create/Comment Body Delimiter:
With a Create/Comment Body Delimiter that matches the line above 'This is my sig', you can remove the signature from the final body:
Create/Comment Body Delimiter | Resulting body for create | Resulting body for comment |
---|---|---|
Removing old email content in replies
Things get interesting when replies to emails are handled, as there is no standard for the format of replies:
The Comment Only Body Delimiter Regexps are used to match these lines (only when commenting on an issue) to identify the point at which the body should be truncated. Truncation on reply then results in just the required text being added:
Comment Only Body Delimiter | Resulting comment body |
---|---|
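The reply handling described above might be sketched as follows. The two patterns are illustrative guesses at common reply headers rather than shipped defaults. The earliest-match rule, the trimming of trailing whitespace, and the `truncate_reply` helper are also assumptions; Python's `re` module stands in for the product's regexp engine.

```python
import re

# Illustrative patterns only; mail clients format quoted replies in many ways.
COMMENT_ONLY_DELIMITERS = [
    r"^On .+ wrote:[ \t]*$",
    r"^-+ ?Original Message ?-+[ \t]*$",
]

def truncate_reply(body: str, is_comment: bool) -> str:
    """Hypothetical helper: cut quoted reply text, but only when commenting."""
    if not is_comment:
        return body  # issue creation is left untouched
    cut = len(body)
    for pattern in COMMENT_ONLY_DELIMITERS:
        match = re.search(pattern, body, flags=re.MULTILINE)
        if match:
            cut = min(cut, match.start())  # assumption: earliest match wins
    return body[:cut].rstrip()  # assumption: trailing whitespace is trimmed

reply = (
    "Thanks, that fixed it.\n"
    "\n"
    "On Sun, 19 Jun 2011 at 17:42, Andy Brook <andy@localhost> wrote:\n"
    "> Have you tried restarting?\n"
)

print(truncate_reply(reply, is_comment=True))
```

Because replies have no standard format, a real configuration usually needs one pattern per mail client in use.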
Handling forwarded emails
When dealing with forwarded email, things get more complicated. During create, it is less likely that you would want to apply signature stripping, because the first match would cause the content to be truncated there and then. During comment it is worse: the forwarded content looks like a reply, so it would be stripped as part of the reply handling.
The Forwarded Email Subject Prefixes setting lets you nominate email subject prefixes as a comma-separated list.
Example Forwarded Email Subject Prefixes configuration |
---|
Fwd:,Fw:,WG:,Doorst: |
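One way to picture how such a list could be used to recognise a forwarded message is sketched below. This is an assumption-laden illustration: `parse_prefixes` and `is_forwarded` are hypothetical names, and the case-insensitive comparison is a guess, not documented behaviour.

```python
from typing import List

def parse_prefixes(setting: str) -> List[str]:
    """Split the comma-separated setting into individual prefixes."""
    return [p.strip() for p in setting.split(",") if p.strip()]

def is_forwarded(subject: str, prefixes: List[str]) -> bool:
    """Hypothetical check: does the subject start with a nominated prefix?"""
    # Assumption: leading whitespace and letter case are ignored.
    normalised = subject.lstrip().lower()
    return any(normalised.startswith(p.lower()) for p in prefixes)

prefixes = parse_prefixes("Fwd:,Fw:,WG:,Doorst:")

print(is_forwarded("Fwd: Printer is broken", prefixes))
print(is_forwarded("Re: Printer is broken", prefixes))
```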
A forwarded message such as the one below will then be processed as follows:
Create | Comment |
---|---|
Issue Description: | Issue Comment: |
Removing characters using the related Unicode values
Sometimes processed emails contain characters that you do not want included in the issue description or comment. Using the Global Body Cleanup Regexps or the Project Mapping Body Cleanup, you can configure a regexp that matches either a specific character or a range of characters.
Remove a Specific Character
First, identify the escape sequence for the character using the Unicode characters table. For example, the escape sequence for “¿” is “\u00BF”. Once you have the escape sequence, create the regexp by adding “[” before it and “]” after it, giving [\u00BF].
The example below shows the regexp used to remove “¿” from the example email.
Example email:
Example Regex and email outcome:
Create/Comment Body Delimiter | Resulting body for create | Resulting body for comment |
---|---|---|
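A minimal sketch of the single-character case, assuming Python's `re` module as a stand-in for the product's regexp engine (both accept the `\uXXXX` escape form inside a character class). The sample body is invented for illustration.

```python
import re

# The regexp exactly as it would be typed into the cleanup field.
cleanup_regexp = r"[\u00BF]"

body = "¿Hello, can you reset my password?¿"

# Every occurrence of the inverted question mark (U+00BF) is removed.
cleaned = re.sub(cleanup_regexp, "", body)
print(cleaned)
```

The ordinary question mark (U+003F) is a different character and is left alone.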
Remove a Range of Characters
To remove a range of characters, find the escape sequences of the first and last characters that should be matched, again using the Unicode characters table. Once you have both escape sequences, define the regexp as [escape1-escape2], for example [\u00A0-\u00FF].
Below is an example regexp range that removes all characters whose escape sequence falls between “\u00A0” and “\u00FF”, inclusive.
Example email:
Example Regex and email outcome:
Create/Comment Body Delimiter | Resulting body for create | Resulting body for comment |
---|---|---|
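The range form can be sketched the same way, again assuming Python's `re` module in place of the product's regexp engine and an invented sample body.

```python
import re

# Matches any single character from U+00A0 to U+00FF inclusive.
cleanup_regexp = r"[\u00A0-\u00FF]"

body = "Café © 2011, naïve, £5"

# é (U+00E9), © (U+00A9), ï (U+00EF) and £ (U+00A3) all fall inside the range.
cleaned = re.sub(cleanup_regexp, "", body)
print(cleaned)
```

Note that this particular range also covers accented Latin letters such as “é” and “ï”, so it is worth checking the Unicode table for everything a range includes before applying it to real mail.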