Remove signature text and images from emails

Scenario

You have an email that contains a body and a signature, potentially with images if the message is HTML.

When commenting, this information is largely redundant and needs to be stripped. The following guide shows how to strip both the text and the images.

Stripping Text

JEMH 1.3.20+ can use a Regular Expression (or Regexp) to 'cut' the email at the point of a match. Plain text emails are straightforward; HTML is a little more involved. When processing HTML, a conversion process is run that strips the HTML markup from the email and leaves just text. JEMH has several HTML processing 'engines', and work in progress is introducing support for wiki mark-up of various HTML tags, such as tables and even in-line images. It's important to 'see' the converted content produced by the selected HTML-to-text conversion, as it is this content that the stripping regular expression is applied to.
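To illustrate why the converted content matters, here is a minimal Python sketch of an HTML-to-text conversion followed by the `\n-- ?\n` pattern used later in this guide. It is not JEMH's converter: the `TextExtractor` class, the `html_to_text` helper and the sample HTML are assumptions made up for illustration, and JEMH's own engines may place line breaks differently.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical converter: keeps text, turns <br> and </p>, </div> into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("p", "div"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Sample HTML body (assumed for illustration only).
html = "<p>Its a nice sig, but its got to go!</p><div>-- <br>The Plugin People Ltd</div>"
text = html_to_text(html)

print(repr(text))
# The delimiter is tested against the converted text, not the raw HTML.
print(re.search(r"\n-- ?\n", text) is not None)
print(re.search(r"\n-- ?\n", html) is not None)
```

The pattern matches the converted text but not the raw HTML, which is why the regexp should be written against what the conversion actually produces.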

Get an email into JIRA, for example by creating a Test Case. The converted content will look something like this:

== Created by JEMH via e-mail from: "Andy Brook" <andy@thepluginpeople.com> ==
Its a nice sig, but its got to go!
--
The Plugin People Ltd (registered in UK #08404380)
forums: https://www.getsatisfaction.com/thepluginpeople
support: https://thepluginpeople.atlassian.net
tel: +44 (791) 468-3169

For the above example we want to remove the content after "--". The Regex below matches on this line and removes any following content.

\n-- ?\n

This Regex is then added under Profile > Project Mapping > Pre-Processing > Body Delimiters > Add. Here you can also select whether the Regex is applied on Create, Comment or Both.

Once the Regex has been added, you can use the Dynamic Evaluation tool (found on the same page) to check that it removes the matching content, by selecting a Test Case to test against. If it is configured correctly, you will see that the content has been removed.
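Before adding the Regex to a profile, it can also be sanity-checked offline. The Python sketch below imitates the 'cut' behaviour described above; it is only an approximation under stated assumptions. `cut_at_delimiter` is a hypothetical helper (not a JEMH function), it assumes the cut happens at the first match, and it uses Python's `re` module rather than the regexp engine JEMH itself runs on.

```python
import re

# Converted body text from the example above (line breaks assumed).
body = (
    "Its a nice sig, but its got to go!\n"
    "-- \n"
    "The Plugin People Ltd (registered in UK #08404380)\n"
    "tel: +44 (791) 468-3169\n"
)


def cut_at_delimiter(text, delimiter_regex):
    """Hypothetical helper: keep only the text before the first delimiter match."""
    match = re.search(delimiter_regex, text)
    return text if match is None else text[:match.start()]


print(cut_at_delimiter(body, r"\n-- ?\n"))
# prints: Its a nice sig, but its got to go!
```

If the delimiter never matches, the body is returned unchanged, so a Regex that is too strict fails safe by leaving the signature in place.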

Alternative method for signature removal: Body Cleanup Regexps

Body Cleanup Regexps can be used to surgically remove specific signature matches from body content, rather than identifying a cut-off point for wanted text content.

Example

Referring to our original example signature, the regexp below would remove all matches from the body content. In this scenario, it would remove all content from the "--" to the end of the phone number.

\n-- ?\n.*468-3169

Of course, care needs to be taken with these regexps: if "--" appeared on its own line earlier in the body, the match would start there and run down to the phone number. This is due to the "dot matches all" behaviour that is used for JEMH's regexps, which means that "." also matches newlines. One workaround is to include more of the signature content in the regexp, as this reduces the chance of false positives. Another workaround is to use negated sets to make sure that only the desired match is made.
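The false positive and the negated-set workaround can be sketched in Python as follows. The body text and the `[^-]*` pattern are illustrative assumptions rather than expressions supplied by JEMH, and `re.DOTALL` is used here to approximate the "dot matches all" behaviour. The negated set only works in this example because no hyphen appears between the "--" line and the phone number.

```python
import re

SIGNATURE = (
    "\n-- \n"
    "The Plugin People Ltd (registered in UK #08404380)\n"
    "tel: +44 (791) 468-3169"
)
# A body where "--" also appears on its own line before the real signature.
body = "Options:\n--\nPlease keep this paragraph." + SIGNATURE + "\nThanks"

# "." matches newlines too, so the match starts at the FIRST "--" line.
greedy = re.compile(r"\n-- ?\n.*468-3169", re.DOTALL)

# [^-]* cannot cross another hyphen, so only the real signature can match.
negated = re.compile(r"\n-- ?\n[^-]*468-3169")

print(repr(greedy.sub("", body)))   # wanted paragraph is lost
print(repr(negated.sub("", body)))  # only the signature is removed
```

With the greedy pattern the paragraph between the two "--" lines is removed along with the signature; with the negated set it is kept.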

Stripping Images

JEMH won't create multiple copies of the same image: all attachments are checked for uniqueness and only added if different. Even so, specific files such as email signature images can be removed using Blocklisting (JEMH > Blocklisting), see Use Blocklisting.
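As a rough mental model of the two checks described above, the Python sketch below filters attachments by content digest. How JEMH actually compares attachments is not stated in this guide, so everything here is an assumption: the use of SHA-256, the `digest` and `filter_attachments` helpers, and the sample bytes are all hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint (SHA-256 is an assumption, not JEMH's documented method)."""
    return hashlib.sha256(data).hexdigest()


def filter_attachments(incoming, existing_digests, blocklisted_digests):
    """Return the names of attachments that would be added to the issue."""
    added = []
    seen = set(existing_digests)
    for name, data in incoming:
        d = digest(data)
        if d in blocklisted_digests or d in seen:
            continue  # blocklisted, or identical content already present
        seen.add(d)
        added.append(name)
    return added


logo = b"fake-png-bytes-of-signature-logo"
report = b"fake-pdf-bytes"

print(filter_attachments(
    [("logo.png", logo), ("report.pdf", report), ("logo-copy.png", logo)],
    existing_digests=set(),
    blocklisted_digests={digest(logo)},
))
# prints: ['report.pdf']
```

Comparing content rather than file names means a renamed copy of the signature image is still caught.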

Results

With the image blocklisted, it won't appear again, either during issue creation or when commenting.

Issue Creation

During issue creation, no attachment is added. The signature is still present, as there is no Create/Comment body delimiter regexp defined.

Commenting

When commenting, the Comment Only Body Delimiter Regexp is applied, which strips the signature from the comment.

Job done!


