Remove Signature Text and Images

Scenario

You have an email that contains a body and a signature. If the message is HTML, the signature may also include images.

When commenting, this information is largely redundant and should be stripped. This guide shows how to strip both the signature text and its images.

Stripping Text

JEMH 1.3.20 and later can make a regular expression (regexp) 'cut' in the email at the point where the regexp matches. Plain text emails are straightforward; HTML is a little more involved. When processing HTML, JEMH runs a conversion that strips the HTML markup from the email and leaves just text.

JEMH has several HTML-to-text processing 'engines', and work in progress is adding wiki markup support for various HTML elements, such as tables and even inline images. It's important to inspect the output of the selected HTML-to-text conversion, because the stripping regexps are matched against that converted content, not against the original HTML.

Get an email into JIRA, either by creating a Test Case or by using full email delivery. The resulting body text will look something like:

== Created by JEMH via e-mail from: "Andy Brook" <andy@thepluginpeople.com> ==
Its a nice sig, but its got to go!
--
The Plugin People Ltd (registered in UK #08404380)
forums: https://www.getsatisfaction.com/thepluginpeople
support: https://thepluginpeople.atlassian.net
tel: +44 (791) 468-3169
!^image.png|align=center, vspace=4, border=2!

Open the JEMH integrated Regexp Tester and paste the above content into its bottom window. Make sure that the 'Dot matches all' option (in the Regex Options section, top right) is enabled, so that the tester matches JEMH's regexp behaviour; newer versions should enable it by default. Next, set up a regexp that matches the start of the content to be removed:

\n-- ?\n
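To illustrate what the 'cut' does, here is a minimal Java sketch. It assumes JEMH's matching behaves like java.util.regex with Pattern.DOTALL (the tester's 'Dot matches all' option) and that the cut keeps only the text before the first match. The class and method names are hypothetical and are not part of JEMH's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterCut {
    // Hypothetical helper: keeps only the body text that precedes the first delimiter match.
    static String cutAtDelimiter(String body, String delimiterRegex) {
        // DOTALL mirrors the 'Dot matches all' option in the Regexp Tester.
        Matcher m = Pattern.compile(delimiterRegex, Pattern.DOTALL).matcher(body);
        return m.find() ? body.substring(0, m.start()) : body;
    }

    public static void main(String[] args) {
        String body = "Its a nice sig, but its got to go!\n"
                + "--\n"
                + "The Plugin People Ltd (registered in UK #08404380)\n"
                + "tel: +44 (791) 468-3169\n";
        // In Java source, the regexp \n-- ?\n needs its backslashes escaped.
        System.out.println(cutAtDelimiter(body, "\\n-- ?\\n"));
    }
}
```

Because the leading \n belongs to the match, the kept text does not end with a stray newline.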

The tester highlights the matching content in yellow.

Enter the regexp in the profile under Create/Comment Body Delimiter Regexps. Several regexps can be added, separated by commas. Alternatively, if you want the removal to occur only when commenting, enter the regexp under Comment Only Body Delimiter Regexps instead.

Multiple Regexps

You will most likely want more than one regexp. To specify several, separate them with commas in the same field.
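A minimal Java sketch of how a comma-separated list might be applied. It assumes the body is cut at the earliest match of any regexp in the list, which this guide does not state. The second regexp (\nSent from my phone) is a made-up example, and the class and method names are hypothetical. The sketch splits naively on commas, so a regexp that itself contains a comma (such as \d{1,3}) would be broken apart.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiDelimiterCut {
    // Hypothetical: cuts the body at the earliest match of any comma-separated regexp.
    static String cutAtEarliest(String body, String commaSeparatedRegexps) {
        int cut = body.length();
        for (String regex : commaSeparatedRegexps.split(",")) {
            if (regex.isEmpty()) {
                continue; // skip empty entries, e.g. from a leading comma
            }
            Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(body);
            if (m.find()) {
                cut = Math.min(cut, m.start());
            }
        }
        return body.substring(0, cut);
    }

    public static void main(String[] args) {
        String body = "Thanks, will do.\nSent from my phone\n--\nThe Plugin People Ltd\n";
        // Hypothetical list: the guide's delimiter plus an invented mobile-signature line.
        String regexps = "\\n-- ?\\n,\\nSent from my phone";
        System.out.println(cutAtEarliest(body, regexps)); // keeps only the reply line
    }
}
```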



Alternative method for signature removal: Body Cleanup Regexps

Body Cleanup Regexps surgically remove only the matched signature text from the body content, rather than identifying a cut-off point after which all content is discarded.

Example

For the original example signature, the regexp below removes every match from the body content. In this scenario, that is everything from the "--" line to the end of the phone number.

\n-- ?\n.*468-3169
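A minimal Java sketch of a cleanup regexp, assuming JEMH behaves like java.util.regex with Pattern.DOTALL and deletes each match (Matcher.replaceAll with an empty replacement). The class name is hypothetical. It shows the difference from a delimiter cut: content after the match, here the image markup line, is kept.

```java
import java.util.regex.Pattern;

class CleanupRegexp {
    // Hypothetical helper: deletes every match of the regexp and keeps the rest of the body.
    static String cleanup(String body, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(body).replaceAll("");
    }

    public static void main(String[] args) {
        String body = "Its a nice sig, but its got to go!\n"
                + "--\n"
                + "The Plugin People Ltd (registered in UK #08404380)\n"
                + "tel: +44 (791) 468-3169\n"
                + "!^image.png|align=center, vspace=4, border=2!";
        // The signature text goes; the image markup after the match survives.
        System.out.println(cleanup(body, "\\n-- ?\\n.*468-3169"));
    }
}
```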

Take care with these regexps: if a "--" line appeared earlier in the body, before this signature, the match would start there and run down to the phone number, removing everything in between. This happens because JEMH's regexps use "dot matches all" behaviour, so "." also matches newlines. One workaround is to include more of the signature's content in the regexp, which reduces the chance of false positives. Another is to use negated sets such as [^\n], which cannot cross a line break, so the match is confined to the expected number of lines:

\n-- ?\n[^\n]+\n[^\n]+\n[^\n]+\n[^\n]+468-3169
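To see the false positive and the negated-set workaround side by side, here is a minimal Java sketch. It assumes java.util.regex semantics with Pattern.DOTALL and uses a hypothetical body in which an earlier "--" line appears above the real signature; the class name is also hypothetical. The negated-set pattern uses one [^\n]+\n per signature line between the "--" line and the phone number.

```java
import java.util.regex.Pattern;

class CleanupComparison {
    // Hypothetical body: an earlier "--" line sits above the real signature.
    static final String BODY = "See my notes below\n"
            + "--\n"
            + "step one is done\n"
            + "--\n"
            + "The Plugin People Ltd (registered in UK #08404380)\n"
            + "forums: https://www.getsatisfaction.com/thepluginpeople\n"
            + "support: https://thepluginpeople.atlassian.net\n"
            + "tel: +44 (791) 468-3169";
    static final String GREEDY = "\\n-- ?\\n.*468-3169";
    static final String NEGATED = "\\n-- ?\\n[^\\n]+\\n[^\\n]+\\n[^\\n]+\\n[^\\n]+468-3169";

    static String cleanup(String body, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(body).replaceAll("");
    }

    public static void main(String[] args) {
        // With DOTALL, ".*" starts at the first "--" and also removes "step one is done".
        System.out.println(cleanup(BODY, GREEDY));
        // [^\n]+ cannot span lines, so only the real signature block matches.
        System.out.println(cleanup(BODY, NEGATED));
    }
}
```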

Stripping Images

JEMH won't create multiple copies of the same image: every attachment is checked for uniqueness and is added only if it differs from those already present. Even so, specific files such as email signature images can be excluded using Blacklisting (JEMH > Blacklisting); see Use Blacklisting.
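The guide does not say how JEMH decides that an attachment is a duplicate. As an illustration only, here is a minimal Java sketch that assumes uniqueness is judged by a SHA-256 digest of the file content; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AttachmentDedup {
    // Hypothetical: digests of attachment content already added to the issue.
    private final Set<String> seenDigests = new HashSet<>();
    private final List<String> attached = new ArrayList<>();

    // Returns true if the attachment was added, false if identical content was already present.
    boolean addIfUnique(String fileName, byte[] content) {
        String key = Base64.getEncoder().encodeToString(sha256(content));
        if (!seenDigests.add(key)) {
            return false;
        }
        attached.add(fileName);
        return true;
    }

    List<String> attachedNames() {
        return attached;
    }

    private static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AttachmentDedup issue = new AttachmentDedup();
        byte[] sigImage = "stand-in image bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(issue.addIfUnique("image.png", sigImage)); // first email: added
        System.out.println(issue.addIfUnique("image.png", sigImage)); // same image again: skipped
        System.out.println(issue.attachedNames());
    }
}
```

Deduplication alone still keeps one copy of the signature image, which is why Blacklisting is the tool for removing it altogether.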

Results

With the image blacklisted, it is no longer attached, whether the email creates an issue or adds a comment.

Issue Creation

During issue creation, no attachment is added. The signature text is still present, because no Create/Comment Body Delimiter Regexp is defined in this example.

Commenting

When commenting, the Comment Only Body Delimiter Regexp is applied, which strips the signature from the comment.
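The difference between the two profile fields can be sketched in Java. This is a hypothetical model, not JEMH code: it assumes that on a comment both the Create/Comment and the Comment Only delimiter regexps apply, that on creation only the Create/Comment ones do, and that the body is cut at the earliest match. As in the Results above, only a Comment Only delimiter is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProfileDelimiters {
    // Hypothetical stand-ins for the two profile fields described in this guide.
    final List<String> createCommentRegexps = new ArrayList<>();
    final List<String> commentOnlyRegexps = new ArrayList<>();

    // Assumption: comments use both lists; issue creation uses only Create/Comment.
    String process(String body, boolean isComment) {
        List<String> active = new ArrayList<>(createCommentRegexps);
        if (isComment) {
            active.addAll(commentOnlyRegexps);
        }
        int cut = body.length();
        for (String regex : active) {
            Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(body);
            if (m.find()) {
                cut = Math.min(cut, m.start());
            }
        }
        return body.substring(0, cut);
    }

    public static void main(String[] args) {
        ProfileDelimiters profile = new ProfileDelimiters();
        profile.commentOnlyRegexps.add("\\n-- ?\\n");
        String body = "Its a nice sig, but its got to go!\n--\ntel: +44 (791) 468-3169";
        System.out.println(profile.process(body, false)); // creation: signature kept
        System.out.println(profile.process(body, true));  // comment: signature stripped
    }
}
```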

Job done!


