What processing is applied to incoming email attachments?

Background on how JEMH deals with attachments

Discovery

JEMH uses Atlassian Mail to extract attachments as a first pass. JEMH then makes a second sweep to locate more. Any attachments found are checked for duplication against the first list by MD5 fingerprint, and any 'new' attachments are combined with it.
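The merge step can be sketched roughly as follows. This is a minimal illustration, not JEMH's actual code: the `Attachment` class and `merge_attachments` function are hypothetical, and the sketch also skips duplicates within the second-pass list itself, which is an assumption beyond what is described above.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Attachment:
    # Hypothetical stand-in for an extracted attachment.
    filename: str
    content: bytes


def md5_of(att: Attachment) -> str:
    """MD5 fingerprint of the attachment's raw bytes."""
    return hashlib.md5(att.content).hexdigest()


def merge_attachments(first_pass, second_pass):
    """Append second-pass finds whose MD5 is not already in the first-pass list."""
    known = {md5_of(a) for a in first_pass}
    merged = list(first_pass)
    for att in second_pass:
        digest = md5_of(att)
        if digest not in known:
            # Assumption: also de-duplicate within the second pass.
            known.add(digest)
            merged.append(att)
    return merged
```

Deduplication is by content, not by name, so a renamed copy of an already-extracted file is still dropped.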

Some attachments are not 'added' to the email but 'embedded'. HTML content can embed images in two ways: either by referencing a MIME-encoded part within the email, or by inlining the encoded content (usually images) directly. Inlined content may not have a filename, in which case one is generated.
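As a rough illustration of the second case, the sketch below pulls base64 `data:` URIs out of `<img src="...">` attributes and gives each a generated name. It assumes inlined images use the `data:<mime>;base64,<payload>` form; the function name and the `image-[x]` naming are hypothetical choices for the example, and the regex is deliberately simplified.

```python
import base64
import re

# Simplified: only double-quoted src attributes holding base64 image data URIs.
DATA_URI = re.compile(r'src="data:(image/[\w.+-]+);base64,([A-Za-z0-9+/=]+)"')


def extract_inline_images(markup: str):
    """Return (generated_filename, decoded_bytes) for each inlined image."""
    images = []
    for index, match in enumerate(DATA_URI.finditer(markup), start=1):
        mime, payload = match.group(1), match.group(2)
        ext = mime.split("/", 1)[1]  # extension taken from the MIME subtype
        images.append((f"image-{index}.{ext}", base64.b64decode(payload)))
    return images
```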

Text/plain parts are checked to see whether they match the text already extracted as the basis of the Comment or Description. Matches with line 1 are discarded, as that part was probably the text/plain source of the comment/description. New files are added with generated filenames of the form sometext-[x].txt, where x is unique only within the email.

Likewise, text/html parts are checked for a match on the Comment or Description after HTML-to-text conversion. Matches with line 1 are discarded. New files are added with generated filenames of the form somehtml-[x].html, where x is unique within the email and continues the numbering used for added text/plain parts.
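A minimal sketch of the "matches with line 1" check, assuming it means comparing the first non-blank line of a part with the first non-blank line of the extracted body. That interpretation, the crude tag-stripping `html_to_text` helper, and all function names here are assumptions for illustration; a real HTML-to-text converter would be far more thorough.

```python
import html
import re


def html_to_text(markup: str) -> str:
    """Very rough HTML-to-text: replace tags with newlines, unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", "\n", markup))


def first_line(text: str) -> str:
    """First non-blank line, stripped of surrounding whitespace."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def is_body_source(part_text: str, body_text: str, is_html: bool = False) -> bool:
    """True if the part looks like the source of the Comment/Description body."""
    candidate = html_to_text(part_text) if is_html else part_text
    return first_line(candidate) == first_line(body_text)
```

Parts for which `is_body_source` returns True would be discarded; the rest become `sometext-[x].txt` or `somehtml-[x].html` attachments.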

The content type is then checked for being an image. If it is, the file is named image-[x].[extension], where x is unique within the email and continues the numbering used for text and HTML parts, and the extension is taken from the MIME type.

Unknown content with a MIME type starting with 'application' is given the filename unknown.bin.
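The naming rules above can be sketched as a single counter shared across text, HTML and image parts. The `AttachmentNamer` class is hypothetical, and the assumption that numbering starts at 1 is not stated in the text; MIME parameters such as `; charset=utf-8` are ignored.

```python
from typing import Optional


class AttachmentNamer:
    """Generates names for unnamed parts; one counter per email (assumed to start at 1)."""

    def __init__(self) -> None:
        self.counter = 0

    def _next(self) -> int:
        self.counter += 1
        return self.counter

    def name_for(self, mime_type: str) -> Optional[str]:
        mime = mime_type.split(";", 1)[0].strip().lower()
        if mime == "text/plain":
            return f"sometext-{self._next()}.txt"
        if mime == "text/html":
            return f"somehtml-{self._next()}.html"
        if mime.startswith("image/"):
            # Extension taken from the MIME subtype, e.g. image/png -> png.
            return f"image-{self._next()}.{mime.split('/', 1)[1]}"
        if mime.startswith("application"):
            return "unknown.bin"
        return None  # not covered by the rules described above
```

Create one `AttachmentNamer` per email so that x is unique only within that email.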

Filtration

Several kinds of filtration are applied:

  1. Signed emails include a signature.asc file, which is dropped.

  2. Attachment candidates are also checked against the profile, which allows blacklisting by regular expression (usually for file-type blacklisting, e.g. .mov, .mp3). Any matches are dropped.

  3. The Global Blacklisting support uses MD5 fingerprints for identical-file matching (e.g. images in email signatures). Any candidate matching a blacklist entry is dropped, and that entry's hit-count metadata is updated.

  4. Pre-existing file duplication filtering uses an MD5 fingerprint to check each candidate against every existing attachment on the issue. Duplicates are dropped.
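The four filters can be chained in the order listed, as in the sketch below. It is illustrative only: `Blacklist`, `filter_candidates` and the data shapes (candidates as `(filename, bytes)` pairs, profile patterns as regex strings, existing attachments as a set of MD5 hex digests) are assumptions, not JEMH's data model. Regex matching is made case-insensitive as a further assumption.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Blacklist:
    # Hypothetical global blacklist: MD5 hex digest -> hit count.
    hits: dict = field(default_factory=dict)


def filter_candidates(candidates, profile_patterns, blacklist, existing_md5s):
    """Apply the four filters in order; return the surviving candidates."""
    kept = []
    for filename, content in candidates:
        # 1. Drop the signature part of signed emails.
        if filename == "signature.asc":
            continue
        # 2. Profile regex blacklist (usually file types).
        if any(re.search(p, filename, re.IGNORECASE) for p in profile_patterns):
            continue
        digest = hashlib.md5(content).hexdigest()
        # 3. Global MD5 blacklist, recording a hit on the matching entry.
        if digest in blacklist.hits:
            blacklist.hits[digest] += 1
            continue
        # 4. Already attached to the issue.
        if digest in existing_md5s:
            continue
        kept.append((filename, content))
    return kept
```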

Additions

JEMH Profiles can configure the addition of several kinds of attachments:

  1. The entire email message, including all encoded attachment data, can be attached to the issue for reference. These attachments are stored with the message/rfc822 MIME type and a .eml extension. Some clients, such as Outlook, will load these files after the extension is changed to .msg.

  2. The original body content used to create the Description or Comment can also be attached for reference. This makes most sense for HTML messages, as it retains the complete HTML layout for reference. Filenames are jemh-creating-email.[html|txt] for issue creation, and jemh-commenting-email-[x].[html|txt] for comments, where x is the index of the comment.
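The body-reference naming convention above amounts to a small function. The function name and the use of `None` to mean "issue creation" are hypothetical choices for this sketch.

```python
from typing import Optional


def body_reference_name(is_html: bool, comment_index: Optional[int] = None) -> str:
    """Filename for the attached original body.

    comment_index is None for issue creation, otherwise the comment's index.
    """
    ext = "html" if is_html else "txt"
    if comment_index is None:
        return f"jemh-creating-email.{ext}"
    return f"jemh-commenting-email-{comment_index}.{ext}"
```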

Filename Duplication Management

JIRA allows attachments with duplicate filenames; these are treated as new versions of the same resource. Unfortunately, wiki markup can only refer to the base filename, not a specific version, so images created by JEMH must have names that are unique within the issue.

Image Inlining Consequences

When JEMH image inlining is enabled, JEMH must rename images as needed to ensure name uniqueness, so that links in comments continue to point at the added images.

So, when inline images are not enabled, JEMH stores the files as-is, and JIRA treats them as versioned artifacts. If inline images are enabled, JEMH rewrites the filenames as follows (assuming no MD5 hash match with an existing attachment):

Existing Issue Attachment Names     | Email Attachment Names | Post-processing JEMH Attachment Name
(none)                              | image.png              | image.png
image.png                           | image.png              | image-1.png
image.png, image-1.png, image-2.png | image.png              | image-3.png
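The renaming in the table can be reproduced by appending the smallest free numeric suffix. This is a sketch that matches the three rows shown; whether JEMH picks the smallest free suffix or one more than the highest existing suffix is not stated, and the two strategies give the same results for these rows. `unique_name` is a hypothetical helper.

```python
import os


def unique_name(candidate: str, existing: set) -> str:
    """Return candidate unchanged if free, else name-N.ext with the smallest free N >= 1."""
    if candidate not in existing:
        return candidate
    root, ext = os.path.splitext(candidate)  # "image.png" -> ("image", ".png")
    n = 1
    while f"{root}-{n}{ext}" in existing:
        n += 1
    return f"{root}-{n}{ext}"
```

When one email carries several images with the same name, each chosen name would need to be added to `existing` before naming the next one.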

The inline-image support requires exact-filename wiki links to be present in the body of a created Description or Comment; placeholder text in the body is updated to reflect the current filenames. This is done in the order the images are found, so blacklisted files and/or additional content found can throw off the wiki-link image references. This is a work in progress.
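A sketch of order-based placeholder rewriting, which also shows why a dropped file shifts the references. It assumes JIRA wiki image links of the form `!name!` or `!name|options!`. `rewrite_image_links` is hypothetical, and its regex is a simplification of the wiki syntax.

```python
import re
from typing import List

# !name! or !name|options!; the name may not contain whitespace, '!' or '|'.
WIKI_IMAGE = re.compile(r"!([^!\s|]+)(\|[^!]*)?!")


def rewrite_image_links(body: str, final_names: List[str]) -> str:
    """Replace the n-th image link with the n-th final attachment name."""
    names = iter(final_names)

    def repl(match: re.Match) -> str:
        try:
            new_name = next(names)
        except StopIteration:
            return match.group(0)  # more links than names: leave the rest untouched
        return f"!{new_name}{match.group(2) or ''}!"

    return WIKI_IMAGE.sub(repl, body)
```

Because matching is purely positional, if the first image were blacklisted and only `image-2.png` survived, the first link would be rewritten to `image-2.png` and the second left unchanged.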