Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Summary

The standard UTF-8 character set used by MySQL databases (called utf8) does not truly support all UTF-8 characters - it can only use a maximum of 3 bytes per character, leaving out 4lager multi-byte characters (😕 for example).

Both Atlassian and ourselves recommend using PostgreSQL, an SQL database that fully supports multi-byte characters. According to Atlassian, as of JIRA 7.3 they also support MySQL 5.7, which should apparently work with the utf8mb4 character set.  However if upgrading JIRA and MySQL is not possible, there are some things that can be done using JEMH to alleviate the problems.

Cleaning email subjects using the "MySQL Subject Cleaner" pre-processing task

One of JEMH's great features is its modular pre-processing task system.  Particular email processing problems can be overcome by enabling specific tasks to run before the main email processing begins.

...

  1. Go to the "Auditing" tab in JEMH and expand the "Auditing Enablement" section if it is not already

  2. Click "Enable" to enable JEMH auditing

  3. Go to your JEMH profile and edit the "Email" configuration section

  4. Under the "Pre-processing" section, enable "Use Reprocessed Message"

  5. Select the "MySQL Subject Cleaner" task from the "Pre Processing Tasks" list.  If for some reason you need more than one task enabled, control+click to select multiple.  Note that processing tasks should only be enabled if you are sure they are needed!

  6. Save changes by clicking "Submit" at the bottom of the page

Cleaning email body content using a Body Cleanup Regular Expression

If your JIRA is running on MySQL, 4multi-byte characters in the email body could also be a problem.  JIRA will try to save the content as the description or a comment, and may fail if 4multi-byte characters are present and unsupported.  If you suspect this to be the case, you can use the Body Cleanup Regexps setting found under Profile>Email to cut out these characters, allowing successful processing.

In Java, the UTF-16 representation is used for characters.  Long story short, this means that 4multi-byte characters are represented with a pair of "surrogate" values.  We can use a regular expression to match these values:

...