Regex Bootcamp



Introduction



JEMH uses Regex in a very simple way. The following are key aspects that anyone using JEMH must understand.



Regex

Description

.

The dot character matches any single character.

*

The asterisk matches 0 or more of the preceding character / expression.

+

The plus matches 1 or more of the preceding character / expression.

\x2c

This is a single character match written as the hex code of the literal character, here the comma (hex 2C). See How to use comma characters in JEMH regexp's.



[a-z]



This is a range expression matching any character between a and z (lower case).



[A-Z]



This is a range expression matching any character between A and Z (Upper case).



[0-9]{10}



This matches an exact number of characters, in this case exactly 10 characters that are each a digit from 0 through 9.

.*

This is the regex wildcard, meaning 0 or more of any character. Depending on context, this could match all content or only the content on one line.





This matches 0 or more characters between a and z (lower case).





This matches any sequence of alphanumeric text regardless of case, but does not match accented characters.





This matches characters, explicitly showing how a match can be limited to one line (\n is the escape sequence for a new line).



This matches all inputs that match the pattern some@.*, with the exception of some@address.com.
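
The constructs in the table above can be tried out in any regex engine. The sketch below uses Python's re module purely for illustration. The helper name full is hypothetical, and the last four patterns ([a-z]*, [a-zA-Z0-9]*, [^\n]* and the negative lookahead) are assumptions chosen to fit the descriptions that have no expression listed; they are not necessarily the expressions JEMH documents.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Constructs listed in the table above.
print(full(r"a.c", "abc"))                # True: . matches any single character
print(full(r"ab*", "a"))                  # True: * allows zero occurrences of b
print(full(r"ab+", "a"))                  # False: + needs at least one b
print(full(r"1\x2c2", "1,2"))             # True: \x2c is the comma character
print(full(r"[0-9]{10}", "0123456789"))   # True: exactly ten digits
print(full(r"[0-9]{10}", "12345"))        # False: only five digits

# .* stops at a new line by default; the DOTALL flag lets it span all content.
text = "first line\nsecond line"
one_line = re.match(r".*", text).group()
all_content = re.match(r".*", text, re.DOTALL).group()

# Assumed patterns for the descriptions that have no expression listed.
print(full(r"[a-z]*", ""))                # True: zero characters is still a match
print(full(r"[a-zA-Z0-9]*", "Abc123"))    # True: mixed case and digits
print(full(r"[a-zA-Z0-9]*", "café"))      # False: the accented character fails
same_line = re.match(r"[^\n]*", text).group()  # everything up to the new line
EXCLUDE = r"some@(?!address\.com$).*"     # negative lookahead for one address
print(full(EXCLUDE, "some@other.com"))    # True
print(full(EXCLUDE, "some@address.com"))  # False
```
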



Capture Groups

JEMH uses Regex Capture Groups to extract a subset of an overall match, e.g. getting the value from a line that also contains a key:





Expression

Description





This matches lines starting with Hello World, followed by one space, a colon and another space, and then a sequence of 6 digits.

The round brackets mark the capture group. Its content can be extracted only when the overall expression matches.
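
As a minimal sketch of the capture group described above, the Python snippet below assumes the expression is ^Hello World : ([0-9]{6}). That exact pattern, the function name extract_value and the sample text are assumptions for illustration only.

```python
import re
from typing import Optional

# Assumed pattern: the key, a space, a colon, a space, then six digits.
# MULTILINE lets ^ match at the start of every line, not just the first.
PATTERN = re.compile(r"^Hello World : ([0-9]{6})", re.MULTILINE)

def extract_value(body: str) -> Optional[str]:
    """Return the captured six digits, or None when no line matches."""
    match = PATTERN.search(body)
    return match.group(1) if match else None

print(extract_value("Hi team\nHello World : 123456\nThanks"))  # 123456
print(extract_value("Hello World : 12345"))                    # None
```

The second call returns nothing because the overall expression fails (only five digits), so there is no group to extract.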



Common applications of regex in JEMH



There are many ways you can utilise Regex within JEMH. The following are some examples.

Catch email addresses



The following are some examples of how to use, and how not to use, regular expressions for a catch email address within JEMH.

Regex

Good/Bad

Description

*@domain.com

The asterisk matches 0 or more of the preceding character or expression. As there is nothing before the asterisk, this is not a valid regular expression.

.*@domain.com

Whilst this is a valid regular expression, it matches every recipient in domain.com. This will cause problems because JEMH filters all mailbox addresses from email processing. The catch email expression must match only mailbox addresses; the following strategies can be used to resolve this problem:

example@domain.com

This is an exact match on a single address rather than a pattern.

.*-support@domain.com

Matches any mailbox whose name ends with the suffix -support.
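
The difference between these catch email expressions can be checked with a short Python sketch. It assumes the whole address must match the expression; the helper is_catch_match and the recipient list are hypothetical and only illustrate the regex behaviour, not how JEMH evaluates the setting internally.

```python
import re

def is_catch_match(expression: str, address: str) -> bool:
    """Return True when the whole address matches the expression."""
    return re.fullmatch(expression, address) is not None

recipients = ["jira-support@domain.com", "alice@domain.com", "example@domain.com"]

for expression in [r".*@domain.com", r"example@domain.com", r".*-support@domain.com"]:
    matched = [r for r in recipients if is_catch_match(expression, r)]
    print(expression, "->", matched)

# A leading asterisk has nothing to repeat, so the expression does not compile.
try:
    re.compile(r"*@domain.com")
    leading_asterisk_valid = True
except re.error as exc:
    leading_asterisk_valid = False
    print("invalid expression:", exc)
```

The first expression matches all three recipients, which is the over-matching problem described above, while the other two match only the intended mailbox.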



Matching replied-to content

See Use Project Mapping Cleanup and Body Delimiters for more details.

Matching replied-to content requires an expression that matches the start of the line. It doesn't require a match on the full line, but too general a match will cause unexpected clipping of content.

All delimiter expressions are prefixed by JEMH with the new line (\n) character to limit the number of potential matches to the number of lines in the content.

Example content



Regex 

Good/Bad

Description

On

This is far too general.



\nOn

This requires two new line characters before the On (the \n prefix that JEMH adds plus the \n in the expression). It is still too general as an expression.





This will match the majority of the lead-in to the line. It uses English short names for the day and month, along with year and time formats that are specific to the sending mail client.

Additional languages require new expressions, as do email clients that format dates and times differently.
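
To illustrate why a specific lead-in matters, the Python sketch below compares the bare On delimiter with a more specific one. The DELIMITER pattern, the sample reply line format ("On Mon, Jan 15, 2024 at 10:30 AM") and the clip_reply helper are all assumptions made for this example; the actual expression and the way JEMH clips content may differ.

```python
import re

DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
# Hypothetical delimiter for lines like "On Mon, Jan 15, 2024 at 10:30 AM".
DELIMITER = rf"On {DAY}, {MONTH} [0-9]{{1,2}}, [0-9]{{4}} at [0-9]{{1,2}}:[0-9]{{2}} (?:AM|PM)"

def clip_reply(body: str, delimiter: str) -> str:
    """Keep only the text before the first line that starts with the delimiter."""
    # Mimic the \n prefix described above so the match is tied to a line start.
    match = re.search(r"\n" + delimiter, body)
    return body[:match.start()] if match else body

body = (
    "Thanks, that fixed it.\n"
    "On reflection we can close this.\n"
    "On Mon, Jan 15, 2024 at 10:30 AM, Support <support@domain.com> wrote:\n"
    "> Please try again.\n"
)

print(repr(clip_reply(body, "On")))       # clips at "On reflection", too early
print(repr(clip_reply(body, DELIMITER)))  # clips at the quoted reply header
```

With the bare On delimiter the second line of the new comment is lost, which is the unexpected clipping that a too-general expression causes.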