Example Regexp Body Delimiters

This page covers a few simple Regexps that can be used as body delimiters and explains how each one works.

Date Formats

dd/mm/yy or mm/dd/yy

The following simple Regexp will match either date format:

[0-9]{2}/[0-9]{2}/[0-9]{2}

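[0-9] means any single digit between 0 and 9. {2} means the preceding item must appear exactly twice. [0-9]{2} will therefore match any value from 00 to 99. Repeating this pattern three times, separated by a forward slash '/', gives a Regexp that matches anything from 00/00/00 to 99/99/99. Note that the Regexp only checks the shape of the value: it cannot tell dd/mm/yy from mm/dd/yy, and it will accept impossible dates such as 99/99/99.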
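As a quick check of the behaviour described above, the following is a minimal sketch in Python. The function name find_dates and the sample strings are made up for illustration, and Python's re module is assumed to be close enough to the Regexp engine in use for this pattern.

```python
import re

# The pattern from the text: two digits, a slash, two digits, a slash, two digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")


def find_dates(text):
    """Return every substring of text that has the shape NN/NN/NN."""
    return DATE_PATTERN.findall(text)


print(find_dates("Sent 25/12/23, received 12/26/23."))  # ['25/12/23', '12/26/23']
print(find_dates("1/2/23"))                             # [] (single digits do not match)
print(find_dates("25/12/2023"))                         # ['25/12/20'] (partial match)
```

The last line shows that the pattern is not anchored, so it will match the first eight characters of a four-digit-year date. If that matters, the pattern would need extra context around it, such as word boundaries.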

Email Addresses

The following Regexp can be used to represent a simple email address, such as ‘johndoe@domain.com’:

[a-z]+@[a-z]+\.com

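The Regexp pattern [a-z]+ will match one or more lowercase letters, so it can be used for both the username and the domain section of the email address. The backslash in \.com makes the full stop literal; without it, '.' would match any character.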
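The sketch below tries the simple pattern in Python. The helper name is_simple_email and the test addresses are illustrative assumptions, not part of the original page.

```python
import re

# The simple email pattern from the text.
SIMPLE_EMAIL = re.compile(r"[a-z]+@[a-z]+\.com")


def is_simple_email(candidate):
    """True only if the whole string is lowercase-letters@lowercase-letters.com."""
    return SIMPLE_EMAIL.fullmatch(candidate) is not None


print(is_simple_email("johndoe@domain.com"))   # True
print(is_simple_email("JohnDoe@domain.com"))   # False (capital letters)
print(is_simple_email("john.doe@domain.com"))  # False (full stop in username)

# When searching inside a longer body, the match starts wherever the
# pattern first fits, which can cut off part of the address:
print(SIMPLE_EMAIL.search("Contact JohnDoe@domain.com").group())  # oe@domain.com
```

The final line is worth noting when the Regexp is used as a delimiter: an unanchored search will happily match the tail end of an address that contains characters outside [a-z].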

Extra values

Additional values such as numbers and other characters can be added to the Regexp where necessary. For example, the following Regexp could be used to match an address such as 'Joe.Blogs@123test123.co.uk':

[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)

The Regexp for the username ([A-Za-z\.]+) now matches capital letters and full stops/periods '.' as well as lowercase letters. The domain ([a-z0-9]+) now also accepts numbers, and it is followed by a literal full stop and a choice of Top Level Domains: \.(com|co\.uk|gov)
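To illustrate the extended pattern, here is a small Python sketch. The function name extract_emails and the sample text are assumptions made for this example.

```python
import re

# The extended email pattern from the text.
EXTENDED_EMAIL = re.compile(r"[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)")


def extract_emails(text):
    """Return each full address matched in text.

    finditer with group(0) is used because the pattern contains a capturing
    group; findall would return only the Top Level Domain part.
    """
    return [match.group(0) for match in EXTENDED_EMAIL.finditer(text)]


print(extract_emails("Mail Joe.Blogs@123test123.co.uk or jane@agency.gov"))
# ['Joe.Blogs@123test123.co.uk', 'jane@agency.gov']
print(extract_emails("bob@site.org"))    # [] (.org is not in the list)
print(extract_emails("joe99@test.com"))  # [] (digits are not allowed in the username)
```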

UTF-8 Characters

Some characters may be needed to find a match but should not be typed directly into your regular expression, because they may interact with the expression, or with the page it is entered on, in unexpected ways. Such a character can be expressed using its hexadecimal code (its Unicode code point) rather than the character itself. The following Regexp will match a single character, where xxxx is the code:

[\x{xxxx}]

Please see the following page for a comprehensive (and well organised) list of UTF-8 characters:

Codes

The following examples show characters that are likely to be used and could be replaced by their related UTF-8 reference:

| Character | Name | Code | Regexp |
| --- | --- | --- | --- |
| " | QUOTATION MARK | 0022 | [\x{0022}] |
| < | LESS-THAN SIGN | 003C | [\x{003C}] |
| > | GREATER-THAN SIGN | 003E | [\x{003E}] |
| ' | APOSTROPHE | 0027 | [\x{0027}] |
| (blank) | NO-BREAK SPACE | 00A0 | [\x{00A0}] |
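The [\x{xxxx}] notation is accepted by PCRE-style engines, but not by every Regexp library. As a hedged illustration, the Python sketch below rewrites each \x{xxxx} escape into the \Uxxxxxxxx form that Python's re module understands, then confirms that every Regexp in the list above matches its character. The helper name to_python_pattern is hypothetical and exists only for this example.

```python
import re

# Finds PCRE-style escapes such as \x{003C} inside a pattern string.
_PCRE_HEX = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}")


def to_python_pattern(pattern):
    # Python's re rejects \x{...}, so each escape is rewritten as an
    # eight-digit \U escape carrying the same code.
    return _PCRE_HEX.sub(lambda m: "\\U%08x" % int(m.group(1), 16), pattern)


# Character -> Regexp, as listed in the codes above.
CODES = {
    '"': r"[\x{0022}]",
    "<": r"[\x{003C}]",
    ">": r"[\x{003E}]",
    "'": r"[\x{0027}]",
    "\u00a0": r"[\x{00A0}]",
}

for character, regexp in CODES.items():
    converted = to_python_pattern(regexp)
    assert re.fullmatch(converted, character), regexp
    print(regexp, "->", converted)
```

If the Regexp engine in use already supports \x{xxxx}, no conversion is needed and the expressions can be entered exactly as listed.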

Examples

UTF-8 codes can be used when attempting a match on an email address that contains characters which may be HTML ‘unsafe’ and would usually be stripped.

The following Regexp can be used to match an email address wrapped in less/greater than symbols such as “<johndoe@domain.com>”
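The expression itself is not shown in this section. As an assumption, one plausible form combines the simple email pattern with the codes for the less-than and greater-than signs: [\x{003C}][a-z]+@[a-z]+\.com[\x{003E}]. The Python sketch below tests that idea. Python's re module does not accept \x{xxxx}, so the equivalent \u003C and \u003E escapes stand in, and the function name unwrap_email is made up for this example.

```python
import re

# Assumed pattern, not taken from the original page. In PCRE-style syntax:
#   [\x{003C}][a-z]+@[a-z]+\.com[\x{003E}]
# The parentheses are added here only to capture the address without the brackets.
WRAPPED_EMAIL = re.compile(r"[\u003C]([a-z]+@[a-z]+\.com)[\u003E]")


def unwrap_email(text):
    """Return the address found between < and >, or None if there is none."""
    match = WRAPPED_EMAIL.search(text)
    return match.group(1) if match else None


print(unwrap_email("From: John Doe <johndoe@domain.com>"))  # johndoe@domain.com
print(unwrap_email("From: johndoe@domain.com"))             # None
```

Because the brackets are written as codes, the expression itself contains no literal < or > that could be stripped before it reaches the Regexp engine.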