This page covers a few simple Regexps that can be used as body delimiters, with an explanation of how each one works.

Date Formats

dd/mm/yy or mm/dd/yy

The following simple Regexp will match either date format:

...

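[0-9] matches any single digit from 0 to 9, and {2} means the preceding element must occur exactly twice, so [0-9]{2} matches any two-digit value from 00 to 99. Repeating this pattern three times, separated by forward slashes '/', gives a Regexp that matches anything from 00/00/00 to 99/99/99. The pattern only checks the shape of the text, so it cannot tell dd/mm/yy from mm/dd/yy, and it also accepts impossible dates such as 99/99/99.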
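The pattern itself is not reproduced on this page. The Python sketch below assumes it is `[0-9]{2}/[0-9]{2}/[0-9]{2}`, reconstructed from the explanation above; `DATE_PATTERN` and `looks_like_date` are hypothetical names used only for illustration.

```python
import re

# Assumed pattern, reconstructed from the description: three two-digit
# groups separated by forward slashes.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")

def looks_like_date(text: str) -> bool:
    """Return True if the whole string has the shape nn/nn/nn."""
    return DATE_PATTERN.fullmatch(text) is not None

print(looks_like_date("25/12/21"))  # dd/mm/yy -> True
print(looks_like_date("12/25/21"))  # mm/dd/yy -> True
print(looks_like_date("99/99/99"))  # right shape, not a real date -> True
print(looks_like_date("1/1/21"))    # single digits -> False
```

When the date sits inside a longer body of text, `DATE_PATTERN.search(...)` finds it anywhere in the string instead of requiring the whole string to match.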

Email Addresses

The following Regexp can be used to match a simple email address, such as ‘johndoe@domain.com’:

...

The Regexp pattern [a-z]+ matches one or more lowercase letters, so it can be used for both the username and the domain sections of the email address.
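As a minimal sketch, assuming the elided pattern has the form `[a-z]+@[a-z]+\.com` (an illustration built from the description, not necessarily the exact Regexp used here), the following Python code shows which addresses it accepts. `SIMPLE_EMAIL` is a hypothetical name.

```python
import re

# Assumed pattern: lowercase username, "@", lowercase domain, then ".com".
# The full stop is escaped so it matches a literal '.' rather than any character.
SIMPLE_EMAIL = re.compile(r"[a-z]+@[a-z]+\.com")

for candidate in ["johndoe@domain.com", "JohnDoe@domain.com", "john.doe@domain.com"]:
    print(candidate, SIMPLE_EMAIL.fullmatch(candidate) is not None)
# johndoe@domain.com True
# JohnDoe@domain.com False   (capital letters are not in [a-z])
# john.doe@domain.com False  (the full stop is not in [a-z])
```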

Extra values

Additional characters, such as digits or capital letters, can be added to the Regexp where necessary. For example, the following Regexp could be used to match an address such as 'Joe.Blogs@123test123.co.uk':

...

The username part ([A-Za-z\.]) now also matches capital letters and full stops/periods ('.'). The domain part ([a-z0-9]) also matches digits, and is followed by a full stop and a choice of top-level domains: \.(com|co\.uk|gov).
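The Python sketch below assumes the extended Regexp is `[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)`, assembled from the parts described above; the exact pattern is not shown on this page, and `EXTENDED_EMAIL` is a hypothetical name. Writing the escaped full stop once, before the alternation, means every top-level domain option is preceded by a literal '.'.

```python
import re

# Assumed pattern: username of letters and full stops, a domain of lowercase
# letters and digits, then "." and one of three top-level domains (group 1).
EXTENDED_EMAIL = re.compile(r"[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)")

for candidate in ["Joe.Blogs@123test123.co.uk", "johndoe@domain.com", "joe@domain.org"]:
    match = EXTENDED_EMAIL.fullmatch(candidate)
    print(candidate, match.group(1) if match else None)
# Joe.Blogs@123test123.co.uk co.uk
# johndoe@domain.com com
# joe@domain.org None
```

Any top-level domain not listed in the alternation, such as .org, is rejected, so the list has to be extended for each domain you expect.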

UTF-8 Characters

Some characters should not be written literally in a regular expression because they may interact with the expression unexpectedly, yet they may still be needed to find a match. Such a character can be expressed by its code rather than by the character itself. Strictly, this code is the character's hexadecimal Unicode code point; for the ASCII characters listed below it has the same value as the character's single UTF-8 byte. The following Regexp matches one character, where xxxx is the code:

...

Codes

The following table lists characters that are likely to be needed in a match, together with the code and Regexp that can replace each one:

Character    Code    Regexp
"            0022    [\x{0022}]
<            003C    [\x{003C}]
>            003E    [\x{003E}]
'            0027    [\x{0027}]
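The \x{XXXX} notation in the table is PCRE/Perl syntax. Python's `re` module rejects it and writes a four-digit code point as \uXXXX instead. The sketch below uses a hypothetical helper, `code_to_class`, to turn a code from the table into the equivalent Python character class and checks each table row.

```python
import re
import string

def code_to_class(code: str) -> str:
    r"""Return a one-character class such as [\u0022] for the code "0022".

    Python's re module writes a code point as \uXXXX; the \x{XXXX} form used
    in the table is PCRE/Perl syntax, which re does not accept.
    """
    if len(code) != 4 or not all(c in string.hexdigits for c in code):
        raise ValueError(f"expected four hexadecimal digits, got {code!r}")
    return "[\\u" + code + "]"

# Codes from the table, paired with the character each one stands for.
TABLE = {"0022": '"', "003C": "<", "003E": ">", "0027": "'"}

for code, char in TABLE.items():
    pattern = code_to_class(code)
    print(pattern, re.fullmatch(pattern, char) is not None)
# [\u0022] True
# [\u003C] True
# [\u003E] True
# [\u0027] True
```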

Examples

UTF-8 codes can be used when matching an email address that includes characters which are HTML ‘unsafe’ and would usually be stripped.

...
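The original example is not shown. As a hedged sketch, the Python code below assumes a hypothetical header line in which the address sits between angle brackets, and writes the brackets by code point (\u003C for '<', \u003E for '>', Python's spelling of \x{003C} and \x{003E}) so they never appear literally in the pattern. The address part reuses the assumed extended email pattern; `BRACKETED_EMAIL` is a hypothetical name.

```python
import re

# Hypothetical input: a display name followed by an address in angle brackets.
line = "From: Joe Blogs <Joe.Blogs@123test123.co.uk>"

# [\u003C] matches "<" and [\u003E] matches ">"; the address between them is
# captured as group 1. (?:...) groups the top-level domains without capturing.
BRACKETED_EMAIL = re.compile(
    r"[\u003C]([A-Za-z\.]+@[a-z0-9]+\.(?:com|co\.uk|gov))[\u003E]"
)

match = BRACKETED_EMAIL.search(line)
print(match.group(1) if match else None)  # Joe.Blogs@123test123.co.uk
```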