
This page covers a few simple Regexps that can be used for body delimiters, each with an explanation of how it works.

Date Formats

dd/mm/yy or mm/dd/yy

The following simple Regexp will match either date format:

[0-9]{2}/[0-9]{2}/[0-9]{2}

The [0-9] matches any single digit from 0 to 9, and the {2} means it must occur exactly twice, so [0-9]{2} matches any two-digit value from 00 to 99. Repeating this pattern with a forward slash '/' between each pair gives a Regexp that matches anything from 00/00/00 to 99/99/99. Note that it checks the shape of the date only, so out-of-range values such as 99/99/99 also match.
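The behaviour described above can be checked with a short script. This is a minimal sketch that assumes Python's `re` module as a test bench; the sample strings and the name `DATE` are made up for illustration and do not come from the product.

```python
import re

# The date Regexp from this page, unchanged.
DATE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")

# Hypothetical sample lines, for illustration only.
samples = ["Sent: 25/12/23", "Sent: 12/25/23", "Sent: 99/99/99", "Sent: 5/1/23"]

for line in samples:
    match = DATE.search(line)
    # Print the matched text, or None when two digits are missing somewhere.
    print(line, "->", match.group(0) if match else None)
```

Single-digit days or months (as in 5/1/23) are not matched, because each part must be exactly two digits.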

Email Addresses

The following Regexp can be used to represent a simple email address, such as ‘johndoe@domain.com’:

[a-z]+@[a-z]+\.com

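The Regexp pattern [a-z]+ matches one or more lowercase letters, and is used for both the username and the domain section of the email address. The \.com at the end matches a literal '.com'; the backslash stops the full stop from being treated as a wildcard.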
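As a quick check of the simple pattern, the sketch below runs it against two addresses. It assumes Python's `re` module purely as a test bench, and the name `SIMPLE_EMAIL` and the surrounding sample text are illustrative.

```python
import re

# The simple email Regexp from this page, unchanged.
SIMPLE_EMAIL = re.compile(r"[a-z]+@[a-z]+\.com")

# Lowercase letters only, ending in .com: this matches.
found = SIMPLE_EMAIL.search("Contact johndoe@domain.com today")
print(found.group(0) if found else None)

# Digits in the domain and a .co.uk ending: no match with this pattern.
missed = SIMPLE_EMAIL.search("Joe.Blogs@123test123.co.uk")
print(missed.group(0) if missed else None)
```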

Extra values

Additional values such as numbers or other characters can be added to the regexp where necessary. For example, the following regexp could be used to match an address such as 'Joe.Blogs@123test123.co.uk':

[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)

The Regexp for the username ([A-Za-z\.]+) now also matches capital letters and full stops/periods '.'. The domain ([a-z0-9]+) now also accepts numbers, and is followed by a choice of top-level domains: \.(com|co\.uk|gov)
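The extended pattern can be exercised in the same way. This is a sketch assuming Python's `re` module as a test bench; `EXTENDED_EMAIL` and the addresses other than the one quoted on this page are hypothetical.

```python
import re

# The extended email Regexp from this page, unchanged.
EXTENDED_EMAIL = re.compile(r"[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)")

# Hypothetical test addresses; only the first is taken from this page.
addresses = ["Joe.Blogs@123test123.co.uk", "jane@agency.gov", "jane@example.org"]

for address in addresses:
    match = EXTENDED_EMAIL.search(address)
    if match:
        # group(1) holds whichever top-level domain alternative matched.
        print(address, "-> matched, TLD:", match.group(1))
    else:
        print(address, "-> no match")
```

Supporting another top-level domain means adding one more alternative inside the parentheses.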

UTF-8 Characters

Some characters should not be typed into your regular expression literally, as they may interact with the expression unexpectedly, but they may still be necessary for finding a match. Such a character can be expressed using its UTF-8 code rather than the character itself. The following regexp will match a single character, where xxxx is the four-digit hexadecimal UTF-8 code:

\x{xxxx}

Please see the following page for a comprehensive (and well organised) list of UTF-8 characters:

Codes

The following examples show characters that are likely to be used and could be replaced by their related UTF-8 reference:

Character    Code    Regexp
"            0022    [\x{0022}]
<            003C    [\x{003C}]
>            003E    [\x{003E}]
'            0027    [\x{0027}]
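The codes listed above are hexadecimal, so each one can be checked against its character. The following is a small sketch in Python, used here only as a convenient calculator; the dictionary name `codes` is illustrative.

```python
# Character -> four-digit hexadecimal code, as listed on this page.
codes = {'"': "0022", "<": "003C", ">": "003E", "'": "0027"}

for character, code in codes.items():
    # int(code, 16) reads the hexadecimal code; chr() turns it back into a character.
    decoded = chr(int(code, 16))
    print(code, "->", decoded, "(matches table)" if decoded == character else "(mismatch)")
```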

Examples

UTF-8 codes can be used when matching an email address that contains characters that could be HTML ‘unsafe’ and would usually be stripped.

The following Regexp can be used to match an email address wrapped in less-than and greater-than symbols, such as “<johndoe@domain.com>”:

[\x{003C}][a-z]+@[a-z]+\.com[\x{003E}]
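To try this pattern outside the product, note that not every regex engine accepts the \x{xxxx} form. Python's `re` module, used here only as an assumed test bench, writes the same code as \uxxxx instead. The sketch below converts the escapes and then runs the match; `to_python_escapes` is a hypothetical helper written for this example, not part of any library.

```python
import re

# Hypothetical helper: rewrites each four-digit \x{hhhh} escape as \uhhhh,
# the form that Python's re module understands.
def to_python_escapes(pattern: str) -> str:
    return re.sub(
        r"\\x\{([0-9A-Fa-f]{4})\}",
        lambda m: "\\u" + m.group(1),
        pattern,
    )

# The Regexp from this page, unchanged.
page_pattern = r"[\x{003C}][a-z]+@[a-z]+\.com[\x{003E}]"
python_pattern = to_python_escapes(page_pattern)
print(python_pattern)

# Hypothetical sample line containing a bracketed address.
match = re.search(python_pattern, "From: <johndoe@domain.com>")
print(match.group(0) if match else None)
```

The converted pattern only matches when both angle brackets are present, so a bare johndoe@domain.com is not matched.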