This page covers a few simple Regexps that can be used for body delimiters, each with an accompanying explanation of how it works.
Date Formats
dd/mm/yy or mm/dd/yy
The following simple Regexp will match on either date format:
[0-9]{2}/[0-9]{2}/[0-9]{2}
The [0-9] means any single digit from 0 to 9. The {2} means the preceding item must appear exactly twice, so [0-9]{2} will match any value from 00 to 99. By repeating this pattern three times, separated by a forward slash '/', we get a Regexp that matches anything from 00/00/00 to 99/99/99. Note that it checks only the shape of the date, so invalid dates such as 99/99/99 also match.
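As a quick check of the date pattern, here is a minimal sketch in Python. The choice of Python's re module is an assumption (this page does not name a Regexp engine), and the helper name find_dates and the sample strings are made up for illustration.

```python
import re

# Pattern from this page: two digits, '/', two digits, '/', two digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")

def find_dates(text):
    """Return every dd/mm/yy or mm/dd/yy shaped substring found in text."""
    return DATE_PATTERN.findall(text)

# Both orderings have the same shape, so both are found.
print(find_dates("Sent 25/12/21, received 12/26/21"))

# The pattern checks shape only, so an impossible date is still returned.
print(find_dates("Nonsense value 99/99/99"))

# Single-digit days or months do not satisfy {2}, so nothing is returned.
print(find_dates("Short form 1/2/21"))
```

If only real calendar dates should be accepted, the matched text would need a separate validation step; the Regexp on its own cannot tell dd/mm/yy from mm/dd/yy.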
Email Addresses
The following Regexp can be used to represent a simple email address, such as ‘johndoe@domain.com’:
[a-z]+@[a-z]+\.com
The Regexp pattern [a-z]+ will match on one or more lowercase letters, and is used here for both the username and the domain section of the email address. The \. matches a literal full stop, so the address must end in '.com'.
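The behaviour of the simple email pattern can be sketched in Python as follows. Python's re module, the helper name is_simple_email, and the use of fullmatch (so the whole string must be an address, rather than just containing one) are all assumptions made for this illustration.

```python
import re

# Pattern from this page: lowercase letters, '@', lowercase letters, '.com'.
SIMPLE_EMAIL = re.compile(r"[a-z]+@[a-z]+\.com")

def is_simple_email(candidate):
    """Return True when the whole string fits the simple email pattern."""
    return SIMPLE_EMAIL.fullmatch(candidate) is not None

print(is_simple_email("johndoe@domain.com"))   # True
print(is_simple_email("JohnDoe@domain.com"))   # False: capital letters
print(is_simple_email("john.doe@domain.com"))  # False: full stop in username
print(is_simple_email("johndoe@domain.org"))   # False: not a .com address
```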
Extra values
Additional values such as numbers or other characters can be added to the Regexp where necessary. For example, the following Regexp could be used to match an address such as 'Joe.Blogs@123test123.co.uk':
[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)
The Regexp for the username ([A-Za-z\.]+) now matches on capital letters and full stops/periods '.'. The domain ([a-z0-9]+) now also accepts numbers, and is followed by a choice of Top Level Domains (\.(com|co\.uk|gov)).
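To see which Top Level Domain option was taken, the extended pattern can be tried in Python as in the sketch below. Python's re module and the helper name top_level_domain are assumptions for illustration; the pattern itself is the one given above.

```python
import re

# Extended pattern from this page: letters and full stops in the username,
# lowercase letters and digits in the domain, then one of three endings.
EXTENDED_EMAIL = re.compile(r"[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)")

def top_level_domain(candidate):
    """Return the matched Top Level Domain, or None if there is no match."""
    match = EXTENDED_EMAIL.fullmatch(candidate)
    return match.group(1) if match else None

print(top_level_domain("Joe.Blogs@123test123.co.uk"))  # co.uk
print(top_level_domain("johndoe@domain.gov"))          # gov
print(top_level_domain("joe_blogs@test.com"))          # None: '_' not allowed
```

The brackets around com|co\.uk|gov both limit the scope of the '|' alternatives and capture the chosen ending, which is why group(1) returns it.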
UTF-8 Characters
Some characters should not be included in your regular expression as literal characters, as they may interact with the expression unexpectedly, but may still be necessary for finding a match. These characters can be expressed using their UTF-8 code rather than the character itself. The following Regexp will match on a UTF-8 character, where xxxx is the four-digit hexadecimal UTF-8 code:
\x{xxxx}
Please see the following page for a comprehensive (and well organised) list of UTF-8 characters:
Codes
The following examples show characters that are likely to be used and could be replaced by their related UTF-8 reference:
Character | Code | Regexp |
---|---|---|
" | 0022 | \x{0022} |
< | 003C | \x{003C} |
> | 003E | \x{003E} |
' | 0027 | \x{0027} |
Examples
UTF-8 characters can be used when attempting a match on an email address that is wrapped in characters which could be HTML ‘unsafe’ and would usually be stripped.
The following Regexp can be used to match an email address wrapped in less-than/greater-than symbols, such as “<johndoe@domain.com>”:
[\x{003C}][a-z]+@[a-z]+\.com[\x{003E}]
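The same idea can be tried out in Python, with one adjustment: Python's re module does not accept the \x{003C} form, so this sketch writes the same two characters as \u003C and \u003E. That substitution, the choice of Python, and the helper name find_bracketed_email are assumptions for illustration; engines that support \x{...} can use the Regexp above unchanged.

```python
import re

# Same pattern as above, with '<' and '>' written as Unicode escapes
# (\u003C and \u003E) because Python's re does not accept \x{003C}.
BRACKETED_EMAIL = re.compile(r"[\u003C][a-z]+@[a-z]+\.com[\u003E]")

def find_bracketed_email(text):
    """Return the first <address> found in text, or None if there is none."""
    match = BRACKETED_EMAIL.search(text)
    return match.group(0) if match else None

# The brackets are part of the match, so they are returned with the address.
print(find_bracketed_email("From: John Doe <johndoe@domain.com>"))

# An address without the surrounding brackets is not matched.
print(find_bracketed_email("From: johndoe@domain.com"))
```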