This page covers a few simple Regexps used for body delimiters and explains how each one works.
Date Formats
dd/mm/yy or mm/dd/yy
The following simple Regexp will match either date format:
...
[0-9] means any digit between 0 and 9. {2} means the preceding item must appear exactly twice. [0-9]{2} will therefore match any value from 00 to 99. Repeating this pattern three times, with a forward slash '/' between each repetition, gives a Regexp that matches 00/00/00 to 99/99/99.
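As a rough illustration, the pattern described above can be tried out in Python. The expression below is a reconstruction from the description, not a copy of the page's own sample, and the names `DATE_PATTERN` and `is_short_date` are hypothetical.

```python
import re

# Assumed reconstruction: two digits, a slash, two digits, a slash, two digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")


def is_short_date(text: str) -> bool:
    """Return True if the whole string has the shape NN/NN/NN."""
    return DATE_PATTERN.fullmatch(text) is not None


print(is_short_date("25/12/24"))  # dd/mm/yy shape
print(is_short_date("12/25/24"))  # mm/dd/yy shape
print(is_short_date("1/2/24"))    # single-digit parts do not fit [0-9]{2}
```

Note that the pattern only checks the shape of the text, so a value such as 99/99/99 is accepted even though it is not a real date.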
Email Addresses
The following Regexp can be used to represent a simple email address, such as ‘johndoe@domain.com’:
...
The Regexp pattern [a-z]+ will match one or more lowercase letters, so it can be used for both the username and the domain section of the email address.
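A minimal sketch of how [a-z]+ might be combined into a full address pattern is shown here in Python. The surrounding pieces (the '@' separator and the literal '.com' ending) are assumptions based on the example address, and `SIMPLE_EMAIL` and `is_simple_email` are hypothetical names.

```python
import re

# Assumed layout: lowercase username, '@', lowercase domain, literal '.com'.
SIMPLE_EMAIL = re.compile(r"[a-z]+@[a-z]+\.com")


def is_simple_email(text: str) -> bool:
    """Return True if the whole string fits the simple lowercase pattern."""
    return SIMPLE_EMAIL.fullmatch(text) is not None


print(is_simple_email("johndoe@domain.com"))
print(is_simple_email("JohnDoe@domain.com"))  # capitals are outside [a-z]
```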
Extra values
Additional values such as numbers or other characters can be added to the regexp where necessary. For example, the following regexp could be used to match an address such as 'Joe.Blogs@123test123.co.uk':
...
The Regexp for the username ([A-Za-z\.]) now also matches capital letters and full stops/periods '.'. The domain ([a-z0-9]) now also allows numbers, and is followed by a choice of Top Level Domains (\.com|co\.uk|gov).
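The extended pattern could be sketched in Python roughly as follows. This is an assumption about how the fragments fit together: in particular, the dot before the Top Level Domain is placed outside the group here so that it applies to every option, which differs slightly from the fragment quoted above. `EXTENDED_EMAIL` and `is_extended_email` are hypothetical names.

```python
import re

# Assumed combination of the fragments described above:
#   [A-Za-z.]+          username: letters of either case and full stops
#   [a-z0-9]+           domain: lowercase letters and digits
#   \.(com|co\.uk|gov)  one of three Top Level Domains
EXTENDED_EMAIL = re.compile(r"[A-Za-z.]+@[a-z0-9]+\.(com|co\.uk|gov)")


def is_extended_email(text: str) -> bool:
    """Return True if the whole string fits the extended pattern."""
    return EXTENDED_EMAIL.fullmatch(text) is not None


print(is_extended_email("Joe.Blogs@123test123.co.uk"))
print(is_extended_email("joe@test.org"))  # '.org' is not in the list of options
```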
UTF-8 Characters
Some characters should not be included in your regular expression as they may interact with the expression unexpectedly, but may be necessary for finding a match. Such characters can be expressed using a related UTF-8 code, rather than the character itself. The following regexp will match on a UTF-8 character, where xxxx is the UTF-8 code:
...
Codes
The following examples show characters that are likely to be used and could be replaced by their related UTF-8 reference:
Character | Code | Regexp |
---|---|---|
" | 0022 | |
< | 003C | |
> | 003E | |
' | 0027 | |
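To illustrate the idea, the sketch below checks each code from the table against its character in Python. It assumes the `\uXXXX` escape form that Python's `re` module understands; the escape syntax expected by the product this page describes is not shown here and may differ. `CODES` and `pattern_for` are hypothetical names.

```python
import re

# Characters from the table, keyed by their four-digit hexadecimal code.
CODES = {"0022": '"', "003C": "<", "003E": ">", "0027": "'"}


def pattern_for(code: str):
    """Compile a pattern such as \\u0022 for the given four-digit code."""
    # Assumption: the regexp engine accepts \uXXXX, as Python's re module does.
    return re.compile("\\u" + code)


for code, char in CODES.items():
    matched = pattern_for(code).fullmatch(char) is not None
    print(code, repr(char), matched)
```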
Examples
UTF-8 codes can be used when attempting a match on an email address that contains characters which may be HTML ‘unsafe’ and would usually be stripped:
...
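One possible case, sketched in Python, is an address wrapped in angle brackets, where '<' and '>' are written as 003C and 003E instead of the literal characters. The bracketed layout, the `\uXXXX` escape form, and the names `ANGLE_EMAIL` and `extract_address` are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern: '<' (003C), a simple lowercase address, then '>' (003E).
# The angle brackets are written as escapes rather than literal characters.
ANGLE_EMAIL = re.compile(r"\u003C([a-z]+@[a-z]+\.com)\u003E")


def extract_address(text: str) -> Optional[str]:
    """Return the address found between angle brackets, or None."""
    match = ANGLE_EMAIL.search(text)
    return match.group(1) if match else None


print(extract_address("From: <johndoe@domain.com>"))
print(extract_address("From: johndoe@domain.com"))  # no brackets, so no match
```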