Example Regexp Body Delimiters
The following page covers a few simple Regexps used for body delimiters with an explanation of their implementation.
Date Formats
dd/mm/yy or mm/dd/yy
The following simple Regexp will match either date format:
[0-9]{2}/[0-9]{2}/[0-9]{2}
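[0-9] means any digit from 0 to 9. {2} means the preceding element must occur exactly twice. [0-9]{2} will therefore match any value from 00 to 99. Repeating this pattern three times, separated by a forward slash '/', gives a Regexp that matches 00/00/00 to 99/99/99. Because the pattern checks only the shape of the value, it cannot tell dd/mm/yy from mm/dd/yy and will also accept impossible dates such as 99/99/99.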
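As a quick way to try the date pattern outside the product, here is a minimal sketch using Python's re module. The function name find_dates and the sample text are illustrative assumptions, not part of the original page.

```python
import re

# Pattern from the text: two digits, slash, two digits, slash, two digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")

def find_dates(text):
    """Return every substring of text shaped like dd/mm/yy or mm/dd/yy."""
    return DATE_PATTERN.findall(text)

# Hypothetical sample text, for illustration only.
print(find_dates("Raised 25/12/21, closed 01/02/22."))
```

Single-digit days or months (for example 1/2/22) are not matched, since each part must be exactly two digits.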
Email Addresses
The following Regexp can be used to match a simple email address, such as ‘johndoe@domain.com’:
[a-z]+@[a-z]+\.com
The Regexp pattern [a-z]+ will match one or more lowercase letters, and is used here for both the username and the domain section of the email address.
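The simple email pattern can be checked in the same way. This is a sketch in Python; is_simple_email is a hypothetical helper name, and fullmatch is used on the assumption that the whole value should be an address rather than merely contain one.

```python
import re

# Pattern from the text: lowercase username, '@', lowercase domain, '.com'.
SIMPLE_EMAIL = re.compile(r"[a-z]+@[a-z]+\.com")

def is_simple_email(candidate):
    """Return True only if the whole string matches the simple pattern."""
    return SIMPLE_EMAIL.fullmatch(candidate) is not None

print(is_simple_email("johndoe@domain.com"))
print(is_simple_email("JohnDoe@domain.com"))
```

When the pattern is used to search inside longer text instead, it can match just the tail of an address that contains unsupported characters, which is one reason to extend the character classes.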
Extra values
Additional values such as digits or other characters can be added to the Regexp where necessary. For example, the following Regexp could be used to match an address such as 'Joe.Blogs@123test123.co.uk':
[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)
The character class for the username ([A-Za-z\.]) now matches capital letters and full stops/periods '.'. The character class for the domain ([a-z0-9]) now also accepts digits, and the domain is followed by a choice of top-level domains (\.(com|co\.uk|gov)).
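To see which top-level domain alternative was selected, the extended pattern can be exercised with a short Python sketch. The helper name top_level_domain is an assumption made for this example.

```python
import re

# Pattern from the text; the parentheses capture the top-level domain.
EXTENDED_EMAIL = re.compile(r"[A-Za-z\.]+@[a-z0-9]+\.(com|co\.uk|gov)")

def top_level_domain(candidate):
    """Return the matched top-level domain, or None if there is no match."""
    match = EXTENDED_EMAIL.fullmatch(candidate)
    return match.group(1) if match else None

print(top_level_domain("Joe.Blogs@123test123.co.uk"))
```

Addresses with a top-level domain outside the listed alternatives, or with username characters outside the class (an underscore, for instance), return None.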
UTF-8 Characters
Some characters should not be typed directly into your regular expression, as they may interact with the expression unexpectedly, but may still be needed to find a match. Such characters can be expressed using their related UTF-8 code (the character's four-digit hexadecimal code) rather than the character itself. The following Regexp will match a UTF-8 character, where xxxx is the UTF-8 code:
Please see the following page for a comprehensive (and well organised) list of UTF-8 characters:
Codes
The following examples show characters that are likely to be used and could be replaced by their related UTF-8 reference:
Character | Name | Code | Regexp |
---|---|---|---|
" | QUOTATION MARK | 0022 | |
< | LESS-THAN SIGN | 003C | |
> | GREATER-THAN SIGN | 003E | |
' | APOSTROPHE | 0027 | |
(no visible glyph) | NO-BREAK SPACE | 00A0 | |
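The codes in the table can be tried out in Python, whose re module accepts a \uXXXX escape inside a pattern. This is an assumption about Python only; the escape syntax expected by the product this page describes may differ. The helper pattern_for_code is hypothetical.

```python
import re

def pattern_for_code(code):
    """Compile a regex matching the one character with this hex code."""
    if not re.fullmatch(r"[0-9A-Fa-f]{4}", code):
        raise ValueError("expected a four-digit hexadecimal code")
    # "\\u" + code yields e.g. \u003C, which Python's re reads as '<'.
    return re.compile("\\u" + code)

# Codes taken from the table above.
less_than = pattern_for_code("003C")
no_break_space = pattern_for_code("00A0")

print(less_than.fullmatch("<") is not None)
print(no_break_space.search("10\u00a0kg") is not None)
```

Building the pattern from the code keeps characters such as < or the no-break space out of the expression text itself.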
Examples
UTF-8 characters can be used when attempting a match on an email address, if the email address uses characters that may be HTML ‘unsafe’ and would usually be stripped.
The following Regexp can be used to match an email address wrapped in less/greater than symbols such as “<johndoe@domain.com>”
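As a rough illustration in Python, assuming the \uXXXX escape that Python's re module accepts (the escape syntax used by the product may differ), a wrapped address could be matched as follows. The name unwrap_email and the capture group are additions for this sketch.

```python
import re

# \u003C is '<' and \u003E is '>'; the group captures the bare address.
WRAPPED_EMAIL = re.compile(r"\u003C([a-z]+@[a-z]+\.com)\u003E")

def unwrap_email(text):
    """Return the address found between < and >, or None if absent."""
    match = WRAPPED_EMAIL.search(text)
    return match.group(1) if match else None

print(unwrap_email("From: John Doe <johndoe@domain.com>"))
```

An address that is not wrapped in the less-than and greater-than symbols is not matched by this pattern.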