Title: Handling UTF-8 multi-byte characters with a MySQL database  
Owner: Mike Harrison
Creator: Mike Harrison Feb 17, 2017
Last Changed by: Mike Harrison Jan 29, 2021
Tiny Link: (useful for email) https://thepluginpeople.atlassian.net/wiki/x/OUl-CQ
Enterprise Mail Handler for Jira Data Center (JEMH) (2)
    Page: Common Problems
    Blog: Using Body Cleanup Regexps to remove 4 byte characters?
Time Editor  
Jan 29, 2021 18:47 Mike Harrison
In November 2003, UTF-8 was restricted by RFC 3629 to match the constraints of the UTF-16 character encoding: explicitly prohibiting code points corresponding to the high and low surrogate characters removed more than 3% of the three-byte sequences, and ending at U+10FFFF removed more than 48% of the four-byte sequences and all five- and six-byte sequences.
Jan 29, 2021 13:27 Finn Gale
Jan 29, 2021 13:26 Finn Gale
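
The percentages quoted in the RFC 3629 note above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original page: the constant names and the utf8_length helper are illustrative assumptions, and the "original" four-byte range is taken to be U+10000 through U+1FFFFF, the full span a four-byte sequence could encode before the restriction.

```python
# Code point ranges, grouped by the UTF-8 sequence length that encodes them.
# All names here are illustrative; none come from the original page.
THREE_BYTE = range(0x0800, 0x10000)             # encodable in three bytes
SURROGATES = range(0xD800, 0xE000)              # high and low surrogates
FOUR_BYTE_ORIGINAL = range(0x10000, 0x200000)   # four bytes, before RFC 3629
FOUR_BYTE_RFC3629 = range(0x10000, 0x110000)    # four bytes, capped at U+10FFFF

# Share of three-byte sequences removed by prohibiting surrogates.
surrogate_share = len(SURROGATES) / len(THREE_BYTE)

# Share of four-byte sequences removed by ending at U+10FFFF.
four_byte_removed_share = (
    len(FOUR_BYTE_ORIGINAL) - len(FOUR_BYTE_RFC3629)
) / len(FOUR_BYTE_ORIGINAL)


def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    print(f"surrogates removed:      {surrogate_share:.2%}")
    print(f"four-byte range removed: {four_byte_removed_share:.2%}")
    print(utf8_length("\u20ac"), utf8_length("\U0001F600"))
```

Characters above U+FFFF, such as the emoji in the last line, are the ones that take four bytes in UTF-8.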