Update to clarify combining characters

Peter Occil poccil14 at gmail.com
Mon Apr 21 18:01:21 CEST 2014


As suggested to me, here are the changes to the IDNA documents that I hope will clarify how they treat combining characters:

"4.2.3.2  Leading Combining Characters

The Unicode string MUST NOT begin with a combining character (as defined in The Unicode Standard, Section 3.6 [Unicode])."

"Labels whose first character is a combining character (as defined in The Unicode Standard, Section 3.6 [Unicode])."

Note that:

- The RFC uses both "combining mark" and "combining character"; it is better to use just one of these terms, since they mean virtually the same thing.
- There are two plausible definitions of a "combining mark" or "combining character": a character with a non-zero canonical combining class, or a character with a general category of Mn, Mc, or Me. Since the Unicode Standard defines "combining character" in the latter sense, and the RFC also uses the term "combining mark", I believe the latter definition is what is meant.
- The characters on which the two definitions disagree include Indic consonant and vowel signs, variation selectors, subjoined letters, and the Combining Grapheme Joiner. All of these have combining class 0 and a general category of Mn, Mc, or Me, and the vast majority of them are in the PVALID category. I'm not aware of any registries that allow labels that begin with those characters.
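
To illustrate the difference between the two definitions, here is a minimal sketch in Python using the standard unicodedata module. The function names are mine and purely illustrative; they do not come from the RFCs. The results also depend on the Unicode version bundled with the Python build in use. The last function shows the leading-character check under the general-category definition, which is the one I believe is intended.

```python
import unicodedata


def is_combining_by_category(ch):
    # Definition from The Unicode Standard, Section 3.6:
    # general category Mn, Mc, or Me.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def is_combining_by_ccc(ch):
    # Alternative reading: non-zero canonical combining class.
    return unicodedata.combining(ch) != 0


def begins_with_combining_character(label):
    # Leading-character check, assuming the general-category definition.
    # An empty string has no first character, so it returns False.
    return bool(label) and is_combining_by_category(label[0])


if __name__ == "__main__":
    samples = [
        "\u0301",  # COMBINING ACUTE ACCENT
        "\u034F",  # COMBINING GRAPHEME JOINER
        "\u093E",  # DEVANAGARI VOWEL SIGN AA
        "\u0F90",  # TIBETAN SUBJOINED LETTER KA
        "\uFE0F",  # VARIATION SELECTOR-16
    ]
    for ch in samples:
        print(
            "U+%04X" % ord(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
            is_combining_by_category(ch),
            is_combining_by_ccc(ch),
        )
```

Of the sample characters, only U+0301 should satisfy both definitions; the others have combining class 0 and are caught only by the general-category test.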

--Peter