Update to clarify combining characters

J-F C. Morfin jfc at morfin.org
Mon Apr 21 20:06:19 CEST 2014


registries register domain names in their zone.

Some registries may want to filter the xn-- entries and others may 
not; most will probably stop filtering eventually, after court 
decisions against technical barriers to trade (TBT) discrimination. 
Please note that nothing prevents anyone from registering the 
"babel name" xn--peter-occil.tld and the corresponding TM as per 
so whenever they send a mail it prints as 
"from at xn--peter-occil.tld"). Everyone has known about this for 
years (including WIPO and ICANN TM people). No one wants to touch 
this problem, which involves all the nasty naming issues. It is 
unlikely to be addressed for a long time, given the NTIA's 
distancing from ICANN.

Therefore, IMHO, we should not assume anything outside of RFC 
consistency (people need to know how the rock-solid DNS will 
proceed). In addition, the NTIA's distancing will most probably 
introduce a distancing between registries and ICANN, leading to a 
multi-stakeholder Registry/Ledger (group file publishers) process 
whose modalities are still entirely unknown to us. Please refer to 
RFC 6852: standards are driven by the economics of global markets and 
communities, fueled by technological advancements, and globally 
deployed regardless of their formal (IETF, UNICODE, etc.) status. 
However, RFC 6852 has not defined the appeal/MS process to resolve 
technical and political conflicts.


At 18:01 21/04/2014, Peter Occil wrote:
>As suggested to me, here are the changes to the IDNA documents that 
>I hope will clarify things with combining characters:
>"  Leading Combining Characters
>The Unicode string MUST NOT begin with a combining character (as 
>defined in The Unicode Standard, Section 3.6 [Unicode])."
>"Labels whose first character is a combining character (as defined 
>in The Unicode Standard, Section 3.6 [Unicode])."
>Note that:
>- The RFC uses both "combining mark" and "combining character"; it 
>is better to use just one of these terms, since they mean virtually 
>the same thing.
>- There are two plausible definitions of a "combining mark" or 
>"combining character": a character with a non-zero canonical 
>combining class, or a character with general category of Mn, Mc, or 
>Me.  Since the term "combining character" has the latter definition 
>in the Unicode Standard and the term "combining mark" is also used, 
>I believe this is what is meant.
>- Some of the characters affected by the two definitions include 
>Indic consonant and vowel signs, variation selectors, subjoined 
>letters, and the Combining Grapheme Joiner.   All of these have 
>combining class 0 and a general category of Mn, Mc, or Me, and the 
>vast majority of them are in the PVALID category.   I'm not aware of 
>any registries that allow labels that begin with those characters.
>Idna-update mailing list
>Idna-update at alvestrand.no
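
To make the two candidate definitions quoted above concrete, here is a 
minimal sketch in Python using the standard unicodedata module. The 
function names are hypothetical (they appear in no RFC), and the 
results depend on the Unicode version shipped with the interpreter. 
It only illustrates the distinction Peter describes and is not a 
proposed implementation of the IDNA rule.

```python
import unicodedata

# General categories that the Unicode Standard groups as combining marks.
MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def is_combining_by_category(ch: str) -> bool:
    """Definition 2: general category is Mn, Mc, or Me."""
    return unicodedata.category(ch) in MARK_CATEGORIES


def is_combining_by_ccc(ch: str) -> bool:
    """Definition 1: canonical combining class is non-zero."""
    return unicodedata.combining(ch) != 0


def starts_with_combining(label: str) -> bool:
    """Hypothetical check: does the label begin with a combining
    character, using the general-category definition?"""
    return bool(label) and is_combining_by_category(label[0])


if __name__ == "__main__":
    samples = {
        "COMBINING ACUTE ACCENT": "\u0301",
        "COMBINING GRAPHEME JOINER": "\u034f",
        "DEVANAGARI VOWEL SIGN AA": "\u093e",
        "VARIATION SELECTOR-16": "\ufe0f",
    }
    for name, ch in samples.items():
        print(
            f"U+{ord(ch):04X} {name}: "
            f"category={unicodedata.category(ch)} "
            f"ccc={unicodedata.combining(ch)}"
        )
```

The last three samples are marks with combining class 0, so the two 
definitions disagree on them; that is why the choice of definition 
matters for labels beginning with such characters.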
