Update to clarify combining characters
J-F C. Morfin
jfc at morfin.org
Mon Apr 21 20:06:19 CEST 2014
registries register domain names in their zone.
Some registries may want to filter the xn-- entries, others not, and
most will probably stop filtering eventually, after court decisions against
technical barriers to trade (TBT) discrimination. Please note that
nothing prevents anyone from registering the "babel name":
xn--peter-occil.tld and the corresponding TM as per
so whenever they send a mail it prints as
"from at xn--peter-occil.tld"). Everyone has known about this for years
(including WIPO and ICANN TM people). No one wants to touch that
problem, which involves all the nasty naming issues. It is unlikely
to be addressed for a long time, due to the NTIA's distancing
from ICANN.
Therefore, IMHO we should not assume anything beyond consistency with
the RFCs (people need to know how the rock-solid DNS will
behave). In addition, the NTIA's distancing will most probably
create a distance between registries and ICANN, leading to a
multi-stakeholder Registry/Ledger (group file publishers) process whose
modalities are still entirely unknown to us. Please refer to RFC 6852:
standards are driven by the economics of global markets and
communities, fueled by technological advancements, and globally
deployed regardless of their formal (IETF, UNICODE, etc.) status.
However, RFC 6852 has not defined an appeal/MS process to resolve
technical and political conflicts.
At 18:01 21/04/2014, Peter Occil wrote:
>As suggested to me, here are the changes to the IDNA documents that
>I hope will clarify things with combining characters:
>"4.2.3.2 Leading Combining Characters
>The Unicode string MUST NOT begin with a combining character (as
>defined in The Unicode Standard, Section 3.6 [Unicode])."
>"Labels whose first character is a combining character (as defined
>in The Unicode Standard, Section 3.6 [Unicode])."
>- The RFC uses both "combining mark" and "combining character"; it
>is better to use just one of these terms, since they mean virtually
>the same thing.
>- There are two plausible definitions of a "combining mark" or
>"combining character": a character with a non-zero canonical
>combining class, or a character with general category of Mn, Mc, or
>Me. Since the term "combining character" has the latter definition
>in the Unicode Standard and the term "combining mark" is also used,
>I believe this is what is meant.
>- Some of the characters affected by the two definitions include
>Indic consonant and vowel signs, variation selectors, subjoined
>letters, and the Combining Grapheme Joiner. All of these have
>combining class 0 and a general category of Mn, Mc, or Me, and the
>vast majority of them are in the PVALID category. I'm not aware of
>any registries that allow labels that begin with those characters.
>Idna-update mailing list
>Idna-update at alvestrand.no
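The two candidate definitions discussed in the quoted message (non-zero
canonical combining class versus general category Mn, Mc, or Me) can be
compared with a minimal sketch. It assumes Python's standard unicodedata
module, whose data follows whatever Unicode version the interpreter ships
with. The function names are hypothetical and come from no RFC or registry
implementation.

```python
import unicodedata

# General categories that the Unicode Standard groups as combining marks.
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def is_combining_by_category(ch):
    """Definition based on general category Mn, Mc, or Me."""
    return unicodedata.category(ch) in COMBINING_CATEGORIES


def is_combining_by_ccc(ch):
    """Definition based on a non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0


def begins_with_combining_character(label):
    """Hypothetical check for the proposed rule, using the category definition."""
    return bool(label) and is_combining_by_category(label[0])


if __name__ == "__main__":
    samples = [
        "\u0301",  # COMBINING ACUTE ACCENT
        "\u034f",  # COMBINING GRAPHEME JOINER
        "\u093e",  # DEVANAGARI VOWEL SIGN AA
        "\ufe00",  # VARIATION SELECTOR-1
    ]
    for ch in samples:
        print(
            "U+%04X" % ord(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
            is_combining_by_category(ch),
            is_combining_by_ccc(ch),
        )
```

Characters such as the Combining Grapheme Joiner or an Indic vowel sign
are combining under the category definition but not under the combining
class definition, which is why the choice between the two matters for the
leading-character rule.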