Update to clarify combining characters
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Apr 22 12:35:57 CEST 2014
On 2014/04/22 17:10, Cary Karp wrote:
> Quoting Eric,
>> ... in Abenaki we use several ASCII character sequences
>> interchangeably ("ou", "w" and "8") as well as a "u atop o" character
>> defined in one or more extensions to ASCII, which typewriters with
>> half-height settings, and the character "8" have accommodated over the
>> past century, in support of a local (to a zone) semantic, e.g.,
>> equivalency of two labels, e.g., "ou.example" and "8.example" (or
>> "wabanaki.example" and "8abanaki.example" and "ouabanaki.example"),
> Are there similar non-ASCII examples?
The case of simplified vs. traditional Chinese characters was discussed
at length (at length even by IDNA standards) in the run-up to IDNA 2003.
There are many other scripts and languages where things like this can
occur. English as used around the world is one of them; if you want
color.example, you may want colour.example at the same time.
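As a purely hypothetical sketch (not part of IDNA, and not any registry's actual procedure), a registry or local resolver that wanted to treat the Abenaki spellings quoted above, or colour/color, as equivalent might start by enumerating the variant labels a given label would be bundled with. The function name `variants` and the equivalence groups are illustrative assumptions only.

```python
from collections.abc import Iterable


def variants(label: str, groups: Iterable[frozenset[str]]) -> set[str]:
    """Return every spelling reachable by swapping interchangeable substrings.

    Each group is a set of sequences treated as equivalent. This is a
    hypothetical, locally defined rule; nothing here comes from IDNA.
    """
    groups = list(groups)
    results: set[str] = set()

    def walk(i: int, built: str) -> None:
        if i == len(label):
            results.add(built)
            return
        # Option 1: keep the current character unchanged.
        walk(i + 1, built + label[i])
        # Option 2: if a variant sequence starts here, try every member of its group.
        for group in groups:
            for seq in group:
                if label.startswith(seq, i):
                    for alt in group:
                        walk(i + len(seq), built + alt)

    walk(0, "")
    return results


ABENAKI = frozenset({"ou", "w", "8"})  # hypothetical local equivalence
SPELLING = frozenset({"our", "or"})    # hypothetical English spelling rule

print(sorted(variants("wabanaki", [ABENAKI])))  # ['8abanaki', 'ouabanaki', 'wabanaki']
print(sorted(variants("colour", [SPELLING])))   # ['color', 'colour']
```

The search is exponential in the number of matching positions, which is tolerable for short labels, and plain substring matching will over-generate for real words, so any actual policy would need far more care than this.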
>> Obviously, what ICANN gTLD registry operators do is governed by contracts
>> between them and ICANN, and what ccTLD registry operators do is also
>> governed, in part, by desires for consistency, but below (or outside) of
>> these namespaces with _local_ (not pervasive to all levels of the tree)
>> restrictions on labels, what resolves is a local question -- local in
>> the sense of both the FQDN, the RRSet associated, and the resolvers to
>> which query(s) are made.
> Does this suggest that there are language communities with need to have
> such intricacy accommodated on lower levels of the gTLD/ccTLD namespace,
> who are willing to forgo the possibility of manifesting their languages
> directly in TLD labels?
At the TLD level, such a case might actually be easier to deal with,
because TLDs can be handled one by one. For an actual example, please
see .中国 and .中國
(http://www.iana.org/domains/root/db/xn--fiqz9s.html). Of course, keeping
such equivalents in sync (if that is desired, and whatever "in sync" may
mean) is not easy.
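To illustrate why keeping the two TLDs in sync is an administrative matter rather than something the protocol does, here is a small sketch of their A-labels. It assumes Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) rather than IDNA 2008; for plain CJK labels like these the resulting A-labels are the same.

```python
simplified = "中国"   # simplified Chinese
traditional = "中國"  # traditional Chinese

# Convert each U-label to its ASCII-compatible A-label.
a_simplified = simplified.encode("idna")
a_traditional = traditional.encode("idna")

print(a_simplified)   # b'xn--fiqs8s'
print(a_traditional)  # b'xn--fiqz9s'
# To the DNS these are two unrelated labels; nothing in the protocol links them.
print(a_simplified == a_traditional)  # False
```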
> Variation in keyboard practice otherwise appears in many contexts but it
> is difficult to see how this can be weighed into the IDNA protocol. My
> Swedish keyboard has separate keys for the direct entry of the last
> three letters of the Swedish alphabet (å ä ö). These can, however, also
> be typed by using the "dead key" that is necessary for the other
> diacritically marked letters used in written Swedish. That method
> requires the mark to be entered first but it neither displays nor
> spaces. The letter with which it combines is then entered and the
> corresponding precomposed single-code-point character is displayed.
> I had always assumed that the trailing order of combining marks was
> imposed directly by Unicode and that this simply cascaded into IDNA.
True. Keyboard input uses keystrokes (represented internally by
so-called keycodes), and whether diacritics are entered before or after
the base letter isn't indicative of what characters end up in the data.
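A minimal sketch of the point about keystrokes versus stored characters, assuming a hypothetical dead-key table (the key names such as `dead_ring` and the function `type_keys` are illustrative, not any real keyboard layout's data): the mark is keyed first, yet what ends up in the data is a single precomposed character, exactly as with the dedicated key.

```python
# Hypothetical dead-key table: (dead key, base letter) -> precomposed character.
DEAD_KEYS = {
    ("dead_diaeresis", "a"): "\u00e4",  # ä
    ("dead_diaeresis", "o"): "\u00f6",  # ö
    ("dead_ring", "a"): "\u00e5",       # å
    ("dead_acute", "e"): "\u00e9",      # é
}


def type_keys(keys: list[str]) -> str:
    """Turn a sequence of keystrokes into the characters that end up stored."""
    out: list[str] = []
    pending = None  # dead key pressed but not yet displayed
    for key in keys:
        if key.startswith("dead_"):
            # A dead key produces no character by itself; remember it.
            pending = key
            continue
        if pending is not None:
            # Combine with the pending mark; fall back to the bare letter
            # (a simplification: real layouts behave differently here).
            out.append(DEAD_KEYS.get((pending, key), key))
            pending = None
        else:
            out.append(key)
    return "".join(out)


# Dedicated key versus dead key + letter: same stored character either way.
print(type_keys(["\u00e5"]) == type_keys(["dead_ring", "a"]))  # True
```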
> Can that constraint actually be overridden in any situation that would be
> trapped by a new contextual rule in 5892?
No. A diacritic followed by a base character would mean that the
diacritic is displayed over the preceding character, not the following
one. Also, IDNA requires NFC, which in many cases (including the Swedish
ones) combines the mark with the base character into a single code point.
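The ordering and NFC points can be checked with Python's standard `unicodedata` module; this is a small sketch using the three Swedish letters, with the helper name `nfc` chosen here for illustration. A base letter followed by its combining mark composes to one code point, a leading mark is not moved past the following letter, and a mark placed between two letters combines with the one before it.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Precomposed Swedish letters -> base letter + combining mark.
SWEDISH = {
    "\u00e5": "a\u030a",  # å = a + COMBINING RING ABOVE
    "\u00e4": "a\u0308",  # ä = a + COMBINING DIAERESIS
    "\u00f6": "o\u0308",  # ö = o + COMBINING DIAERESIS
}

for precomposed, decomposed in SWEDISH.items():
    base, mark = decomposed[0], decomposed[1]
    # Base first, mark second: NFC composes them into one code point.
    print(nfc(base + mark) == precomposed)  # True
    # Mark first: NFC does not reorder it past the base letter.
    print(nfc(mark + base) == mark + base)  # True

# A ring placed between "a" and "o" combines with the preceding "a".
print(nfc("a\u030ao") == "\u00e5o")  # True
```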
> (If new rules are going to be
> added, there are a few others that might be suggested. Is that topic now
> open for discussion?)
I don't think so.
> Idna-update mailing list
> Idna-update at alvestrand.no