Update to clarify combining characters

"Martin J. Dürst" duerst at it.aoyama.ac.jp
Fri Apr 25 12:16:31 CEST 2014


Hello John, others,

On 2014/04/23 05:15, John C Klensin wrote:

> --On Tuesday, April 22, 2014 09:38 -0700 Eric Brunner-Williams
> <ebw at abenaki.wabanaki.net> wrote:

>> Values assigned outside the ASCII range for the "u-above-o"
>> combined character in the UTC repertoire are  U+0222 and
>> U+0223, reflecting the casing of Latin Script.
>
> So, as characters that can be used in labels (see below for some
> other issues), there are actually precomposed characters for the
> above.

Yes.

> For use of characters with precombined forms in DNS
> labels, it is important that the IDNA requirement for NFC be
> applied carefully (that requirement essentially eliminates
> leading combining characters or marks)

Actually, it doesn't. A string that starts with a combining mark is 
still in NFC, provided that the remainder of the string is in NFC. 
Something else in IDNA may specify that leading combining marks are not 
allowed, but NFC itself doesn't.
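
To make this concrete, here is a minimal sketch using Python's standard 
unicodedata module. The helper names is_nfc and 
starts_with_combining_mark are made up for illustration and are not 
part of any IDNA library. As far as I can tell, the separate rule 
against leading combining marks is the one in RFC 5891, section 4.2.3.2.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    # A string is in NFC exactly when normalizing it to NFC changes nothing.
    return unicodedata.normalize("NFC", s) == s


def starts_with_combining_mark(s: str) -> bool:
    # Hypothetical helper: general categories Mn, Mc and Me are the marks.
    return bool(s) and unicodedata.category(s[0]).startswith("M")


# COMBINING ACUTE ACCENT (U+0301) followed by plain ASCII letters.
label = "\u0301abc"

print(is_nfc(label))                      # NFC leaves the leading mark alone
print(starts_with_combining_mark(label))  # a separate check has to catch it
```

So a check for leading combining marks has to be applied in addition to 
the NFC check; the NFC check alone will not reject such a label.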


> There are problems that the IETF could not solve even if there
> were the will to do so.   One involves decisions by the Unicode
> community that are unattractive for particular scripts.  In my
> experience, while I'd be very interested in counter-examples,
> there are few such problems with Latin-based characters unless
> one gets to characters that require multiple decorations and
> that can potentially be written as a base (i.e., undecorated and
> typically ASCII) character plus two (or more) combining
> characters or a precombined character plus one (or more) of
> them.  Because some of those combinations appear to not be
> resolved into a single form by normalization, there might be an
> opportunity for "variant" consideration  except that ICANN, in
> its wisdom (and unless things have changed recently), decided
> that there is no such thing as a variant for Latin-based scripts.

This agrees with my understanding that for Latin, normalization (i.e. 
NFC) deals with these problems, even in those cases where multiple 
'decorations' (diacritics) are involved.
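
As an illustration only (one example, not a proof for the whole Latin 
repertoire), the following sketch takes "a with circumflex and acute" 
written three ways: base plus two combining marks, precomposed plus one 
combining mark, and fully precomposed. The code points are chosen by me 
for the example; the helper name nfc is just shorthand.

```python
import unicodedata


def nfc(s: str) -> str:
    # Shorthand for NFC normalization via the standard library.
    return unicodedata.normalize("NFC", s)


base_plus_two = "a\u0302\u0301"         # a + circumflex + acute
precomposed_plus_one = "\u00e2\u0301"   # a-circumflex + acute
fully_precomposed = "\u1ea5"            # a with circumflex and acute

forms = {nfc(base_plus_two), nfc(precomposed_plus_one), nfc(fully_precomposed)}
print(len(forms) == 1)  # all three spellings collapse to one NFC form

# Marks with different combining classes (circumflex above, dot below)
# are reordered canonically, so the typing order does not matter.
print(nfc("a\u0302\u0323") == nfc("a\u0323\u0302"))
```

Note that two marks of the same combining class (for example two marks 
above) are not reordered, but in that case the different orders are also 
different characters visually.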


> Variants are also out of IETF scope, at least for IDNA, because
> doing anything about them in anything resembling a general case
> turns into a set of issues that cannot be handled in the DNS
> except by externally treating names as equivalent.  As Andrew
> has mentioned, there have been extensive ICANN efforts to deal
> with a set of problems they have lumped together under that
> term; it may be of note that there does not seem to be a single
> period with experience using endangered languages or writing
> systems in a DNS context in the relevant decision-making
> committees.

'period' -> 'person' ?


> Finally, to respond to Martin's comment about simplified and
> traditional Chinese, that problem is very different from those
> associated with other, especially "alphabetic-phonetic",
> scripts, in part because those of us who did the final editing
> on the JET document that put "variants" on the map made a
> serious error in terminology.  But, again, it isn't a topic for
> this list.

Can you (slightly) expand on "serious error in terminology", or provide 
a pointer?

Regards,   Martin.


More information about the Idna-update mailing list