Update of RFC 2606 based on the recent ICANN changes?

Lyman Chapin lyman at acm.org
Thu Jul 3 22:05:14 CEST 2008


>> Is "сом" identical to "com"? (the first of these is U+0441
>> U+043E U+043C)
>
> The current principle is that it should be a "confusing string",
> which is vague enough to cover the case above (but perhaps not able to
> cover .co)

"Similarity" can be defined and tested, by setting thresholds and the  
like, but "confusing" refers to a state of mind - something is  
"confusing" if the people who are likely to encounter it consider it  
to be confusing. There's no way to objectively define or test for  
"confusing" similarity without reference to how actual people respond  
to a particular string. That means either mining data collected from  
circumstances in which people have mistaken one string for another  
(perhaps a history of Google searches), or consulting a panel of real  
people whenever it is necessary to decide whether or not two strings  
are "confusingly" similar.
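
To make the distinction concrete, here is a rough sketch in Python of what a
threshold-based "similarity" test might look like. Everything in it is an
assumption made for illustration: the three-entry lookalike table, the
position-by-position score, and the 0.8 threshold are invented here and do
not come from any ICANN procedure. It shows that such a test is mechanical,
and says nothing about whether real people would actually be confused.

```python
# Hypothetical, hand-picked lookalike table (Cyrillic -> Latin), for
# illustration only. A real test would need a much larger, reviewed table.
LOOKALIKES = {"\u0441": "c", "\u043e": "o", "\u043c": "m"}

# Arbitrary cut-off chosen for this sketch; not taken from any policy.
THRESHOLD = 0.8


def skeleton(label):
    """Replace each character with its assumed lookalike, if it has one."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def similarity(a, b):
    """Fraction of positions at which the two skeletons agree (0.0 to 1.0)."""
    sa, sb = skeleton(a), skeleton(b)
    if not sa and not sb:
        return 1.0
    matches = sum(x == y for x, y in zip(sa, sb))
    return matches / max(len(sa), len(sb))


cyrillic = "\u0441\u043e\u043c"  # U+0441 U+043E U+043C, from the question above

print([f"U+{ord(ch):04X}" for ch in cyrillic])   # the code points differ from "com"
print(cyrillic == "com")                         # so the strings are not identical
print(similarity(cyrillic, "com") >= THRESHOLD)  # yet they pass this similarity test
print(similarity("com", "co") >= THRESHOLD)      # while "com" vs "co" does not
```

Under these made-up settings the Cyrillic string scores 1.0 against "com"
and "co" scores about 0.67, which matches the earlier remark that ".co" may
fall outside the rule. Moving the threshold changes the answer, and that is
exactly why "similar" can be tested while "confusing" cannot.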

>>> (b) be identical to a Reserved Name;
>>
>>> (c) consist of a single character;
>>
>> I've heard it argued repeatedly that this is an unreasonable
>> rule for ideographic characters.   I don't have an opinion, but
>> hope that ICANN has considered that case in full detail.
>
> This is where we dive into a discussion of what a "character" is. In
> ideograph-based languages, there isn't a concept of a "word".
>
> For example, Chinese, Japanese and Korean are actually "phonetic
> languages", and ideographic characters are used to express the
> phonetics. A "word", or more accurately a "morpheme", can be expressed
> in one or more ideographs. A single Latin character is unlikely to be
> useful by itself (except for "a" and "i"), but that's not the case in CJK.
>
> If the condition is "no single ASCII character", I may be neutral
> about it (since a single ideograph would never translate to a single
> ASCII character in the zonefile, due to the xn-- prefix) but if the
> "character" is defined more broadly to cover "U-label" character, then
> I would have strong objections.

At the moment, the condition is "no single Unicode code point." To  
the extent that a single CJK ideograph can be expressed using a  
single Unicode code point, this would represent the situation to  
which you say you would object. I will dig through my notes to find  
out why the "single character" condition was adopted.
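
As a small illustration of the two readings of "single character", the
Python sketch below checks a label against a "no single Unicode code point"
rule and prints its ASCII (xn--) form. The helper names are made up for
this example, and Python's built-in "idna" codec follows the older IDNA2003
rules, so treat the output as indicative rather than as what a registry
would compute.

```python
def is_single_code_point(u_label):
    """True if the label is exactly one Unicode code point.

    Python 3 strings are sequences of code points, so len() counts them.
    """
    return len(u_label) == 1


def to_a_label(u_label):
    """ASCII form of the label, via Python's built-in IDNA2003 codec."""
    return u_label.encode("idna").decode("ascii")


# One ASCII letter, one CJK ideograph, and a two-ideograph label.
for label in ["a", "\u4e2d", "\u4e2d\u56fd"]:
    print(label, is_single_code_point(label), to_a_label(label))
```

A single ideograph is one code point, so it fails the rule as currently
worded, even though its ASCII form carries the xn-- prefix and is therefore
longer than one character in the zone file.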

- Lyman

